"Segmentation fault" while executing dynamically malloc'ed code

I wrote some sample code on x86_64 that tries to execute machine code copied into a dynamically malloc'ed buffer.
It fails with:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000601010 in ?? ()
0x0000000000601010 is the address of bin. Can someone tell me why this happens? Thanks!
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include <sys/mman.h>
volatile int sum(int a,int b)
{
return a+b;
}
int main(int argc, char **argv)
{
char* bin = NULL;
unsigned int len = 0;
int ret = 0;
/*code_str is the compiled code for function sum.*/
char code_str[] ={0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x89,
0x75,0xf8,0x8b,0x45,0xf8,0x03,0x45,0xfc,0xc9,0xc3};
len = sizeof(code_str)/sizeof(char);
bin = (char*)malloc(len);
memcpy(bin,code_str,len);
mprotect(bin,len , PROT_EXEC | PROT_READ | PROT_WRITE);
asm volatile ("mov $0x2,%%esi \n\t"
"mov $0x8,%%edi \n\t"
"mov %1,%%rbx \n\t"
"call *%%rbx "
:"=a"(ret)
:"g"(bin)
:"%rbx","%esi","%edi");
printf("sum:%d\n",ret);
return 0;
}

Never do such tricks without checking the return of system functions. My man page for mprotect says in particular:
POSIX says that the behavior of mprotect() is unspecified if it
is applied to a region of memory that was not obtained via mmap(2).
so don't do that with malloced buffers.
Also:
The buffer size is just sizeof(code_str), there is no reason to divide by sizeof(char) (which is guaranteed to be 1, but that doesn't make it correct).
There's no need to cast the return of malloc (nor mmap if you change it to that).
The correct type for code_str is unsigned char and not char.

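The problem is that the address passed to mprotect must be aligned to a multiple of PAGESIZE; otherwise mprotect returns -1 and sets errno to EINVAL (invalid argument).
/* Needs <stdint.h> and <errno.h>. The malloc'ed buffer must be at least len + PAGESIZE bytes so that the rounded-up address still lies inside it. */
bin = (char *)(((uintptr_t) bin + PAGESIZE-1) & ~((uintptr_t) PAGESIZE-1));//added....
memcpy(bin,code_str,len);
if(mprotect(bin, len , PROT_EXEC |PROT_READ | PROT_WRITE) == -1)
{
printf("mprotect error:%d\n",errno);
return 0;
}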

Related

How to change the value of a variable without the compiler knowing?

I want to verify the role of volatile using this method, but my inline assembly doesn't seem to be able to modify the value of i without the compiler knowing. According to the article I read, I only need to write assembly like __asm { mov dword ptr [ebp-4], 20h }, and I think I wrote the equivalent of what the author did.
actual output:
before = 10
after = 123
expected output:
before = 10
after = 10
Article link: https://www.runoob.com/w3cnote/c-volatile-keyword.html
#include <stdio.h>
int main() {
int a, b;
// volatile int i = 10;
int i = 10;
a = i;
printf("before = %d\n", a);
// Change the value of i in memory without letting the compiler know.
// I can't run the following statement here, so I wrote one myself
// mov dword ptr [ebp-4], 20h
asm("movl $123, -12(%rbp)");
b = i;
printf("after = %d\n", b);
}

I want to verify the role of volatile ...
You can't.
If a variable is not volatile, the compiler may optimize accesses to it, but it is not required to.
A compiler is always allowed to treat any variable as if it were volatile.
How to change the value of a variable without the compiler knowing?
Create a second thread writing to the variable.
Example
The following example is for Linux (under Windows, you need a different function than pthread_create()):
#include <stdio.h>
#include <pthread.h>
int testVar;
volatile int waitVar;
void * otherThread(void * dummy)
{
while(waitVar != 2) { /* Wait */ }
testVar = 123;
waitVar = 3;
return NULL;
}
int main()
{
pthread_t pt;
waitVar = 1;
pthread_create(&pt, 0, otherThread, NULL);
testVar = 10;
waitVar = 2;
while(waitVar != 3) { /* Wait */ }
printf("%d\n", testVar - 10);
return 0;
}
If you compile with gcc -O0 -o x x.c -lpthread, the compiler does not optimize and behaves as if all variables were volatile. printf() prints 113.
If you compile with -O3 instead of -O0, printf() prints 0.
If you replace int testVar by volatile int testVar, printf() always prints 113 (independent of -O0/-O3).
(Tested with the GCC 9.4.0 compiler.)

counting lines of input using memchr fails

I wrote a program to count lines of input given by stdin :
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#define BUFF_SIZE 8192
#define RS '\n'
int
main(int argc, char **argv)
{
char buff[BUFF_SIZE];
ssize_t n;
char *r;
int c = 0;
readchunk:
n = read(0, buff, BUFF_SIZE);
if (n<=0) goto end; // EOF
r=buff;
searchrs:
r = memchr(r, RS, n);
if(r!=NULL) {
c++;
if((r-buff)<n) {
++r;
goto searchrs;
}
}
goto readchunk;
end:
printf("%d\n", ++c);
return 0;
}
I compiled it with gcc, with no options.
When run, it gives unstable results: close to the correct count, but wrong. Sometimes it segfaults. The bigger the buffer size, the more often it segfaults.
What am I doing wrong?

Building your program with -fsanitize=address and feeding it sufficiently long input produces:
==119818==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffedbba1500 at pc 0x7fc4d56fd574 bp 0x7ffedbb9f4a0 sp 0x7ffedbb9ec50
READ of size 8192 at 0x7ffedbba1500 thread T0
#0 0x7fc4d56fd573 (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x40573)
#1 0x563fdf5f4b90 in main /tmp/t.c:23
#2 0x7fc4d533e2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
#3 0x563fdf5f49c9 in _start (/tmp/a.out+0x9c9)
Address 0x7ffedbba1500 is located in stack of thread T0 at offset 8224 in frame
#0 0x563fdf5f4ab9 in main /tmp/t.c:11
This frame has 1 object(s):
[32, 8224) 'buff' <== Memory access at offset 8224 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x40573)
Line 23 is the call to memchr.
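When you increment r, you should also decrease the count passed to memchr: its third argument must be the number of bytes remaining between r and the end of the data that was read, not n. As written, memchr searches past the end of buff.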

Getting function location in executable [duplicate]

This question already has answers here:
How to get the length of a function in bytes?
(13 answers)
Closed 7 years ago.
I need the location of a code section in the executable (begin and end address). I tried to use two dummy functions:
void begin_address(){}
void f(){
...
}
void end_address(){}
...
printf("Function length: %td\n", (intptr_t)end_address - (intptr_t)begin_address);
The problem is, that using -O4 optimization with gcc I got a negative length. It seems that this does not work with optimizations.
I compiled f to assembly, and tried the following:
__asm__(
"func_begin:"
"movq $10, %rax;"
"movq $20, %rbx;"
"addq %rbx, %rax;"
"func_end:"
);
extern unsigned char* func_begin;
extern unsigned char* func_end;
int main(){
printf("Function begin and end address: %p\t%p\n", func_begin, func_end);
printf("Function length: %td\n", (intptr_t)func_end - (intptr_t)func_begin);
}
The problem is that even without optimization I am getting some strange output:
Function begin and end address: 0x480000000ac0c748 0xf5158b48e5894855
Function length: -5974716185612615411
How can I get the location of a function in the executable? My second question is whether referring to this address as const char* is safe or not. I am interested in both 32 and 64 bit solutions if there is a difference.

If you want to see how many bytes a function occupies in a binary, you can use objdump to disassemble the binary and look at the addresses of the function's first and last instructions. Or you can print $ebp - $esp if you want to know how much space a function uses on the stack.

If it is a viable option for you, tell gcc to compile the needed parts with -O0 instead:
#include <stdio.h>
#include <stdint.h>
void __attribute__((optimize("O0"))) func_begin(){}
void __attribute__((optimize("O0"))) f(){
return;
}
void __attribute__((optimize("O0"))) func_end(){}
int main()
{
printf("Function begin and end address: %p\t%p\n", (void *)func_begin, (void *)func_end);
printf("Function length: %ld\n", (long)((uintptr_t)func_end - (uintptr_t)func_begin));
}
I'm not sure whether __attribute__((optimize("O0"))) is needed for f().

I don't know about GCC, but in the case of some Microsoft compilers, or some versions of Visual Studio, if you build in debug mode, it creates a jump table for function entries that then jump to the actual function. In release mode, it normally doesn't use the jump table.
I believe most linkers have a map output option that would at least show the offsets of functions.
You could use an asm instruction that you could search for:
movel $12345678,eax ;search for this instruction
This worked with Microsoft C / C++ 4.1, VS2005, and VS2010 release builds:
#include <stdio.h>
#include <string.h>
void swap(char **a, char **b){
char *temp = *a;
*a = *b;
*b = temp;
}
void sortLine(char *a[], int size){
int i, j;
for (i = 0; i < size; i++){
for (j = i + 1; j < size; j++){
if(memcmp(a[i], a[j], 80) > 0){
swap(&a[i], &a[j]);
}
}
}
}
int main(int argc, char **argv)
{
void (*pswap)(char **a, char **b) = swap;
void (*psortLine)(char *a[], int size) = sortLine;
char *pfun1 = (void *) pswap;
char *pfun2 = (void *) psortLine;
printf("%p %p %x\n", (void *)pfun1, (void *)pfun2, (unsigned)(pfun2-pfun1));
return(0);
}

Jumping to the data segment

I am testing an assembler I am writing which generates X86 instructions. I would like to do something like this to test whether the instructions work or not.
#include<stdio.h>
unsigned char code[2] = {0xc9, 0xc3};
int main() {
void (*foo)();
foo = &code;
foo();
return 0;
}
However it seems that OS X is preventing this due to DEP. Is there a way to either (a) disable DEP for this program or (b) enter the bytes in another format such that I can jump to them.

If you just need to test, try this instead, it's magic...
const unsigned char code[2] = {0xc9, 0xc3};
^^^^^
The const keyword causes the compiler to place it in the const section (warning! this is an implementation detail!), which is in the same segment as the text section. The entire segment should be executable. It is probably more portable to do it this way:
__attribute__((section("text")))
const unsigned char code[2] = {0xc9, 0xc3};
And you can always do it in an assembly file,
.text
.globl code
code:
.byte 0xc9
.byte 0xc3
However: If you want to change the code at runtime, you need to use mprotect. By default, there are no mappings in memory with both write and execute permissions.
Here is an example:
#include <stdlib.h>
#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
int main(int argc, char *argv[])
{
unsigned char *p = malloc(4);
int r;
// This is x86_64 code
p[0] = 0x8d;
p[1] = 0x47;
p[2] = 0x01;
p[3] = 0xc3;
// This is hackish, and in production you should do better.
// Casting 4095 to uintptr_t is actually necessary on 64-bit.
r = mprotect((void *) ((uintptr_t) p & ~(uintptr_t) 4095), 4096,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (r)
err(1, "mprotect");
// f(x) = x + 1
int (*f)(int) = (int (*)(int)) p;
return f(1);
}
The mprotect specification states that its behavior is unspecified if the memory was not originally mapped with mmap, but you're testing, not shipping, so just know that it works fine on OS X because the OS X malloc uses mmap behind the scenes (exclusively, I think).

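I don't know about DEP on OS X, but another thing you could do is malloc() the memory, write the code to it, and then jump into this malloc'ed area. On Linux systems that do not enforce non-executable data pages this memory is not exec-protected. Where NX is enforced, as on current x86_64 systems, the region must first be made executable, for example with mprotect or by allocating it with mmap and PROT_EXEC, which is how a JIT typically obtains executable memory.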

Find program's code address at runtime?

When I use gdb to debug a program written in C, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to get the address of myfunction() in the code segment when I run my program.

Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on windows is similar) that then jumps to the function.

If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is only known at run time, you can look up the symbol address dynamically using the bfd library. Here is a function I once wrote for x86_64; in the example you would get the address via find_symbol("a.out", "myfunction").
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Returns the value of symname in filename, or -1 on failure. */
long find_symbol(char *filename, char *symname)
{
bfd *ibfd;
asymbol **symtab;
long nsize, nsyms, i, result = -1;
int found = 0;
symbol_info syminfo;
char **matching;
bfd_init();
ibfd = bfd_openr(filename, NULL);
if (ibfd == NULL) {
printf("bfd_openr error\n");
return -1;
}
if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
printf("format_matches\n");
bfd_close(ibfd);
return -1;
}
nsize = bfd_get_symtab_upper_bound (ibfd);
symtab = nsize > 0 ? malloc(nsize) : NULL;
if (symtab == NULL) {
printf("cannot read symbol table\n");
bfd_close(ibfd);
return -1;
}
nsyms = bfd_canonicalize_symtab(ibfd, symtab);
for (i = 0; i < nsyms; i++) {
if (strcmp(symtab[i]->name, symname) == 0) {
bfd_symbol_info(symtab[i], &syminfo);
result = (long) syminfo.value;
found = 1;
break;
}
}
if (!found)
printf("cannot find symbol\n");
/* Release the symbol table and the bfd handle on every path. */
free(symtab);
bfd_close(ibfd);
return result;
}

To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]

About a comment in an answer (getting the address of an instruction), you can use this very ugly trick
#include <stdio.h>
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!
