Why odd operand error when compiling assembly?

I'm learning assembly and reading about the BIT instruction on the MSP430.
When I try to compile this code:
int main (void)
{
while(1){
__asm__("BIT R2, 3");
}
return 0;
}
It says: error: odd operand: -3
Yet when writing __asm__("BIT.B R2, 3"); instead, it works.
Could somebody explain this please?

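The instruction BIT R2, 3 uses symbolic mode for the second operand (i.e. an address given as an offset from the program counter), so 3 is treated as an address, not as a value. To use the immediate value 3 you must write it as #3. Note that on the MSP430 immediate mode is only valid for the source (first) operand, so the instruction would be BIT #3, R2.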
The reason this fails with BIT and not with BIT.B is because BIT does a word operation and you are using an odd address which is illegal. Word operations must be word aligned (i.e. even addresses) in the MSP430. Byte operations can operate on any byte address, odd or even.
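The alignment rule can be sketched in plain C. This is only an illustration of the even-address check, not MSP430 toolchain code, and the helper name is_word_aligned is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a 16-bit word access is only legal at an even
   address, i.e. when the lowest address bit is clear. */
bool is_word_aligned(uint16_t addr)
{
    return (addr & 1u) == 0;
}
```

is_word_aligned(3) is false, which matches the assembler rejecting the word form for the odd operand, while a byte access has no such restriction.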
You can get quite detailed information if you read the User Guide for the family of MCU you are using. For example, for the MSP430x2xxx family you would read the https://www.ti.com/lit/ug/slau144j/slau144j.pdf document, Chapter 3 or 4 depending on whether your MCU has the newer 20-bit address core.

Cant exploit overflow in simple program (chapter2 shellcoder's handbook)

I am reading The Shellcoder's Handbook and I'm currently at chapter 2, where I have a simple program to exploit by overflowing the expected input and then supplying a new return address for the ret instruction, so that the function return_input is executed twice.
Here is the simple program, written in C:
void return_input (void)
{
char array[30];
gets (array);
printf("%s\n", array);
}
main()
{
return_input();
return 0;
}
And this is the disassembled version of the main function, where we can see the target address of the call instruction.
I use the following command and input the characters that overflow the buffer, followed by the address that should overwrite the return address used by ret.
But as you can see, I do not run the return_input function twice; instead it just prints a question mark and reports a segmentation fault.
gets reads the terminating newline and replaces it with a NUL byte, so the return address you wanted ends up corrupted by that NUL byte.
The offset you saw in the disassembly is NOT the real address. You compiled the program as a PIE, so the real address may look like 0x55555????58a. That is also why gdb did not let you insert a breakpoint: you probably tried something like b *0x58a. Compiling with -no-pie would make life easier.
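To make the PIE point concrete, here is a minimal sketch of how an offset from the disassembly relates to the address used at run time. The function name and the base value in the note below are assumptions for illustration; the real load base has to be read from the running process.

```c
#include <stdint.h>

/* With PIE, the disassembly shows offsets inside the binary. The address
   used at run time is the load base chosen by the loader plus that offset.
   Any base passed in here is a placeholder, not a value from the question. */
uint64_t runtime_address(uint64_t load_base, uint64_t file_offset)
{
    return load_base + file_offset;
}
```

For example, with a made-up base of 0x555555554000, runtime_address(0x555555554000, 0x58a) gives 0x55555555458a, which has the same shape as the address pattern mentioned above.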

Copy function to executable page and call

I'm trying to copy a function I have to an executable page and run it from there, but I seem to be having some problems.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <windows.h>
int foo()
{
return 4;
}
int goo()
{
return 5;
}
int main()
{
int foosize = (int)&goo-(int)&foo;
char* buf = VirtualAlloc(NULL, foosize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buf == NULL)
{
printf("Failed\n");
return 1;
}
printf("foo %x goo %x size foo %d\n", &foo, &goo, foosize);
memcpy (buf, (void*)&foo, foosize);
int(*f)() = &foo;
int ret1 = f();
printf("ret 1 %d\n", ret1);
int(*f2)() = (int(*)())&buf;
int ret2 = f2 (); // <-- crashes here
printf("ret2 %d\n", ret2);
return 0;
}
I know some of the code is technically UB ((int)&goo-(int)&foo), but it behaves fine in this case.
My question is why is this not working as expected?
It seems to me I mapped a page as executable, copied an existing function there, and I'm just calling it.
What am I missing?
Would this behave differently on Linux with mmap?
Thanks in advance
As everyone has already stated in the comments, this is undefined behavior and you should never expect it to work. However, I played with your code in the debugger and realized that the reason it's not working (at least with the Cygwin gcc compiler) is that you're creating f2 incorrectly: it points to the address of the pointer that stores the allocated memory, namely buf. You want it to point to the memory that buf points to. Therefore, your assignment should be
int(*f2)() = (int(*)())buf;
With that change, your code executes for me. But even if it works, it might break again as soon as you make any additional changes to the program.
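The difference between &buf and buf can be shown without any Windows API. This is a small portable sketch using malloc in place of VirtualAlloc; the function name is made up for the example.

```c
#include <stdlib.h>

/* Returns 1 when the address of the pointer variable differs from the
   address stored in it (the allocated block), -1 if allocation fails. */
int pointer_and_pointee_differ(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return -1;

    void *address_of_variable = (void *)&buf; /* where buf itself lives */
    void *address_of_block = (void *)buf;     /* what buf points to */

    int differ = (address_of_variable != address_of_block);
    free(buf);
    return differ;
}
```

Casting &buf to a function pointer therefore jumps to the bytes of the pointer variable itself, not to the copied code.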
Well, I tried your code with MSVC 2008 in debug mode. The compiler happens to create a jump table with relative offsets, and &foo and &goo are just entries in that table.
So even if you have successfully created an executable buffer and copied the code (much more than was useful...), the relative jump now points to a different location and (in my example) soon fell into an int 3 trap!
TL;DR: as the compiler can arrange its code at will, and as many jumps use relative offsets, you cannot rely on copying executable code. It really is undefined behaviour:
if the compiler had been smart enough to just generate something like:
mov eax, 4
ret
it could have worked;
if the compiler has generated more complicated code with a relative jump, it just breaks.
Conclusion: you can only copy executable code if you have full control over the binary machine code, for example if you used assembly code and know you will have no relocation problem.
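Why a copied relative jump breaks can be worked out with a little arithmetic. The sketch below assumes the 5-byte x86 jmp rel32 encoding (opcode E9); the function name and the addresses in the note are made up for illustration.

```c
#include <stdint.h>

/* x86 "jmp rel32" (opcode E9) is 5 bytes long; its target is the address
   of the next instruction plus the signed 32-bit displacement. */
uint64_t jmp_rel32_target(uint64_t insn_addr, int32_t rel32)
{
    return insn_addr + 5 + (int64_t)rel32;
}
```

The same bytes with displacement 0x10 jump to 0x1015 when located at 0x1000, but to 0x2015 once copied to 0x2000, so the copy no longer reaches the original function body.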
You need to declare foo and goo as static, or else disable incremental linking.
Incremental linking is used to shorten the link time when building your application. The difference between normally and incrementally linked executables is that in incrementally linked ones each function call goes through an extra JMP instruction emitted by the linker.
These JMPs allow the linker to move the functions around in memory without updating all the CALL instructions that reference them. But it's exactly this JMP that causes problems in your case. Declaring a function as static prevents the linker from creating this extra JMP instruction.

How to find return instruction in memory

I have some C code that calls a function. I'm compiling this code in Visual Studio on Windows. Is there a straightforward way to view the return instruction (opcode) and the return address?
I tried to use the memory window in Visual Studio, but I only see my buffer "blie" and some hexadecimal interpreted memory values. I think CC might be an opcode, but I'd like to have a way/software to clearly view the return instruction and the return address.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int foo(char *);
int main(int argc, char *argv[])
{
if (argc != 1)
return printf("Supply an argument, dude\n");
foo(argv[0]);
return 0;
}
int foo(char *input)
{
unsigned char buffer[600] = "";
printf("Adres: %.8X\n", &buffer);
strcpy(buffer, input);
return 0;
}
The return address is located in the stack memory region (pointed to by the rsp register, assuming you are on x86_64), while the code that performs the function return is located in the code memory region. If you want to see the return address, stop your process on the RET instruction and look at the top of the stack.
If you only want to look at the generated code you can use a disassembler. As you are using Windows you can try the open source x64dbg. Other options exist, such as IDA Pro and you can view a list of others in this question: https://reverseengineering.stackexchange.com/questions/1817/is-there-any-disassembler-to-rival-ida-pro
Documentation excerpt:
The RET instruction transfers program control from the procedure currently being
executed (the called procedure) back to the procedure that called it (the
calling procedure). Transfer of control is accomplished by copying the return
instruction pointer from the stack into the EIP register.
As you can see, the return address is on the stack, so you cannot see it in the disassembly.
Finding the return instruction itself is not easy. Most probably you are using an x86 CPU, which is CISC and has variable-length instructions (in contrast to RISC). This means that in order to locate any instruction you must first decode all the instructions before it.
BTW: You can view the disassembly of your code in VS.
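For a quick look without a debugger, a function can also report its own return address. The sketch below assumes GCC or Clang and uses their __builtin_return_address builtin; MSVC has a comparable intrinsic, _ReturnAddress() from <intrin.h>. The function names are made up for the example.

```c
#include <stdio.h>

/* noinline keeps this a real call so that it has its own return address. */
__attribute__((noinline)) void *current_return_address(void)
{
    return __builtin_return_address(0);
}

/* Prints the address the call above returns to, i.e. a location
   inside this function. */
void show_return_address(void)
{
    printf("returns to: %p\n", current_return_address());
}
```

The printed value is the same one you would find at the top of the stack when stopped on the RET instruction of current_return_address.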

In x86 assembly, why would reading standard input block the program?

I'm a newbie to x86 assembly (Intel syntax) and have been playing around with some simple instructions using inline GCC. I have successfully managed to do manipulation of numbers and control flow and am now tackling standard input and output using interrupts. I am using Mac OS X and forcing compilation for 32-bit using the -m32 GCC flag.
I have the following for printing a string to standard output:
char* str = "Hello, World!\n";
int strLen = strlen(str);
asm
{
mov eax, 4
push strLen
push str
push 1
push eax
int 0x80
add esp, 16
}
When compiled and run this prints Hello, World! to the console! However, when I try to do some reading from standard input, things don't work as well:
char* str = (char*)malloc(sizeof(char) * 16);
printf("Please enter your name: ");
asm
{
mov eax, 3
push 16
push str
push 0
push eax
int 0x80
add esp, 16
}
printf("Hello, %s!\n", str);
When run, I get a prompt, but without the "Please enter your name: " string. When I enter some input and hit Enter, the entry string is printed as well as the expected output, e.g.
Please enter your name: Hello, Joe Bloggs
!
How do I get the entry string to appear in the expected location, before the user enters any input?
printf writes using stdio, which does buffering (i.e., what's written doesn't get output straight away). You need to call fflush(stdout) first, before you issue your syscall to read (since syscalls bypass stdio and know nothing about its buffers).
Also, as Kerrek SB has noted, your asm does not have a clobber list and it's not volatile. That means that gcc is free to relocate your assembly code elsewhere in the function (since it's free to assume your assembly code has no side effects), which may have a different effect from what you expect. I recommend you use asm volatile.
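The ordering issue can be reproduced in plain C. This is a minimal sketch that uses the POSIX read() wrapper in place of the int 0x80 block from the question; the function name prompt_then_read is made up for the example.

```c
#include <stdio.h>
#include <unistd.h>

/* Prints a prompt, flushes stdio's buffer so the prompt is visible, then
   reads directly from file descriptor 0 with the read() system call.
   Returns the number of bytes read, or -1 on error. */
ssize_t prompt_then_read(const char *prompt, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;

    printf("%s", prompt);
    fflush(stdout); /* without this the prompt may stay in the buffer */

    ssize_t n = read(STDIN_FILENO, buf, cap - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0'; /* read() does not terminate the string */
    return n;
}
```

Removing the fflush call brings back the behaviour from the question: the prompt only shows up later, when stdio decides to flush.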

What does this mean?: *(int32 *) 0 = 0;

In the following piece of code, what does *(int32 *) 0 = 0; mean?
void
function (void)
{
...
for (;;)
*(int32 *) 0 = 0; /* What does this line do? */
}
A few notes:
The code seems to not be reachable, as there is an exit statement before that particular piece of code.
int32 is typedef'ed but you shouldn't care too much about it.
This piece of code is from a language's runtime in a compiler, for anyone interested.
The code is doing the following:
for (;;) // while(true)
*(int32 *) 0 = 0; // Treat 0 as an address, de-reference the 0 address and try and store 0 into it.
This should segfault, null pointer de-reference.
EDIT
Compiled and ran for further information:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(void){
*(int32_t *) 0 = 0;
printf("done\n");
return 0;
}
gcc -g null.c; ./a.out
Program received signal SIGSEGV, Segmentation fault.
0x00000000004004cd in main () at null.c:7
7 *(int32_t *) 0 = 0;
Since the OP states the code was written by experienced compiler engineers, it is possible this is the intent of the code:
*(int32 *) 0 = 0; is recognized by this specific C implementation as code that causes behavior not defined by the C standard and known to this implementation to be illegal.
The for (;;) additionally indicates that this code is never exited.
The compiler engineers know that the optimizer will recognize this code and deduce that it may be “optimized away”, because any program that reaches this code is permitted to have any behavior, so the optimizer may choose to give it the behavior as if the code is never reached (see note 1 below).
This sort of reasoning is possible only if you have specific knowledge of the internal operation of a C implementation. It is the sort of thing a compiler engineer might include in special headers for a C implementation, perhaps to mark that certain code (such as code after an abort call) is never reached. It should never be used in normal programming.
1 For example, consider this code:
if (a)
for (;;)
*(int32 *) 0 = 0;
else
foo();
The compiler can recognize that the then-clause is permitted to have any behavior. Therefore, the compiler is free to choose what behavior it has. For simplicity, it chooses it to have the same behavior as foo();. Then the code becomes:
if (a)
foo();
else
foo();
and can be further simplified to:
foo();
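GCC and Clang expose the same idea directly through __builtin_unreachable(). The sketch below is not the construct from the question, only an illustration of marking a point as never reached; it assumes one of those two compilers, and sign_of is a made-up example function.

```c
/* Assumes GCC or Clang: __builtin_unreachable() tells the optimizer that
   control never gets here. Reaching it anyway is undefined behavior. */
int sign_of(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    __builtin_unreachable(); /* every int is covered above */
}
```

As with the null store, this is a promise to the compiler and not a runtime check, so it belongs only where the programmer can prove the point is never reached.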
The fact that this code segfaults doesn't explain why it exists =)
I think it comes from the runtime of some MCU. The reason it is there is that, if program execution ever gets to this point, such an instruction will either trigger a software reset of the MCU, so the program is restarted (which is common practice in embedded development), or, if the MCU is configured with a hardware watchdog, force a restart through the watchdog because of the never-ending loop.
The main goal of such constructs is to raise an interrupt that can be handled either by the OS or by hardware to initiate certain actions.
Given that it's x86, it will depend on the CPU mode. In real mode nothing will happen immediately if there is no watchdog: address 0 holds the address of the 'divide by 0' handler, so on some old MS-DOS or embedded x86 runtime this changes the address of the 'divide by 0' handler to 0. As soon as that interrupt occurs (and it is not masked), the CPU will jump to location 0:0 and will probably just restart because of an illegal instruction. If it's protected-mode or VM x86 code, then it's a way to notify the OS or any other supervisor that there is a problem in the runtime and that the software should be 'killed' externally.
for(;;) is equivalent to while(1).
*(int32 *) 0 = 0; writes 0 through a null pointer, which is expected to cause a crash, but will not always do so on certain compilers; see the question "Crashing threads with *(int*)NULL = 1; problematic?".
It's an infinite loop of undefined behavior (dereferencing a null pointer). It's likely to crash with a segfault on *n*x or Access Violation on Windows.
Mike's comment is pretty well correct: it's storing the VALUE zero at the ADDRESS 0.
Which will be a crash on most machines.
The original IBM PC stored the interrupt vector table in the lowest 1 KiB of memory. Hence writing a 32-bit value to address 0 on such an architecture would overwrite the vector for INT 00h, which on x86 is the divide-error handler.
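The layout of that table is simple enough to compute by hand. This is a small sketch of the real-mode arithmetic; the function name is made up for the example.

```c
#include <stdint.h>

/* Real-mode interrupt vector table: 256 entries of 4 bytes each
   (16-bit offset followed by 16-bit segment), starting at address 0. */
uint32_t ivt_entry_address(uint8_t vector)
{
    return (uint32_t)vector * 4u;
}
```

ivt_entry_address(0) is 0, so a 32-bit store to address 0 replaces the whole entry for INT 00h, and the last entry ends at ivt_entry_address(255) + 4 = 1024, which is the 1 KiB mentioned above.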
On basically anything modern (meaning, in x86/x86-64 parlance, anything running in protected or long mode), it will trigger a segmentation fault unless you are in ring 0 (kernel mode), because you are stepping outside of your process' allowed address range.
As the dereference is undefined behavior (as already stated), a segmentation fault is a perfectly acceptable way to handle that situation. If you know that on the target architecture a zero address dereference causes a segmentation fault, it seems to be a pretty sure way to get the application to crash. If exit() returns, that's probably what you want to do, since something just went horribly wrong. That the code is from a particular compiler's runtime means whoever wrote it can take advantage of knowledge of the internal workings of the compiler and runtime, as well as tailor it to the specific target architecture's behavior.
It could be that the compiler doesn't know exit() doesn't return, but it does know this construct does not return.
