Loading address in 16 bit mode - c

I want to ask how can I simulate C pointers in 16 bit assembly.
int var = 10;
int * ptr = &var;
In 32-bit assembly it looks like this:
mov dword ptr [ebp-x], 10
lea eax, dword ptr [ebp-x]
mov dword ptr [ebp-x+4], eax
Is there any way to get the physical address of a variable at [bp-x] in 16-bit assembly?
For example:
I have a program which reads a sector from a floppy, then jumps to segment:0 and executes it.
The program being loaded is a simple text editor. In the editor I need to get the physical address of a single variable, convert it to segment:offset, and use it for loading a text file. I have tried setting DS:SI before the jump to the editor, but it's not a very good solution. Does anybody know how I can solve this? Please help.

In the real addressing mode the physical address of a byte of memory is equal to the segment * 16 + offset.
When you refer to memory via [(e)bp+...] or [esp+...], the default segment involved is ss. Otherwise it's ds. An optional segment override prefix will change the default segment register.
So, for example, if your variable is addressed as [bp-8], then its physical address is ss*16+bp-8.
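As a rough sketch of that arithmetic, here it is in C rather than assembly. The function names are made up for illustration, and A20 wraparound above 1 MiB is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Physical address of a stack variable addressed as [bp + disp],
   e.g. disp = -8 for [bp-8]. The offset wraps at 16 bits. */
uint32_t stack_var_physical(uint16_t ss, uint16_t bp, int16_t disp)
{
    uint16_t offset = (uint16_t)(bp + disp);
    return real_mode_physical(ss, offset);
}

/* Convert a physical address below 1 MiB back to one possible
   segment:offset pair (the one with the smallest offset). */
void physical_to_seg_off(uint32_t physical, uint16_t *segment, uint16_t *offset)
{
    assert(physical < 0x100000u);
    *segment = (uint16_t)(physical >> 4);
    *offset  = (uint16_t)(physical & 0xFu);
}
```

Many segment:offset pairs name the same byte, so physical_to_seg_off just picks one of them; for instance 0x1000:0x0010 and 0x1001:0x0000 both refer to physical address 0x10010.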

So this is what you need:
mov word ptr [bp-x], 10
lea ax, word ptr [bp-x]
mov word ptr [bp-x+4], ax
You can use an old 16-bit compiler, such as Turbo C (TCC), and look at the assembly it outputs; that will show you what you need.
Also, even when you see a 16-bit pointer, it is only an offset. The real address is formed by the hardware according to the current mode (much like a 32-bit OS running in compatibility mode on a 64-bit architecture).
However, if you are really interested in doing this kind of thing, just open cmd, type debug, then a, and you can write a little bit of assembly there.

Related

Why does the stack frame also store instructions(besides data)? What is the precise mechanism by which instructions on stack frame get executed?

Short version:
0: 48 c7 c7 ee 4f 37 45 mov $0x45374fee, %rdi
7: 68 60 18 40 00 pushq $0x401860
c: c3 retq
How can these 3 instructions (0, 7, c), saved in the stack frame, get executed? I thought the stack frame only stores data; does it also store instructions? I know data is read into registers, but how do these instructions get executed?
Long version:
I am self-studying 15-213(Computer Systems) from CMU. In the Attack lab, there is an instance (phase 2) where the stack frame gets overwritten with "attack" instructions. The attack happens by then overwriting the return address from the calling function getbuf() with the address %rsp points to, which I know is the top of the stack frame. In this case, the top of the stack frame is in turn injected with the attack code mentioned above.
Here is the question: from reading the book (CSAPP), I get the sense that the stack frame only stores data that has overflowed from the registers (including the return address, extra arguments, etc.). But I don't get why it can also store instructions (attack code) and have them executed. How exactly does the content of the stack frame, which %rsp points to, get executed? I also thought that %rsp stores the return address of the calling function, the point being that it is an address, not an instruction. So by which mechanism exactly does a supposed address get executed as an instruction? I am very confused.
Edit: Here is a link to the question(4.2 level 2):
http://csapp.cs.cmu.edu/3e/attacklab.pdf
This is a post that is helpful for me in understanding: https://github.com/magna25/Attack-Lab/blob/master/Phase%202.md
Thanks for your explanation!
The ret instruction pops an address from the current top of the stack and jumps to it. If, while in a function, you modify the stack so that this address points to another function or piece of code that could be used maliciously, the function will return into that code.
The code below doesn't necessarily compile, and it is just meant to represent the concept.
For example, we have two functions: add(), and badcode():
int add(int a, int b)
{
return a + b;
}
void badcode()
{
// Some very bad code
}
Let's also assume that we have a stack such as the below when we call add()
...
0x00....18 extra arguments
0x00....10 return address
0x00....08 saved RBP
0x00....00 local variables and etc.
...
If, during the execution of add(), we manage to change the return address to the address of badcode(), then on the ret instruction we will automatically start executing badcode(). I don't know if this answers your question.
Edit:
An instruction is simply an array of numbers. Where you store them is irrelevant (mostly) to their execution. A stack is essentially an abstract data structure, it is not a special place in RAM. If your OS doesn't mark the stack as non-executable, there is nothing stopping the code on the stack from being returned to by the ret.
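To make the "array of numbers" point concrete, here is a small sketch in C that stores the three instructions from the question as plain bytes and reads the immediates back out of them. Nothing here executes the bytes; the array and helper names are mine, not from the lab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The encoded instructions from the question, held as ordinary data. */
static const uint8_t attack_code[] = {
    0x48, 0xc7, 0xc7, 0xee, 0x4f, 0x37, 0x45, /* mov   $0x45374fee, %rdi */
    0x68, 0x60, 0x18, 0x40, 0x00,             /* pushq $0x401860         */
    0xc3                                      /* retq                    */
};

/* Read a 32-bit little-endian immediate starting at byte index pos. */
uint32_t read_imm32(const uint8_t *code, size_t len, size_t pos)
{
    assert(pos + 4 <= len);
    return (uint32_t)code[pos]
         | (uint32_t)code[pos + 1] << 8
         | (uint32_t)code[pos + 2] << 16
         | (uint32_t)code[pos + 3] << 24;
}
```

Reading 4 bytes at index 3 gives back the 0x45374fee that the mov loads, and reading at index 8 gives the 0x401860 that gets pushed. Whether the CPU treats these 13 bytes as data or as code depends only on whether the instruction pointer ever lands on them.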
Edit 2:
I get the sense that the stack frame only stores data that is overflown
from the registers(including return address, extra arguments, etc.)
I don't think you have a clear picture yet of how registers, RAM, the stack, and programs fit together. The sense that the stack frame only stores data that has overflowed from registers is incorrect.
Let's start over.
Registers are small pieces of storage inside your CPU. They are independent of RAM. An x86 CPU has 8 main general-purpose registers: a, c, d, b, si, di, sp, and bp. a is the accumulator and is generally used for arithmetic operations; likewise b stands for base, c for counter, and d for data. si is the source index, di is the destination index, sp is the stack pointer, and bp is the base pointer.
On 16-bit x86 CPUs these eight registers are 16 bits (2 bytes) wide. a, b, c, and d are usually written ax, bx, cx, and dx, where the x stands for extension from their original 8-bit versions. The 32-bit versions are eax, ecx, edx, ebx, esi, edi, esp, and ebp (e again stands for extended), and the 64-bit versions are rax, rcx, rdx, rbx, rsi, rdi, rsp, and rbp.
Once again, these are inside your CPU and are independent of RAM. The CPU uses these registers for everything it does. You wanna add two numbers? Put one of them in ax and the other in cx and add them.
You also have RAM. RAM (standing for Random Access Memory) is a storage device that allows you to access and modify all of its values using equal computation power or time (hence the term random access). Each value that RAM holds also has an address that determines where on the RAM this value is. CPU can use numbers and treat such numbers as addresses to access memory addresses of RAM. Numbers that are used for such purposes are called pointers.
A stack is an abstract data structure. It has a FILO (first in, last out) structure, which means that to reach the first datum you stored you have to remove all of the data stored after it. To manipulate the stack the CPU provides sp, which points to the current top of the stack (the most recently pushed item), and bp, which by convention points to the base of the current stack frame. The stack usually grows downwards, meaning that if we start a stack at memory address 0x100 and store 4 bytes in it, sp will now be at memory address 0x100 - 4 = 0xFC. The push and pop instructions do these operations automatically. In that sense a stack can be used to store any type of data, regardless of the data's relation to registers or programs.
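Here is a minimal sketch in C of a downward-growing stack, assuming a toy 256-byte memory and 4-byte pushes. The struct and function names are invented for the illustration; a real CPU does this in hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BASE 0x100u /* hypothetical address where the stack starts */

struct toy_stack {
    uint8_t  mem[STACK_BASE]; /* simulated RAM, addresses 0x00 .. 0xFF */
    uint32_t sp;              /* simulated stack pointer */
};

void toy_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_BASE; /* empty stack: sp sits at the base */
}

/* push: move sp down by 4, then store the value at the new sp. */
void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->sp >= 4);
    s->sp -= 4;
    memcpy(&s->mem[s->sp], &value, 4);
}

/* pop: load the value at sp, then move sp back up by 4. */
uint32_t toy_pop(struct toy_stack *s)
{
    uint32_t value;
    assert(s->sp + 4 <= STACK_BASE);
    memcpy(&value, &s->mem[s->sp], 4);
    s->sp += 4;
    return value;
}
```

After one toy_push on a fresh stack, sp is 0x100 - 4 = 0xFC; after a second it is 0xF8. Popping returns the values in reverse order and brings sp back to 0x100. The stored values can be anything: numbers, addresses, or instruction bytes.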
Programs are pieces of structured code that are placed on the RAM by the operating system. The operating system reads program headers and relevant information and sets up an environment for the program to run on. For each program a stack is set up, usually, some space for the heap is given, and instructions (which are the building blocks of a program) are placed in arbitrary memory locations that are either predetermined by the program itself or automatically given by the OS.
Over the years some conventions have been set to standardize CPUs. For example, on most CPUs the ret instruction pops one pointer-sized value off the stack and jumps to it. Jumping means continuing execution at a particular RAM address. This is only a convention and has nothing to do with data overflowing from registers. That is why, when a function is called, the return address (the address of the instruction right after the call) is first pushed onto the stack, so that ret can retrieve it later. Local variables are also stored on the stack, along with any arguments that do not fit in registers (in the x86-64 System V calling convention used on Linux, the first six integer or pointer arguments are passed in registers).
Does this help?
I know it is a long read but I couldn't be sure on what you know and what you don't know.
Yet Another Edit:
Let's also take a look at the code from the PDF:
void test()
{
int val;
val = getbuf();
printf("No exploit. Getbuf returned 0x%x\n", val);
}
Phase 2 involves injecting a small amount of code as part of your exploit string.
Within the file ctarget there is code for a function touch2 having the following C representation:
void touch2(unsigned val)
{
vlevel = 2; /* Part of validation protocol */
if (val == cookie) {
printf("Touch2!: You called touch2(0x%.8x)\n", val);
validate(2);
} else {
printf("Misfire: You called touch2(0x%.8x)\n", val);
fail(2);
}
exit(0);
}
Your task is to get CTARGET to execute the code for touch2 rather than returning to test. In this case,
however, you must make it appear to touch2 as if you have passed your cookie as its argument.
Let's think about what you need to do:
You need to modify the stack of test() so that two things happen. First, you do not return to test() but to touch2 instead. Second, you give touch2 an argument, which is your cookie. Since you are passing only one argument, you don't need to place the argument on the stack at all: the first argument is passed in rdi as part of the x86_64 calling convention.
The final code that you write has to change the return address to touch2()'s address and also execute mov rdi, cookie
Edit:
Earlier I talked about RAM storing data at addresses and the CPU being able to interact with them. There is a special register on your CPU that you cannot access directly from your assembly code the way you access the general-purpose registers. This register is called ip/eip/rip, which stands for instruction pointer. It holds a 16/32/64-bit pointer to an address in RAM: the address of the next instruction the CPU will execute. With that in mind, we can say that what a ret instruction does is effectively
pop rip
which means get the last 64 bits (8 bytes for a pointer) on the stack into this instruction pointer. Once rip is set to this value, the CPU begins executing this code. The CPU doesn't do any checks on rip whatsoever. You can technically do the following thing (excuse me, my assembly is in intel syntax):
mov rax, str ; move the RAM address of "str" into rax
push rax ; push rax into stack
ret ; return to the last pushed qword (8 bytes) on the stack
str: db "Hello, world!", 0 ; define a string
This code can call/execute a string. The CPU will simply try to decode the bytes of the string as instructions; since they are not meaningful code, the program will most likely crash.
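The "ret is pop rip" idea can be modelled in a few lines of C. This is only a toy simulation with invented names and a 16-slot stack, not how any real CPU is implemented, but it shows that ret copies whatever is on top of the stack into the instruction pointer without checking it.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 16

struct toy_cpu {
    uint64_t stack[SLOTS]; /* simulated stack, one 8-byte slot per entry */
    int      sp;           /* index of the top slot; SLOTS means empty   */
    uint64_t ip;           /* simulated instruction pointer              */
};

/* push: grow the stack downwards by one slot and store v there. */
void cpu_push(struct toy_cpu *c, uint64_t v)
{
    assert(c->sp > 0);
    c->stack[--c->sp] = v;
}

/* ret: pop the top slot into ip. No check is made on the value. */
void cpu_ret(struct toy_cpu *c)
{
    assert(c->sp < SLOTS);
    c->ip = c->stack[c->sp++];
}

/* Stand-in for a buffer overflow that clobbers the saved return address. */
void cpu_overwrite_top(struct toy_cpu *c, uint64_t v)
{
    assert(c->sp < SLOTS);
    c->stack[c->sp] = v;
}
```

If a legitimate return address is pushed, then overwritten with some other address before cpu_ret runs, ip ends up holding the overwritten value. That is all the attack relies on.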

Why my own memcpy written on NASM can not copy more than 340000000 bytes?

I am learning NASM. I have written a simple function that copies memory from the source to the destination. I test it in C.
section .text
global _myMemcpy
_myMemcpy:
mov eax, [esp + 4]
mov ecx, [esp + 8]
add [esp + 12], eax
lp:
mov dl, [ecx]
mov [eax], dl
inc eax
inc ecx
cmp eax, [esp + 12]
jl lp
endlp:
mov eax, [esp + 4]
ret
And the C program:
#include <string.h>
#define Times 340000000
extern void* _myMemcpy(void* dest, void* src, size_t size);
char sr[Times];
char ds[Times];
int main(void)
{
memset(sr, 'a', Times);
_myMemcpy(ds, sr, Times);
return 0;
}
I am currently using Ubuntu. When I compile and link the two files with $ nasm -f elf m.asm && gcc -Wall -m32 m.o p.c && ./a.out it works fine when the value of Times is less than 340000000. When it is greater, _myMemcpy copies only the first byte of the source to the destination. I can't figure out where the problem is. Every suggestion will be useful.
You're doing signed compares on pointers; don't do that. Use jne in this case since you will always reach exact equality at the exit point.
Or if you want relational compares with pointers, usually unsigned conditions like jb and jae make the most sense. (It's normal to think of virtual address space as a flat linear 4GiB with the lowest address being 0, so you need increments across the middle of that range to work).
With arrays larger than your ~300MiB size, and the default linker script for PIE executables, apparently one of them will span the 2GiB boundary between signed-positive and signed-negative (see footnote 1). So the end-pointer you calculate will be "negative" if you treat it as a signed integer. (Unlike on x86-64, where the non-canonical "hole" spanning the middle of virtual address-space means that an array can never span the signed-wraparound boundary: Should pointer comparisons be signed or unsigned in 64-bit x86? - sometimes it does make sense to use signed compares there.)
You should see this with a debugger if you single-step and look at the pointer values, and at the memory value you create with size += dest (add [esp + 12], eax). As a signed operation, that overflows to create a negative end_pointer, while the start pointer is still positive. pos < neg is false on the first iteration, so your loop exits after copying one byte.
Footnote 1: On my system, under GDB (which disables ASLR), after start to get the executable mapped to Linux's default base address for PIEs (2/3 of the way into the low half of the address space, i.e. 0x5555...), I checked the addresses with your test case:
sr at 0x56559040
ds at 0x6a998d40
end of ds at p /x sizeof(ds) + ds = 0x7edd8a40
So if it were much bigger, it would cross 0x80000000. That's why 340000000 avoids your bug but larger sizes reveal it.
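The same effect can be reproduced without assembly. The sketch below, in C with helper names of my own choosing, models the three loop conditions (jl, jb, jne) on 32-bit address values, using the ds address from the footnote above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a 32-bit address as two's complement, the way jl sees it. */
int32_t as_signed(uint32_t v)
{
    if (v & 0x80000000u)
        return -(int32_t)(uint32_t)~v - 1;
    return (int32_t)v;
}

/* end = dest + size, as computed by add [esp + 12], eax. */
uint32_t end_pointer(uint32_t dest, uint32_t size)
{
    assert(size <= UINT32_MAX - dest); /* must not wrap past 4 GiB */
    return dest + size;
}

/* cmp eax, end / jl lp : signed compare, wrong for pointers. */
bool continue_jl(uint32_t cur, uint32_t end)
{
    return as_signed(cur) < as_signed(end);
}

/* cmp eax, end / jb lp : unsigned compare. */
bool continue_jb(uint32_t cur, uint32_t end)
{
    return cur < end;
}

/* cmp eax, end / jne lp : plain inequality. */
bool continue_jne(uint32_t cur, uint32_t end)
{
    return cur != end;
}
```

With dest = 0x6a998d40 and size = 340000000 the end pointer is 0x7edd8a40 and all three conditions agree. With size = 400000000 the end pointer is above 0x80000000, so after the first byte continue_jl is already false while continue_jb and continue_jne still say to keep copying.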
BTW, under a 32-bit kernel, Linux defaults to a 3:1 split of address space between kernel and user-space, so even there it's possible for this to happen. But under a 64-bit kernel, 32-bit processes can have the entire 4 GiB address space to themselves. (Except for a page or two reserved by the kernel: see also Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?. That also means that forming a pointer to one-past-end of any array like you're doing (which ISO C promises is valid to do), won't wrap around and will still compare above a pointer into the object.)
This won't happen in 64-bit mode: there's enough address space to just divide it evenly between user and kernel, as well as there being a giant non-canonical hole between high and low ranges.

Don't really understand how arrays work in Assembly

I've just started learning Assembly and I got stuck now...
%include 'io.inc'
global main
section .text
main:
; read a
mov eax, str_a
call io_writestr
call io_readint
mov [nb_array], eax
call io_writeln
; read b
mov eax, str_b
call io_writestr
call io_readint
mov [nb_array + 2], eax
call io_writeln
mov eax, [nb_array]
call io_writeint
call io_writeln
mov eax, [nb_array + 2]
call io_writeint
section .data
nb_array dw 0, 0
str_a db 'a = ', 0
str_b db 'b = ', 0
So, I have a 2-element array, and when I try to print the first element, it doesn't print the right value. When I print the second element, however, it prints the right value. Could someone help me understand why this is happening?
The best answer is probably "because there are no arrays in Assembly". You have computer memory available, which is addressable by bytes. And you have several instructions to manipulate those bytes, either one byte at a time or in groups forming a "word" (two bytes), a "dword" (four bytes), or more (depending on the platform and the extended instructions you use).
To use the memory in any "structured" way in Assembly, it's up to you to write the code that does so, and it takes some practice to be accurate enough and to spot all bugs in a debugger. Just running the code and seeing correct output doesn't mean much: if you entered only a single value, your program would output the correct number, but the "a = " string would be destroyed anyway. You should rather step through every new piece of code instruction by instruction in a debugger and verify that everything works as expected.
Bugs in code like this were so common that people preferred the much worse machine code produced by a C compiler, because structs and C arrays were much easier to use: there was no need to multiply every index by the element size by hand, or to allocate the correct amount of memory for every element yourself.
What you see as the result is exactly what you did with the memory and those particular bytes. The fix depends on whether you want it to work for 16-bit or 32-bit input numbers: either change the instructions that store and read the array to work with 16 bits only, or change the array allocation and offsets to accommodate two 32-bit values (dd instead of dw, and an offset of 4 instead of 2).
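If it helps, here is the same memory traffic replayed in C on a plain byte array. The layout constants mirror the .data section from the question (nb_array at offset 0, str_a right behind it); the helper names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_ARRAY  0 /* nb_array dw 0, 0   -> 4 bytes at offsets 0..3 */
#define STR_A     4 /* str_a db 'a = ', 0 -> 5 bytes at offsets 4..8 */
#define DATA_SIZE 9

void data_init(uint8_t *data)
{
    static const uint8_t initial[DATA_SIZE] = { 0, 0, 0, 0, 'a', ' ', '=', ' ', 0 };
    for (size_t i = 0; i < DATA_SIZE; i++)
        data[i] = initial[i];
}

/* mov [addr], eax : store 4 bytes, little-endian. */
void store32(uint8_t *data, size_t addr, uint32_t v)
{
    assert(addr + 4 <= DATA_SIZE);
    for (size_t i = 0; i < 4; i++)
        data[addr + i] = (uint8_t)(v >> (8 * i));
}

/* mov eax, [addr] : load 4 bytes, little-endian. */
uint32_t load32(const uint8_t *data, size_t addr)
{
    uint32_t v = 0;
    assert(addr + 4 <= DATA_SIZE);
    for (size_t i = 0; i < 4; i++)
        v |= (uint32_t)data[addr + i] << (8 * i);
    return v;
}
```

Storing 5 at NB_ARRAY and then 7 at NB_ARRAY + 2 leaves the bytes 05 00 07 00 00 00 in memory. Loading 4 bytes from NB_ARRAY then gives 0x00070005 (the two inputs glued together), loading from NB_ARRAY + 2 gives 7, and the first two characters of str_a have been overwritten with zeros.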

Kernel Dev: Setting ES:DI in real mode

I'm working on a toy kernel for fun and education (not a class project). I'm starting work on my memory manager, so I'm trying to get the memory map from BIOS using an INT 0x15, EAX=E820 call while still in Real Mode. I'm adapting my function from the osdev wiki (here, in the section "Getting an E820 Memory Map"). However, I want this to be a function I can call from my C code, so I'm trying to change it a bit. I want it to take two arguments: a pointer to where to store the map entries, and a pointer to an integer which will be incremented by the number of entries in the table.
According to the wiki, ES:DI needs to be pointing at where the data should be stored, so I split my first argument into two (the segment selector, pointer_to_map / 16, and the offset, pointer_to_map % 16). Here's part of C code:
typedef struct SMAP_entry {
unsigned int baseL; // Base address, a QWORD
unsigned int baseH;
unsigned int lengthL; // Length, a QWORD
unsigned int lengthH;
unsigned int type; // entry type
unsigned int ACPI; // extra data from ACPI 3.0
} SMAP_entry_t;
SMAP_entry_t data[100];
kprint("Pointer: ");
kprint_int((int) data, 16);
kprint_newline();
int res = 0;
read_mem_map(((int) data) / 16, ((int) data) % 16, &res);
kprint("res: ");
kprint_int(res, 16);
kprint_newline();
Here's part of my ASM code:
; performs a INT 0x15, eax=0xE820 call to find the memory map
; inputs: the pointer to the data table / 16, the pointer % 16, a pointer to an dword (int) which will be
; incremented by the number of entries after this function returns.
; preserves: no registers except esi
read_mem_map:
mov es, [esp + 4] ; set es to the value of the first argument
mov di, [esp + 8] ; set di to the value of the second argument
That's all I'm pasting in because the program triple-faults and shuts down the VM there. By moving ret commands around, I found that the function crashes on the very first line. If I comment out the call in C, then everything works as you'd expect.
I've read through Google that there's almost never a reason to set ES:DI directly, and in the code that I've found which does, they set it to a literal. How should I set ES:DI and if I shouldn't set it directly, how should I make the C and ASM interact in the correct way?
Each of the segment registers (on 80x86) has a visible part and several hidden fields (the segment base, the segment limit and the segment's attributes - read/write, privilege level, etc).
In protected mode; when you load a segment register the CPU uses the visible part as an index into either the GDT or LDT, and loads the segment's hidden fields from that descriptor (in the GDT or LDT).
In real mode; the CPU does something completely different - it only sets the segment base to "visible part * 16" and doesn't use any (GDT, LDT) table.
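The difference between the two kinds of segment load can be sketched in C. This is a simplified model: toy_descriptor is a made-up struct (real descriptors pack base, limit and attributes into 8 bytes in a different layout), and only the GDT case is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a GDT entry. */
struct toy_descriptor {
    uint32_t base;
    uint32_t limit;
};

/* Real mode: hidden base = visible value * 16, no table involved. */
uint32_t real_mode_base(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* Protected mode: bits 3..15 of the selector are an index into the table.
   Returns false for the null selector or an index beyond the table,
   where a real CPU would fault on the load or on first use. */
bool protected_mode_base(const struct toy_descriptor *gdt, size_t entries,
                         uint16_t selector, uint32_t *base)
{
    size_t index = (size_t)(selector >> 3);
    assert(gdt != NULL && base != NULL);
    if (index == 0 || index >= entries)
        return false;
    *base = gdt[index].base;
    return true;
}
```

A value such as pointer / 16 makes sense to real_mode_base, but fed to protected_mode_base it is just a (probably out-of-range) table index, which is why loading it into es in protected mode blows up.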
Given the fact that you're using a 32-bit pointer to the data table and a 32-bit stack pointer (e.g. mov es, [esp + 4]); I assume your C code is in 32-bit protected mode. This is completely incompatible with real mode, partly because segment loads work completely differently and partly because the default operand/address size is 32-bit and not 16-bit.
All BIOS functions are designed for real mode. They can't be used in protected mode.
Basically; I'd recommend:
1. pass the pointer to the data table to your assembly as a 32-bit integer/pointer (and not 2 separate 16-bit integers)
2. call a "go to real mode" function (which will be slightly tricky, as you'd also be switching from a 32-bit stack to a 16-bit stack and will need a "32-bit return instruction" in 16-bit code).
3. split the pointer to the data table into its segment and offset in assembly, and load the segment (which should work correctly as you're in real mode now)
4. call the BIOS function (which should work correctly as you're in real mode now)
5. call a "go to protected mode" function (which will be slightly tricky again, including a "16-bit return instruction" in 32-bit code).
6. return to the (32-bit protected mode) caller
Instructions for switching from real mode to protected mode, and switching from protected mode to real mode, are included in Intel's system programmer's guide. :)

Compiler Error C2432 : illegal reference to 16-bit data in 'identifier'

So I was trying to dump the contents of the Interrupt Vector Table on 32-bit Windows 7 using the following code excerpt. It does not compile with Visual Studio, as Visual Studio has probably withdrawn support for 16-bit compilation. I built it in Pelles C; however, the executable crashes when I try to run it. The problem, as I figured from some research on the internet, has to do with the 16-bit register reference (to ES). However, I do not clearly understand the issue. I would really appreciate it if someone could help me get this to work on win32.
#include <stdio.h>
#define WORD unsigned short
#define IDT_001_ADDR 0 // start address of the first IVT vector
#define IDT_255_ADDR 1020 // start address of the last IVT vector
#define IDT_VECTOR_SZ 4 // size of the each IVT vector
int main(int argc, char **argv) {
WORD csAddr; // code segment of given interrupt
WORD ipAddr; // starting IP for given interrupt
short address; // address in memory (0-1020)
WORD vector ; // IVT entry ID (0..255)
vector = 0x0;
printf("\n--- Dumping IVT from bottom up ---\n");
printf("Vector\tAddress\t\n");
for(address=IDT_001_ADDR; address<=IDT_255_ADDR; address=address+IDT_VECTOR_SZ,vector++) {
printf("%03d\t%08d\t", vector , address);
// IVT starts at bottom of memory, so CS is always 0x0
__asm {
PUSH ES
mov AX, 0
mov ES,AX
mov BX, address
mov AX, ES:[BX]
mov ipAddr ,AX
inc BX
inc BX
mov AX, ES:[BX]
mov csAddr, AX
pop ES
};
printf("[CS:IP] = [%04X,%04X]\n" ,csAddr, ipAddr);
}
}
Thanks in advance
The issue with es (or any segment register) is that in real mode (which your DOS box is "faking" with vm86), the value in the segment register is multiplied by 16 and added to the offset to get a linear address, which is also the physical address. In protected mode (your win32) the segment registers hold "selectors": an index into an array of structures (descriptors) containing, among other things, a "base" which is added to an offset to get a linear address. The value zero is explicitly the null selector; it does not refer to any descriptor, so accessing memory through it faults and the program crashes. The good news is that the "base" of most segment registers (fs being an exception) is zero, so you can address the memory you want without touching es.
The bad news is virtual memory. Paging is enabled, so the linear address calculated by base + offset may not be a physical address. If you're lucky, your OS may have kindly "identity mapped" low memory, so that linear memory equals physical memory. If you're really lucky, your OS may let you at it from user code.
Try removing all references to es and see what happens. The results, if any, would be more recognizable in hex (%x), not decimal. Your best bet might be to do the whole thing in 16-bit and forget win32.
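For the decoding part itself, here is a small C sketch that avoids inline assembly altogether. It assumes you already have a 1024-byte snapshot of the IVT in an ordinary buffer (how you obtain it, e.g. from a 16-bit program or an emulator dump, is outside this sketch); ivt_entry and ivt_decode are names I made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IVT_ENTRIES    256
#define IVT_ENTRY_SIZE 4

struct ivt_entry {
    uint16_t ip; /* offset of the handler  */
    uint16_t cs; /* segment of the handler */
};

/* Decode one vector from a snapshot of the table. Each entry is
   4 bytes: offset first, then segment, both little-endian. */
struct ivt_entry ivt_decode(const uint8_t *snapshot, unsigned vector)
{
    struct ivt_entry e;
    const uint8_t *p;
    assert(snapshot != NULL && vector < IVT_ENTRIES);
    p = snapshot + (size_t)vector * IVT_ENTRY_SIZE;
    e.ip = (uint16_t)(p[0] | (p[1] << 8));
    e.cs = (uint16_t)(p[2] | (p[3] << 8));
    return e;
}
```

Looping vector from 0 to 255 and printing e.cs and e.ip with %04X gives the same table the original program was after, with the byte order handled explicitly instead of relying on 16-bit loads through es.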

Resources