I tried to get the IDT address in my driver, so I wrote a function in asm that returns what IDTR contains:
.data
myData dq 0
.code
Function PROC
sidt myData
mov rax, myData
ret
Function ENDP
END
But the address I get is weird. For example, in WinDbg:
r idtr
idtr=fffff80000b95080
However, my driver shows:
idtr = f80000b950800fff
I read that on x64 the IDTR contains the 64-bit base address of the IDT. I would appreciate it if anyone could explain why my output is not the same as WinDbg's.
This is what the Intel docs say about the SIDT instruction:
In 64-bit mode, the operand size is fixed at 8+2 bytes. The instruction stores 8-byte base and 2-byte limit values.
and:
DEST[0:15] <- IDTR(Limit);
DEST[16:79] <- IDTR(Base);
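This means your myData variable needs to be 10 bytes long: the instruction stores the limit in the first 2 bytes and the base address in the next 8 bytes. With an 8-byte dq, SIDT also writes 2 bytes past the end of myData, and mov rax, myData loads the limit plus only the low 6 bytes of the base. That also explains why your value matches WinDbg's value after its first ffff: your number is the base shifted left by 16 bits, with the limit (0fff) in the low 16 bits. To fix it, reserve 10 bytes (for example a dw for the limit followed by a dq for the base) and return the qword at offset 2.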
I am learning nasm. I have written a simple function that copies memory from the source to the destination, and I test it in C.
section .text
global _myMemcpy
_myMemcpy:
mov eax, [esp + 4]
mov ecx, [esp + 8]
add [esp + 12], eax
lp:
mov dl, [ecx]
mov [eax], dl
inc eax
inc ecx
cmp eax, [esp + 12]
jl lp
endlp:
mov eax, [esp + 4]
ret
And the C program:
#include <string.h>
#define Times 340000000
extern void* _myMemcpy(void* dest, void* src, size_t size);
char sr[Times];
char ds[Times];
int main(void)
{
memset(sr, 'a', Times);
_myMemcpy(ds, sr, Times);
return 0;
}
I am currently using Ubuntu. When I compile and link the two files with $ nasm -f elf m.asm && gcc -Wall -m32 m.o p.c && ./a.out, it works fine when the value of Times is less than 340000000. When it is greater, _myMemcpy copies only the first byte of the source to the destination. I can't figure out where the problem is. Any suggestion would be useful.
You're doing signed compares on pointers; don't do that. Use jne in this case since you will always reach exact equality at the exit point.
Or if you want relational compares with pointers, usually unsigned conditions like jb and jae make the most sense. (It's normal to think of virtual address space as a flat linear 4GiB with the lowest address being 0, so you need increments across the middle of that range to work).
With arrays larger than your ~300MiB size, and the default linker script for PIE executables, apparently one of them will span the 2GiB boundary between signed-positive and signed-negative (see footnote 1). So the end pointer you calculate will be "negative" if you treat it as a signed integer. (Unlike on x86-64, where the non-canonical "hole" spanning the middle of virtual address space means that an array can never span the signed-wraparound boundary: see Should pointer comparisons be signed or unsigned in 64-bit x86? Sometimes it does make sense to use signed compares there.)
You should see this with a debugger if you single-step and look at the pointer values, and at the memory value you create with size += dest (add [esp + 12], eax). As a signed operation, that overflows to create a negative end pointer, while the start pointer is still positive. pos < neg is false on the first iteration, so your loop exits after copying just one byte.
Footnote 1: On my system, under GDB (which disables ASLR), after running start to get the executable mapped at Linux's default base address for PIEs (2/3 of the way into the low half of the address space, i.e. 0x5555...), I checked the addresses with your test case:
sr at 0x56559040
ds at 0x6a998d40
end of ds (p /x sizeof(ds) + ds) = 0x7edd8a40
So if the arrays were only a bit bigger, the end of ds would cross 0x80000000. That's why 340000000 avoids your bug but larger sizes reveal it.
BTW, under a 32-bit kernel, Linux defaults to a 3:1 split of the address space between user-space and the kernel (3 GiB for user-space), so it's possible for this to happen even there. But under a 64-bit kernel, 32-bit processes can have the entire 4 GiB address space to themselves (except for a page or two reserved by the kernel: see also Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?). That also means that forming a pointer to one-past-the-end of any array, as you're doing (which ISO C promises is valid), won't wrap around and will still compare above a pointer into the object.
This won't happen in 64-bit mode: there's enough address space to just divide it evenly between user and kernel, as well as there being a giant non-canonical hole between high and low ranges.
I have the following src:
#include <stdio.h>

int main(void) {
    int i = 1337; // breakpoint after this value is assigned
    return 0;
}
In the asm from gdb I get:
!0x00000000004004f1 main+4 movl $0x539,-0x4(%rbp)
And I verified that $0x539 = 1337. How can I see the memory address where the value 1337 is stored? The rbp register shows:
rbp 0x00007fffffffeb20
My thought was that the rbp register would show the value 0x539, so where would I be able to find that in gdb (what command to use, etc.)?
One interesting thing I found was this:
>>> print i
$16 = 1337
>>> print &i
$17 = (int *) 0x7fffffffeb1c # how is this arrived at?
0x00007fffffffeb20 - 0x4 == 0x7fffffffeb1c
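On x86, local variables are almost always addressed as an offset from a register. In this case the register is rbp (the frame base pointer) and the offset is -4 bytes, i.e. i occupies the 4 bytes on the stack just below the address held in rbp. The constant 1337 itself is encoded as an immediate inside the movl instruction, which copies it into that stack slot; rbp holds the address of the frame, not the value. You can inspect the slot with x/dw $rbp-4 (equivalently, x/dw &i).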
x64 addressing modes typically involve one of 3 possibilities:
a zero byte offset from a register address
a signed 8bit offset from a register address
a signed 32bit offset from a register address
(There is a 4th addressing mode, which is to load the value from a register, just for completeness!) In general, a compiler would prefer to emit those modes in the order listed above, because they result in the opcode plus an offset of 0 bytes, 1 byte, or 4 bytes respectively: the smaller the offset, the smaller the generated machine code.
So I was trying to dump the contents of the Interrupt Vector Table on 32-bit Windows 7 using the following code excerpt. It does not compile with Visual Studio, probably because Visual Studio has dropped support for 16-bit compilation. I built it with Pelles C, but the executable crashes when I run it. From some research on the internet, the problem seems to have to do with the 16-bit register reference (to ES), but I don't clearly understand the issue. I would really appreciate it if someone could help me get this working on Win32.
#include <stdio.h>
#define WORD unsigned short
#define IDT_001_ADDR 0 // start address of the first IVT vector
#define IDT_255_ADDR 1020 // start address of the last IVT vector
#define IDT_VECTOR_SZ 4 // size of the each IVT vector
int main(int argc, char **argv) {
WORD csAddr; // code segment of given interrupt
WORD ipAddr; // starting IP for given interrupt
short address; // address in memory (0-1020)
WORD vector ; // IVT entry ID (0..255)
vector = 0x0;
printf("\n--- Dumping IVT from bottom up ---\n");
printf("Vector\tAddress\t\n");
for(address=IDT_001_ADDR; address<=IDT_255_ADDR; address=address+IDT_VECTOR_SZ,vector++) {
printf("%03d\t%08d\t", vector , address);
// IVT starts at bottom of memory, so CS is always 0x0
__asm {
PUSH ES
mov AX, 0
mov ES,AX
mov BX, address
mov AX, ES:[BX]
mov ipAddr ,AX
inc BX
inc BX
mov AX, ES:[BX]
mov csAddr, AX
pop ES
};
printf("[CS:IP] = [%04X,%04X]\n", csAddr, ipAddr);
}
}
Thanks in advance
The issue with es (or any segment register) is that in real mode (which your DOS is "faking" with vm86), the value in the segment register is multiplied by 16 and added to the offset to get a linear address, which is the physical address. In protected mode (your Win32 program), the segment registers hold "selectors": indexes into an array of structures (descriptors) containing, among other things, a "base" that is added to an offset to get a linear address. Zero is the null selector: loading it into es is allowed, but any memory access through it faults, so your program crashes. The good news is that the "base" of most segment registers (fs is an exception) is zero, so you can address the memory you want without touching es.
The bad news is virtual memory. Paging is enabled, so the linear address calculated by base + offset may not be a physical address. If you're lucky, your OS may have kindly "identity mapped" low memory, so that linear memory equals physical memory. If you're really lucky, your OS may let you at it from user code.
Try removing all references to es and see what happens. The results, if any, would be more recognizable in hex (%x) than in decimal. Your best bet might be to do the whole thing in 16-bit and forget Win32.
I want to ask how I can simulate C pointers in 16-bit assembly.
int var = 10;
int * ptr = &var;
In 32-bit assembly it's something like:
mov dword ptr [ebp-x], 10
lea eax, dword ptr [ebp-x]
mov dword ptr [ebp-x+4], eax
Is there any way to get the physical address of a variable at [bp-x] in 16-bit assembly?
For example:
I have a program that reads a sector from a floppy, then jumps to segment:0 and executes it.
The program being loaded is a simple text editor. In the editor I need to get the physical address of a single variable, convert it to segment:offset, and use it for loading a text file. I have tried setting DS:SI before jumping to the editor, but it's not a very good solution. Does anybody know how to solve this? Please help.
In the real addressing mode the physical address of a byte of memory is equal to the segment * 16 + offset.
When you refer to memory via [(e)bp+...] or [esp+...], the default segment involved is ss. Otherwise it's ds. An optional segment override prefix will change the default segment register.
So, for example, if your variable is addressed as [bp-8], then its physical address is ss*16+bp-8.
So this is what you need:
mov word ptr [bp-x], 10
lea ax, word ptr [bp-x]
mov word ptr [bp-x+4], ax
You can use an old compiler, for example the beautiful TCC (Turbo C Compiler, 16-bit), and it will output what you need.
Furthermore, even when you see a 16-bit pointer, it's just virtual, and its real address will be translated according to the architecture (much as a 32-bit program runs in compatibility mode under a 64-bit OS).
However, if you are really interested in doing this kind of stuff, just open cmd, type debug, then a, and you can write a little bit of assembly there.
I'm working on binary obfuscation, so I have a buffer filled with opcodes. I'm using Linux, so all function calls use the same caller/callee conventions; no problem there.
My question is about the E8 opcode, which performs a near call using a relative address.
I know the address the call comes from and I know the address I want to call, so how can I find the offset that I must put in the E8 call? That is:
signed long src = (signed long)buffer + shift; //get the position where E8 instruction is
signed long dst = (signed long)srand; //get the destination position where i want to call (yes, srand(long) function in this case.)
So in my buffer I have:
buffer[] = "[....]\xE8\xFF\xFA\xFE\x54[.....]"; //example
I need to replace those bytes with a valid call to srand. How can I get the relative address from what I have?
I also thought I could use the FF instruction to call directly, but I couldn't figure out how to do it. I can't copy the address into (say) eax, because I can't use more than 5 bytes in the replacement (otherwise all the jumps above would go bananas), and I can't tell whether there is a way to make a direct call in 5 bytes.
So, does anybody know how to get the right value for the E8 relative offset, or whether there is some sort of direct call that keeps the same behavior as an E8 call while using only 5 bytes?
Before you ask: I tried FF XX XX XX XX, with XX being the real address, and it didn't work. The CPU doesn't decode it as a call; it interprets it as an INC (???) followed by some random instruction. I also tried replacing it this way:
inline void endian_swap(long& x) {
x = (x>>24) |
((x<<8) & 0x00FF0000) |
((x>>8) & 0x0000FF00) |
(x<<24);
}
endian_swap(dst);
endian_swap(src);
unsigned int p = dst - src;
endian_swap(p);
Then I put the resulting value into the E8 call. It still didn't work.
The relative address in near call and jxx/near jmp instructions equals the target address where you want to transfer control minus the address of the instruction immediately following your call or jump. Relative addresses are relative to the address of the next instruction, not the one that's transferring control. IOW, you have to take the length of your call or jump instruction into account if its address operand is relative.
Generally there's no alternative to the relative call or jump that's 5 bytes or shorter.
You can simulate jmp as push target address + ret, but in 32-bit mode with arbitrary target addresses you get at least 1+4+1=6 bytes for those 2 instructions. You can simulate call in the same way, but you will have to add another push or call instruction to place the return address on the stack, so to those 6 bytes you add 5 more.
There's an "absolute" version of "jmp" (and IIRC "call" as well) that takes the address operand as an immediate consisting of the target offset and target segment. Such an instruction will be at least 1+4+2=7 bytes long (4 bytes for offset, 2 bytes for segment selector).
If you use a variant of call or jmp that takes the target address from a specified memory location (e.g. call [ebx]), that instruction is going to be at least 1+1=2 bytes long (opcode + ModR/M byte), but you'll have to load a register with the address of the memory location containing the target address, and that'll cost you another 1+4=5 bytes, giving you at least 7 bytes. (Encoding the memory address as a 4-byte displacement instead, as in call [addr], avoids the register load but still takes 6 bytes, and the target address still has to be stored in memory somewhere.) There's also a variant that takes the target address in a register (e.g. jmp ebx), but again, because you have to load the register, you come out at at least 7 bytes.
The only way you can make your call/jump instruction shorter is when the target address is very close to the address of that instruction (in which case you can use either a rel16 form, with the operand-size override prefix and not available in 64-bit mode, or a rel8 form where one exists: jmp and jcc have one, call does not) OR when the target address is small (in which case push target address can be either the shorter push Ib or the shorter operand-size prefix + push Iw).
I solved it by doing:
long dst = (long)srand;
long src = ((long)buffer) + shift + 5; //begin of buffer + actual position + this instruction size
long p = dst - src;
p = htonl(p);
Then I replaced the call in the buffer and everything works well.