pointer to baseAddress through CONTEXT.Ebx+8 - c

I am trying to implement the known method of "Dynamic Forking of Win32 EXE", which is knows as RunPE.
My problem is that i am can't get the right result of the "base address" as it mentioned in the 3rd point at http://www.security.org.sg/code/loadexe.html
This is my code:
DWORD* peb;
DWORD* baseAddress;
...snip...
GetThreadContext(hTarget, &contx)
peb = (DWORD *) contx.Ebx;
baseAddress = (DWORD *) contx.Ebx+8;
_tprintf(_T("The EBX [PEB] is: 0x%08X\nThe base address is: 0x%08X\nThe Entry Point is: 0x%08X\n"), peb, baseAddress, contx.Eax);
and the output is as follwos:
The EBX [PEB] is: 0x7FFD4000
The base address is: 0x7FFD4020
The Entry Point is: 0x00401000
I think that my problem is with the implementation of my baseAddress pointer, but i can't figure out exactly what is the issue. Or could be that i havn't understand the above article correctly and baseAddress isn't ImageBase, if so what is baseAddress ?
I have tried to run it under Win 7 64b and Win-XP and on both i am get the same incorrect results.

Note that the instructions say "at [EBX+8]". The brackets mean the value at that address location. There are several problems with
baseAddress = (DWORD *) contx.Ebx+8;
First, the compiler doesn't pay attention spacing, only to parenthesizing, so this means
baseAddress = ((DWORD *)contx.Ebx) + 8;
which is wrong because the 8 is counting DWORDs, rather than bytes. You want
baseAddress = (DWORD *)(contx.Ebx + 8);
but this just gets you the address where the baseAddress is stored, not the value of the baseAddress. For that you need
baseAddress = *(DWORD *)(contx.Ebx + 8);
However, this only works if contx.Ebx refers to an address in your process, but every process has its own address space, and you need to access the address space of the suspended process; for that you need to use ReadProcessMemory ( http://msdn.microsoft.com/en-us/library/windows/desktop/ms680553%28v=vs.85%29.aspx ):
ok = ReadProcessMemory(hTarget, (LPCVOID)(contx.Ebx + 8), (LPVOID)&baseAddress, sizeof baseAddress, NULL);

You're just doing pointer arithmetic, you're not actually dereferencing the memory, in this line:
baseAddress = (DWORD *) contx.Ebx+8;
You're just adding 8*sizeof(DWORD) = 32 to the value of contx.Ebx. What you really want to do is read the data at the address of contx.Ebx+8 in the new process's address space. In order to do that, you need to use ReadProcessMemory, and don't bother to cast -- you want to use raw offsets, not offsets multiplied by sizeof(DWORD), which is what happens when you do pointer arithmetic with DWORD* values.
However, I'd strongly caution you against digging into implementation details like this, which can and do change between different versions of Windows. Keep in mind that the article you linked to was written in 2004, and it was just a proof-of-concept, so there are likely to be lots of hidden gotchas and unexpected problems in Vista, Windows 7, Windows 8, and future versions.
The Windows API does not have a function that behaves like Unix's fork(2) function, and as a result, you should try as much as possible to avoid needing to fork -- use CreateProcess instead of fork+exec etc. The Cygwin implementation of fork is ugly and slow, and it can fail unexpectedly due to DLL memory mapping issues.

In addition to what others have said, there is a much simplier way to retreive the base address of the process's PEB structure. Use NtQueryInformationProcess() instead, setting its ProcessInformationClass parameter set to ProcessBasicInformation. The output will be a PROCESS_BASIC_INFORMATION structure:
typedef struct _PROCESS_BASIC_INFORMATION {
PVOID Reserved1;
PPEB PebBaseAddress;
PVOID Reserved2[2];
ULONG_PTR UniqueProcessId;
PVOID Reserved3;
} PROCESS_BASIC_INFORMATION;
The second member is what you are looking for.

Related

Usage of Xilinx built-in UART function #define XUartPs_IsReceiveData (BaseAddress )

So I am trying to use this built-in UART function (from the Vitis SDK from Xilinix) to determine if there is a valid byte to read over UART. I created this function to return 1 if there was a byte to read or 0 if there wasn't
u32 UartHasMessage(void){
if(XUartPs_IsReceiveData(&XUartPs_Main)){
return 1;
}
else{
return 0;
}
}
However, even when there is a byte to read over UART, this function always returns false.
The weird behavior I am experiencing is when I step through the code using the debugger, I call UartHasMessage() to check if there is a byte to read, and it returns false, but in the next line I call a function to read a byte over UART and that contains the correct byte I sent over the host.
u32 test - UartHasMessage();
UartGetByte(&HostReply);
How come this UartHasMessage always returns false, but then in the next line I am able to read the byte correctly?
Caveat: Without some more information, this is a bit speculative and might be a comment, but it is too large for that.
The information below comes from the Xilinx documentation on various pages ...
XUartPs_RecvByte will block until a byte is ready. So, no need to call XUartPs_IsReceiveData directly (I think that XUartPS_RecvByte calls it internally).
A web search on XUartPs_Main came up with nothing, so we'd need to see the definition you have.
Most Xilinx documentation uses UART_BASEADDRESS:
#define UART_BASEADDR XPAR_XUARTPS_0_BASEADDR
I found a definition:
#define XPAR_XUARTPS_0_BASEADDR 0xE0001000
You might be better off using a more standard method, such as calling the XUartPs_LookupConfig function to get the configuration table entry which has all relevant values.
I'm guessing that you created the XUartPS_Main definition.
But, based on what you posted, (needing &XUartPS_Main instead of XUartPS_Main), it is linked/loaded at the exact address of the UART register bank. Let's assume that address is (e.g.) 0x10000. So, we might have:
u32 XUartPS_Main __attribute__(at(0x10000));
The at is an extension that some build systems support (e.g. arm) that forces the variable to be loaded at a given address. So, let's assume we have that (even if the mechanism is slightly different (e.g.):
__attribute__((section(".ARM.__at_0x10000")))
The definition of XUARTPS_SR_OFFSET is:
#define XUARTPS_SR_OFFSET 0x002CU
Offsets are [typically] byte offsets.
Given:
#define XUartPs_IsReceiveData(BaseAddress) \
!((Xil_In32((BaseAddress) + XUARTPS_SR_OFFSET) & \
(u32)XUARTPS_SR_RXEMPTY) == (u32)XUARTPS_SR_RXEMPTY)
Now if the definition of XUartPS_Main uses u32 [as above], we may have a problem because XUARTPS_SR_OFFSET will be treated as a u32 index and not a byte offset. So, it will access the wrong address.
So, try:
XUartPs_IsReceiveData((unsigned char *) &XUartPs_Main)
But, if it were me, I'd rework things to use Xilinx's standard definitions.
UPDATE:
Hi so XUartPs_main is defined as static XUartPs XUartPs_Main; I use it in a variety of functions such as a function to send bytes over uart and I call it by its address like I did with this function, all my other functions work as expected except this one. Is it possible it is something to do with the way the fifo works? –
29belgrade29
No, not all the API functions are the same.
The struct definition is [I synthesized this from the API doc]:
typedef struct {
u16 DeviceId; // Unique ID of device.
u32 BaseAddress; // Base address of device (IPIF)
u32 InputClockHz;
} XUartPs;
Somewhere in your code you had to initialize this with:
XUartPs_Main = XUartPs_ConfigTable[my_device_id];
Or, with:
XUartPs_Main = *XUartPs_LookupConfig(my_device_id);
If an API function is defined as (e.g.):
void api_dosomething(XUartPs_Config *cfg,...)
Then, you call it with:
api_dosomething(&XUartPs_Main,...);
So, most functions probably take such a pointer.
But, XUartPs_IsReceiveData does not want a pointer to a XUartPs_Config struct. It wants a base address. This is:
XUartPs_Main.BaseAddress
So, you want:
XUartPs_IsReceiveData(XUartPs_Main.BaseAddress)

Read Kernel Memory from user mode WITHOUT driver

I'm writing an program which enumerates hooks created by SetWindowsHookEx() Here is the process:
Use GetProcAddress() to obtain gSharedInfo exported in User32.dll(works, verified)
Read User-Mode memory at gSharedInfo + 8, the result should be a pointer of first handle entry. (works, verified)
Read User-Mode memory at [gSharedInfo] + 8, the result should be countof handles to enumerate. (works, verified)
Read data from address obtained in step 2, repeat count times
Check if HANDLEENTRY.bType is 5(which means it's a HHOOK). If so, print informations.
The problem is, although step 1-3 only mess around with user mode memory, step 4 requires the program to read kernel memory. After some research I found that ZwSystemDebugControl can be used to access Kernel Memory from user mode. So I wrote the following function:
BOOL GetKernelMemory(PVOID pKernelAddr, PBYTE pBuffer, ULONG uLength)
{
MEMORY_CHUNKS mc;
ULONG uReaded = 0;
mc.Address = (UINT)pKernelAddr; //Kernel Memory Address - input
mc.pData = (UINT)pBuffer;//User Mode Memory Address - output
mc.Length = (UINT)uLength; //length
ULONG st = -1;
ZWSYSTEMDEBUGCONTROL ZwSystemDebugControl = (ZWSYSTEMDEBUGCONTROL)GetProcAddress(
GetModuleHandleA("ntdll.dll"), "NtSystemDebugControl");
st = ZwSystemDebugControl(SysDbgCopyMemoryChunks_0, &mc, sizeof(MEMORY_CHUNKS), 0, 0, &uReaded);
return st == 0;
}
But the function above didn't work. uReaded is always 0 and st is always 0xC0000002. How do I resolve this error?
my full program:
http://pastebin.com/xzYfGdC5
MSFT did not implement NtSystemDebugControl syscall after windows XP.
The Meltdown vulnerability makes it possible to read Kernel memory from User Mode on most Intel CPUs with a speed of approximately 500kB/s. This works on most unpatched OS'es.

Hot Patching A Function

I'm trying to hot patch an exe in memory, the source is available but I'm doing this for learning purposes. (so please no comments suggesting i modify the original source or use detours or any other libs)
Below are the functions I am having problems with.
vm_t* VM_Create( const char *module, intptr_t (*systemCalls)(intptr_t *), vmInterpret_t interpret )
{
MessageBox(NULL, L"Oh snap! We hooked VM_Create!", L"Success!", MB_OK);
return NULL;
}
void Hook_VM_Create(void)
{
DWORD dwBackup;
VirtualProtect((void*)0x00477C3E, 7, PAGE_EXECUTE_READWRITE, &dwBackup);
//Patch the original VM_Create to jump to our detoured one.
BYTE *jmp = (BYTE*)malloc(5);
uint32_t offset = 0x00477C3E - (uint32_t)&VM_Create; //find the offset of the original function from our own
memset((void*)jmp, 0xE9, 1);
memcpy((void*)(jmp+1), &offset, sizeof(offset));
memcpy((void*)0x00477C3E, jmp, 5);
free(jmp);
}
I have a function VM_Create that I want to be called instead of the original function. I have not yet written a trampoline so it crashes (as expected). However the message box does not popup that I have detoured the original VM create to my own. I believe it is the way I'm overwriting the original instructions.
I can see a few issues.
I assume that 0x00477C3E is the address of the original VM_Create function. You really should not hard code this. Use &VM_Create instead. Of course this will mean that you need to use a different name for your replacement function.
The offset is calculated incorrectly. You have the sign wrong. What's more the offset is applied to the instruction pointer at the end of the instruction and not the beginning. So you need to shift it by 5 (the size of the instruction). The offset should be a signed integer also.
Ideally, if you take into account my first point the code would look like this:
int32_t offset = (int32_t)&New_VM_Create - ((int32_t)&VM_Create+5);
Thanks to Hans Passant for fixing my own silly sign error in the original version!
If you are working on a 64 bit machine you need to do your arithmetic in 64 bits and, once you have calculated the offset, truncate it to a 32 bit offset.
Another nuance is that you should reset the memory to being read-only after having written the new JMP instruction, and call FlushInstructionCache.

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%d\n", p - buffer); // computes offset
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be negative positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise if the two addresses cross a 256-byte boundary you will have exploited your exploit....
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = dummy + 24; /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.

Utilizing the LDT (Local Descriptor Table)

I am trying to do some experiments using different segments besides the default code and data user and kernel segments. I hope to achieve this through use of the local descriptor table and the modify_ldt system call. Through the system call I have created a new entry in LDT which is a segment descriptor with a base address of a global variable I want to "isolate" and a limit of 4 bytes.
I try to load the data segment register with the segment selector of my custom LDT entry through inline assembly in a C program, but when I try to access the variable I receive a segmentation fault.
My suspicion is that there is an issue with the offset of my global variable, and when the address is calculated, it exceeds the limit of my custom segment therefore causing a seg fault.
Does anyone know of a work around to this situation?
Oh, by the way, this is on an x86 architecture in Linux. This is my first time asking a question like this on a forum, so if there is any other information that could prove to be useful, please let me know.
Thank you in advance.
Edit: I realized that I probably should include the source code :)
struct user_desc* table_entry_ptr = NULL;
/* Allocates memory for a user_desc struct */
table_entry_ptr = (struct user_desc*)malloc(sizeof(struct user_desc));
/* Fills the user_desc struct which represents the segment for mx */
table_entry_ptr->entry_number = 0;
table_entry_ptr->base_addr = ((unsigned long)&mx);
table_entry_ptr->limit = 0x4;
table_entry_ptr->seg_32bit = 0x1;
table_entry_ptr->contents = 0x0;
table_entry_ptr->read_exec_only = 0x0;
table_entry_ptr->limit_in_pages = 0x0;
table_entry_ptr->seg_not_present = 0x0;
table_entry_ptr->useable = 0x1;
/* Writes a user_desc struct to the ldt */
num_bytes = syscall( __NR_modify_ldt,
LDT_WRITE, // 1
table_entry_ptr,
sizeof(struct user_desc)
);
asm("pushl %eax");
asm("movl $0x7, %eax"); /* 0111: 0-Index 1-Using the LDT table 11-RPL of 3 */
asm("movl %eax, %ds");
asm("popl %eax");
mx = 0x407CAFE;
The seg fault occurs at that last instruction.
I can only guess, since I don't have the assembly available to me.
I'm guessing that the line at which you get a segfault is compiled to something like:
mov ds:[offset mx], 0x407cafe
Where offset mx is the offset to mx in the program's data segment (if it's a static variable) or in the stack (if it's an automatic variable). Either way, this offset is calculated at compile time, and that's what will be used regardless of what DS points to.
Now what you've done here is create a new segment whose base is at the address of mx and whose limit is either 0x4 or 0x4fff (depending on the G-bit which you didn't specify).
If the G-bit is 0, then the limit is 0x4, and since it's highly unlikely that mx is located between addresses 0x0 and 0x4 of the original DS, when you access the offset to mx inside the new segment you're crossing the limit.
If the G-bit is 1, then the limit is 0x4fff. Now you'll get a segfault only if the original mx was located above 0x4fff.
Considering that the new segment's base is at mx, you can access mx by doing:
mov ds:[0], 0x407cafe
I don't know how I'd go about writing that in C, though.

Resources