How to use NtCurrentTeb() without Windows header files? - c

On Windows, the FS (32-bit) or GS (64-bit) segment register points to the current thread's TEB. In a 64-bit program using NtCurrentPeb(), the x64 instruction is mov rax, gs:60h. The 0x60 value is offsetof(TEB, ProcessEnvironmentBlock).
To use this in a program I have to include both Windows.h and Winternl.h, which bring in a bunch of other #defines. As the title says, I want to use the function without these header files by accessing the segment register directly. I've also made a separate header file with the TEB and PEB structures. So how can I do that? I was thinking of using the __asm keyword and a typedef for NtCurrentTeb(), or something similar.

I really do not understand why you answered your own question incompletely. This confuses further readers because you did not provide the appropriate answer to the question itself.
You do not need to use ASM for this, you can use intrinsic functions like so:
#ifdef _M_X64
auto pPEB = (PPEB)__readgsqword(0x60);
#elif _M_IX86
auto pPEB = (PPEB)__readfsdword(0x30);
#else
#error "PPEB Architecture Unsupported"
#endif
But to answer the actual question, here is how to do it via ASM:
x64 ASM (TEB/PEB):
GetTEBAsm64 proc
mov rax, qword ptr gs:[00000030h]
ret
GetTEBAsm64 endp
GetPEBAsm64 proc
mov rax, qword ptr gs:[00000060h]
ret
GetPEBAsm64 endp
x86 - PEB:
__asm
{
mov eax, dword ptr fs : [00000030h]
mov peb, eax
}
x86 - TEB:
__asm
{
mov eax, dword ptr fs : [00000018h]
mov teb, eax
}
I strongly hope that my answer is clear and that someone else in the future can benefit from it.

Declare function prototype and link against ntdll.dll.

To read from the gs or fs segment register, I have used this assembly in Visual Studio. Create an empty C/C++ project in Visual Studio and enable assembling of .asm files for it. The fs or gs segment register points to the NT_TIB structure (the first member of the TEB) on 32-bit and 64-bit Windows respectively. On 64-bit Windows, the Self field at offset 0x30 of NT_TIB holds the linear address of the TEB. So the assembly in 64 bit will be: mov rax, gs:[30h].
Here is a sample source code to get current directory of an executable file:
ProcParam.asm:
.code
ProcParam PROC
mov rax, gs:[30h] ; TEB from gs in 64 bit only
mov rax, [rax+60h] ; PEB
mov rax, [rax+20h] ; RTL_USER_PROCESS_PARAMETERS
ret
ProcParam ENDP
end
main.c:
#include <stdio.h>
typedef struct _UNICODE_STRING {
unsigned short Length;
unsigned short MaximumLength;
wchar_t* Buffer;
} UNICODE_STRING, *PUNICODE_STRING;
typedef struct _CURDIR {
UNICODE_STRING DosPath;
void* Handle;
} CURDIR, *PCURDIR;
/*Extracted from ntdll.pdb file*/
typedef struct _RTL_USER_PROCESS_PARAMETERS {
unsigned int MaximumLength;
unsigned int Length;
unsigned int Flags;
unsigned int DebugFlags;
void* ConsoleHandle;
unsigned int ConsoleFlags;
void* StandardInput;
void* StandardOutput;
void* StandardError;
CURDIR CurrentDirectory;
/*Many more*/
} RTL_USER_PROCESS_PARAMETERS, *PRTL_USER_PROCESS_PARAMETERS;
PRTL_USER_PROCESS_PARAMETERS ProcParam(void);
int main(void)
{
wprintf(L"%s\n", ProcParam()->CurrentDirectory.DosPath.Buffer);
}

Related

Stack cleanup not working (__stdcall MASM function)

Something weird is going on here. Visual Studio reports that the ESP value was not properly saved, but I cannot see any mistakes in the code (32-bit, Windows, __stdcall).
MASM code:
.MODEL FLAT, STDCALL
...
memcpy PROC dest : DWORD, source : DWORD, size : DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
memcpy ENDP
I am passing 12 bytes (0xC) on the stack and then cleaning them up. I have confirmed by looking at the symbols that the function's symbol is "memcpy@12", so it is indeed finding the proper symbol.
this is the C prototype:
extern void __stdcall * _memcpy(void*,void*,unsigned __int32);
Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working
EDIT:
MASM code:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
C code:
extern void __stdcall __MyMemcpy(void*, void*, int);
typedef struct {
void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;
int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy;
return 0;
}
When I call it like this, I get the error:
MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);
When I call it like this, I don't:
__MyMemcpy(dest,src,size);
Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:
.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
.CODE
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP
END
A note about this code: beware that if your source and destination buffers overlap, this can cause problems. If the buffers don't overlap, then what you are doing should work. You can document this requirement by marking the pointers __restrict. __restrict is an MSVC C/C++ extension that acts as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation, since your assembly code is unsafe for overlapping buffers. Your prototypes could have been written as:
extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.
From the perspective of the STDCALL calling convention, your code is correct as it pertains to stack usage. There is a serious issue, however: the Microsoft Windows STDCALL calling convention requires the callee to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI, and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require adding 8 to the offsets relative to ESP. Your code should have looked like this:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
You asked why you appear to be getting an error about ESP or a stack issue. I assume you are getting a Visual Studio run-time check failure reporting that the value of ESP was not properly saved across a function call.
This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.
I wrote a small C++ project with this code that has similar behaviour to your C program:
#include <iostream>
#include <cstring> // strlen
extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);
typedef struct {
void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;
int initmemfunc(MemFunc* f) {
f->MemCpy = __MyMemcpy;
return 0;
}
char buf1[] = "Testing";
char buf2[200];
int main()
{
MemFunc mf = { 0 };
initmemfunc(&mf);
mf.MemCpy(buf2, buf1, strlen(buf1));
std::cout << "Hello World!\n" << buf2;
}
When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:
I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.
Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.
You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that does this very operation (REP MOVSB): it copies ECX bytes from the address in ESI to the address in EDI. A version of your code could have been written as:
__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
PUSH EDI ; Save registers first
PUSH ESI
MOV EDI, DWORD PTR [ESP + 0CH] ; Arguments are offset by an additional 8 bytes
MOV ESI, DWORD PTR [ESP + 10H]
MOV ECX, DWORD PTR [ESP + 14H]
REP MOVSB
POP ESI ; Restore the caller (non-volatile) registers
POP EDI
RETN 0CH
__MyMemcpy ENDP
If you were to use the power of PROC to save the registers ESI and EDI, you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean up the stack appropriately and, in the case of STDCALL, return by removing the specified number of bytes from the stack (i.e. ret 0ch in this case, since there are three 4-byte arguments).
The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:
.MODEL FLAT, STDCALL
.CODE
__MyMemcpy PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
MOV EDI, dest
MOV ESI, source
MOV ECX, size
REP MOVSB ; Use instead of LODSB/STOSB+Loop
RET
__MyMemcpy ENDP
END
The assembler would generate this code for you:
PUBLIC __MyMemcpy@12
__MyMemcpy@12:
push ebp
mov ebp,esp ; Function prologue generated by PROC
push esi ; USES caused assembler to push ESI/EDI on stack
push edi
mov edi,dword ptr [ebp+8]
mov esi,dword ptr [ebp+0Ch]
mov ecx,dword ptr [ebp+10h]
rep movs byte ptr es:[edi],byte ptr [esi]
; MASM generated this from the simple RET instruction to restore registers,
; clean up stack and return back to caller per the STDCALL calling convention
pop edi ; Assembler-generated restore of the USES registers
pop esi
leave
ret 0Ch
Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.
The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.
You can see this when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't switch to assembly mode when already in the assembly source.
.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE
.CODE
mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
MOV EDI, [ESP+04H]
MOV ESI, [ESP+08H]
MOV ECX, [ESP+0CH]
AGAIN_:
LODSB
STOSB
LOOP AGAIN_
RETN 0CH
mymemcpy ENDP
END

Why does this 16-bit DOS example from a book crash when I call it from C (compiled with visual studio?)

My OS is Windows 7 64-bit.
Here is my code:
first.c :
#include <stdio.h>
extern long second(int, int);
void main()
{
int val1, val2;
long result;
scanf("%d %d", &val1, &val2);
result = second(val1, val2);
printf("%ld", result);
}
second.asm :
.model small
.code
public _second
_second proc near
push bp
mov bp,sp
mov ax,[bp+4]
mov bx,[bp+6]
add ax,bx
pop bp
ret
_second endp
end
It compiled OK, but the line "mov ax,[bp+4]" fails with the error "0xC0000005: Access violation reading location 0x00000004."
What's wrong?
You're assembling code in 16-bit mode and linking it into a 32-bit program which is executed in 32-bit mode. The machine code that makes up your second function ends up getting interpreted differently than you expected. This is the code that is actually executed:
_second:
00407800: 55 push ebp
00407801: 8B EC mov ebp,esp
00407803: 8B 46 04 mov eax,dword ptr [esi+4]
00407806: 8B 5E 06 mov ebx,dword ptr [esi+6]
00407809: 03 C3 add eax,ebx
0040780B: 5D pop ebp
0040780C: C3 ret
Instead of using 16-bit registers, the code uses 32-bit registers. Instead of using the BP register as a base when addressing the arguments on the stack, it uses ESI as a base. Since ESI is not initialized to anything in the function, it holds whatever value it happened to have before the call (e.g. 0). Whatever address that is isn't a valid readable address, so accessing it causes a crash.
Your problem is that you've taken assembly code meant to be used with a 16-bit compiler for a 16-bit operating system (e.g. MS-DOS) and are using it with a 32-bit compiler for Windows. You can't blindly cut and paste code examples like that. Here's a 32-bit version of your assembly code:
.MODEL FLAT
.CODE
PUBLIC _second
_second PROC
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov edx, [ebp+12]
add eax, edx
pop ebp
ret
_second ENDP
END
The .MODEL FLAT directive tells the assembler you're generating 32-bit code. I've changed the code to use 32-bit registers, and adjusted the frame pointer (EBP) relative offsets to reflect the fact that stack slots in 32-bit mode are 4 bytes long. I also changed the code to use EDX instead of EBX because in the 32-bit C calling convention the EBX register needs to be preserved by the function, while EDX (like BX in the 16-bit C calling convention) doesn't.
SP and BP are probably 0 in this specific case. Note however that SP and BP are the lowest 16-bit quarters of RSP and RBP respectively, so the stack pointer isn't really 0.
Another way to pass parameters from .c to .asm is to use the "fastcall" convention, which lets you pass two parameters in registers CX and DX (actually it's ECX and EDX, but you are using 16-bit registers in your code). Below is a short example tested in VS 2013. It sends two ints (2, 5) to the asm function, and the function returns the sum of those values (7):
first.cpp
#include "stdafx.h"
extern "C" int __fastcall second(int,int); // ◄■■ KEYWORDS "C" AND __FASTCALL.
int _tmain(int argc, _TCHAR* argv[])
{
short int result = second(2,5); // ◄■■ "RESULT" = 7.
return 0;
}
second.asm
.model small
.code
public @second@8 ; ◄■■ NOTICE THE @ AND THE 8.
@second@8 proc near ; ◄■■ NOTICE THE @ AND THE 8.
mov ax,cx ; ◄■■ AX = 2.
add ax,dx ; ◄■■ AX + 5 (RETURN VALUE).
ret
@second@8 endp ; ◄■■ NOTICE THE @ AND THE 8.
end

Pointer to member using _asm

I've been trying to translate a line of C code to assembly, but I just can't figure out the correct translation of a member access through a pointer (->) in asm.
Here is a fragment of the code:
typedef struct file{
int size;
}FILE;
void function(FILE *result){
result -> size;
}
Assuming an x86-64 compiler that uses the System V calling convention (first argument in RDI),
_function:
; rdi = pointer to struct file
; eax = size element
mov eax, [rdi]
ret

Difference between extern and volatile

This question regards the difference between the volatile and extern variable and also the compiler optimization.
One extern variable defined in main file and used in one more source file, like this:
ExternTest.cpp:
short ExtGlobal;
void Fun();
int _tmain(int argc, _TCHAR* argv[])
{
ExtGlobal=1000;
while (ExtGlobal < 2000)
{
Fun();
}
return 0;
}
Source1.cpp:
extern short ExtGlobal;
void Fun()
{
ExtGlobal++;
}
The assembly generated for this in VS2012 is as below:
ExternTest.cpp assembly for accessing the external variable
ExtGlobal=1000;
013913EE mov eax,3E8h
013913F3 mov word ptr ds:[01398130h],ax
while (ExtGlobal < 2000)
013913F9 movsx eax,word ptr ds:[1398130h]
01391400 cmp eax,7D0h
01391405 jge wmain+3Eh (0139140Eh)
Source1.cpp assembly for modifying the extern variable
ExtGlobal++;
0139145E mov ax,word ptr ds:[01398130h]
01391464 add ax,1
01391468 mov word ptr ds:[01398130h],ax
From the above assembly, every access to the variable "ExtGlobal" in the while loop reads the value from the corresponding address. If I add volatile to the external variable, the same assembly code is generated. So are volatile usage across two different threads and extern variable usage across two different functions the same?
Asking about extern and volatile is like asking about peanuts and gorillas. They're completely unrelated.
extern is used simply to tell the compiler, "Hey, don't expect to find the definition of this symbol in this C file. Let the linker fix it up at the end."
volatile essentially tells the compiler, "Never trust the value of this variable. Even if you just stored a value from a register to that memory location, don't re-use the value in the register - make sure to re-read it from memory."
If you want to see that volatile causes different code to be generated, write a series of reads/writes from the variable.
For example, compiling this code in cygwin, with gcc -O1 -c,
int i;
void foo() {
i = 4;
i += 2;
i -= 1;
}
generates the following assembly:
_foo proc near
mov dword ptr ds:_i, 5
retn
_foo endp
Note that the compiler knew what the result would be, so it just went ahead and optimized it.
Now, adding volatile to int i generates the following:
public _foo
_foo proc near
mov dword ptr ds:_i, 4
mov eax, dword ptr ds:_i
add eax, 2
mov dword ptr ds:_i, eax
mov eax, dword ptr ds:_i
sub eax, 1
mov dword ptr ds:_i, eax
retn
_foo endp
The compiler never trusts the value of i, and always re-loads it from memory.

Interpretation of C Code in IDA Pro Needed

I'm using IDA Pro to disassemble the following C code. However, looking at the disassembly below, it seems incomplete to me. The data is never initialized (as per the C code): it does appear to be copied onto the stack, but the procedure (nullsub_1) located at 00401040 makes no use of it. Am I making a correct assessment, or am I missing something? I have used Visual C++ 6/2005 to compile the C code.
#include <stdio.h>
#include <windows.h>
struct a
{
char s[10];
BYTE b;
int i;
};
a al;
void init(a);
void main()
{
init(al);
};
void init(a c)
{
for(int j = 0; j < 10; j++) c.s[j] = 'A';
c.b = 10;
c.i = 10000;
};
.text:00401000 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401000 _main proc near ; CODE XREF: start+AFp
.text:00401000
.text:00401000 argc = dword ptr 4
.text:00401000 argv = dword ptr 8
.text:00401000 envp = dword ptr 0Ch
.text:00401000
.text:00401000 mov ecx, dword_4084C0
.text:00401006 mov edx, dword_4084C4
.text:0040100C sub esp, 10h
.text:0040100F mov eax, esp
.text:00401011 mov [eax], ecx
.text:00401013 mov ecx, dword_4084C8
.text:00401019 mov [eax+4], edx
.text:0040101C mov edx, dword_4084CC
.text:00401022 mov [eax+8], ecx
.text:00401025 mov [eax+0Ch], edx
.text:00401028 call nullsub_1
.text:0040102D add esp, 10h
.text:00401030 retn
.text:00401030 _main endp
.text:00401030
.text:00401030 ;
.text:00401031 align 10h
.text:00401040
.text:00401040
.text:00401040
.text:00401040 nullsub_1 proc near ; CODE XREF: _main+28p
.text:00401040 retn
.text:00401040 nullsub_1 endp
Your source code has no side effects other than just writing to memory. The compiler eliminates those writes as useless.
You may have better luck if you compile it in Debug mode (instead of Release) or turn off some compiler optimizations.
Alternatively, accesses to variables defined as volatile will be preserved, so you can add volatile in your code.

Resources