I've been trying to parse and display the information in the Import Address Table (IAT) of a process after it is loaded and running. I understand API calls in programs jump to the relevant entry in the IAT, which then jumps to the actual function in the loaded DLLs.
Is it correct that the IAT can be found by reading the PE header and following the OptionalHeader.DataDirectory[1] pointer to the array of IMAGE_IMPORT_DESCRIPTORs, and then following the FirstThunk pointers? Whereas the OriginalFirstThunk pointers here will give you the original Import Table (IT)?
I have also tried following the OptionalHeader.DataDirectory[12] pointer in the PE header, but this was even less successful.
I've been testing this by trying to parse this structure for notepad.exe (32bit), using ReadProcessMemory from another process.
Here's the rough C pseudocode for what I'm doing:
char buf[128];
// get first import descriptor
readMemory(&import, procImgBase + DataDirectory[1].VirtualAddress, sizeof(IMAGE_IMPORT_DESCRIPTOR));
// get dll name
readMemory(buf, import.Name + procImgBase, 127);
printf("libname: %s\n", buf);
// get first function name
DWORD iltAddress = 0;
readMemory(&iltAddress, import.FirstThunk + procImgBase, 4);
readMemory(buf, iltAddress + procImgBase, 127);
printf("fname: %s\n", buf + 2); // <-- the +2 skips the 2-byte 'hint' of import lookup table entries
If, on the 3rd-to-last line, I use import.OriginalFirstThunk instead of FirstThunk, it prints everything as expected. I must be missing something conceptually, so I was wondering if anyone could clarify what that is for me?
Many thanks!
It looks like you're heading in the right direction. Some notes:
- The DataDirectory gives you an offset to an array of IMAGE_IMPORT_DESCRIPTOR, which is terminated by an entry of all zeros.
- There will be one IMAGE_IMPORT_DESCRIPTOR for each DLL that is imported.
- The IMAGE_IMPORT_DESCRIPTOR has offsets to 2 arrays of IMAGE_THUNK_DATA: one that maintains offsets to the names of the imported functions (OriginalFirstThunk), and another that now has the actual addresses of the functions (FirstThunk).
Since your executable is running, the IAT should contain the actual address of the function rather than an RVA to a name entry.
You could do something like this instead:
DWORD rva_to_name = 0;
DWORD address_of_function = 0;
// get the RVA of the IMAGE_IMPORT_BY_NAME entry
readMemory(&rva_to_name, import.OriginalFirstThunk + procImgBase, 4);
// copy the name of the import (skipping the 2-byte hint)
readMemory(buf, rva_to_name + procImgBase + 2, 127);
// get the actual address that was filled in by the loader
readMemory(&address_of_function, import.FirstThunk + procImgBase, 4);
printf("fname: %s address: %X\n", buf, address_of_function);
Take a look at this article for some helpful details:
http://msdn.microsoft.com/en-us/magazine/cc301808.aspx
Eric gave a good answer, here are some additional clarifications:
I understand API calls in programs jump to the relevant point in the IAT, which then jumps to the actual function in the loaded DLLs.
The program uses a CALL PTR DS:[IAT-ADDRESS] instruction that reads an address out of the IAT to determine where the imported function is at runtime.
Whereas the OriginalFirstThunk pointers here, will give you the original Import Table (IT)?
The OriginalFirstThunk pointers point you at the Import Lookup Table (ILT). If you open up the binary on disk, the ILT and the IAT are identical; both contain RVAs to function name strings. Once the program has been loaded, the IAT's entries (in memory) are overwritten with the addresses of the imported functions.
In my experience, the best source of information on the import table and all of its attendant data structures is the PE specification itself. If you read patiently through the section on imports, all will be made clear.
http://msdn.microsoft.com/en-us/windows/hardware/gg463125
Related
I am currently studying Linux internals and I am confused about the address pointed by vm_start. My goal is to retrieve a loaded .so/module base address from within the kernel. Here is what I got:
// loop through the task's memory maps
for (struct vm_area_struct *i = task->mm->mmap; NULL != i; i = i->vm_next)
{
    if (NULL == i->vm_file)
    {
        continue;
    }
    // if the module name matches
    if (string_ends_with(i->vm_file->f_path.dentry->d_iname, module_name))
    {
        return i->vm_start; // return mapping start
    }
}
return NULL; // the specified module is not loaded in memory
This code returned the address f7888000; however, checking /proc/pid/maps shows the correct address is 7f0ef7888000 (which is what I expected my function to return). I find this odd, since show_vma_header_prefix itself is called with vm_start as a parameter, as we can see here: https://github.com/torvalds/linux/blob/master/fs/proc/task_mmu.c#L968.
Am I missing something? Why is the address returned from my function 7f0e00000000 bytes away from the expected result? Is there a way I can get this value from kernel space? Reading from /proc/pid/maps is out of the question since the target process is not mine; I want to retrieve the address externally. This is not production code; I know reading from /proc/pid/maps and then passing the address as an argument is definitely a better way. However, this is an experimental rootkit for studying purposes.
TL;DR vm_start is 7f0e00000000 bytes off from the expected result
I'm using the SymEnumSymbols function from the dbghelp library to get details about any malloc symbols in an executable. One of the arguments I need to pass is a callback function with the following signature:
BOOL CALLBACK EnumSymProc(
PSYMBOL_INFO pSymInfo,
ULONG SymbolSize,
PVOID UserContext);
And I want to extract all the data I can from those parameters.
The Windows Dev Center provides this insufficient description of the second:
SymbolSize:
The size of the symbol, in bytes. The size is calculated and is actually a guess. In some cases, this value can be zero.
I've implemented the callback in the following way:
BOOL CALLBACK EnumSymCallback(
PSYMBOL_INFO pSymInfo,
ULONG SymbolSize,
PVOID UserContext)
{
UNREFERENCED_PARAMETER(UserContext);
printf("Hello from symEnumSymbols!\n");
printf("%08X %4u %s\n", (unsigned int)pSymInfo->Address, SymbolSize, pSymInfo->Name);
return TRUE;
}
and I call SymEnumSymbols with those arguments:
if (!SymEnumSymbols(
GetCurrentProcess(), // handler to the process.
0,
"*!malloc", // combination of the last two lines means: Enumerate every 'malloc' symbol in every loaded module - we might change this...
EnumSymCallback,
NULL // argument for the callback.
))
{
printf("SymEnumSymbols failed :-(\n");
DWORD error = GetLastError();
printf("SymEnumSymbols returned error : %d\n", error);
return FALSE;
}
printf("SymEnumSymbols succeeded :-)\n");
and I got this output: [EDIT: I just added enumeration for free ]
Hello from symEnumSymbols!
766300D0 16 malloc
Hello from symEnumSymbols!
0F9BE340 32 malloc
Hello from symEnumSymbols!
7662E0F0 48 free
Hello from symEnumSymbols!
0F9BDFA0 80 free
SymEnumSymbols succeeded :-)
As you can see, the first malloc symbol size is 16 and the second is 32. I'm not sure how I got two mallocs in the first place, since my executable is supposed to have only one (I wrote the source), but assuming the other one comes from the compiler or something: what are those sizes? And why are they different?
I could guess it specifies a 32-bit or a 16-bit instruction, but I really don't have a clue, and that doesn't make sense with the free results either. Thanks for any help!
Taken from the docs.
[in] SymbolSize
The size of the symbol, in bytes. The size is calculated and is
actually a guess. In some cases, this value can be zero.
That description looks confusing. I personally don't use SymbolSize but rather specifically ask for the symbol length when required.
There are different types of symbols, from function symbols to UDT symbols (which describe the layout of a struct or class). SymbolSize makes sense for a UDT symbol, but for a function symbol I have no idea what it may mean. The code size of the function itself? Given that the docs say it "is actually a guess", I would take it that it's not that useful.
I'm trying to compile wingraphviz for x64 (it's an old, unmaintained project), and ran into a very strange problem:
There's a call to getDefaultFont() that looks like this:
const char* def = getDefaultFont();
Deffontname = late_nnstring(g->proto->n,N_fontname,def);
(the original code did the call inside a function call, but I extracted it for clarity)
The getDefaultFont function is very simple, and returns a string literal based on the current charset:
const char * getDefaultFont() {
    switch (DOT_CODEPAGE) {
    case CP_KOREAN:
        return CP_949_DEFAULTFONT;
        break;
    [...]
    default:
        return DEFAULT_FONTNAME;
        break;
    }
}
with DEFAULT_FONTNAME and the others defined in a header file:
#define DEFAULT_FONTNAME "Times New Roman"
I changed the return to { const char* r = DEFAULT_FONTNAME; return r; } to see the value while debugging: r is correct at the return instruction.
But when the debugger returns to the caller, def points to invalid memory.
I ran the debugger in assembly mode, and saw this:
const char* def = getDefaultFont();
000007FEDA1244FE call getDefaultFont (07FEDA1291A0h)
000007FEDA124503 cdqe
000007FEDA124505 mov qword ptr [def],rax
after the call instruction, RAX contains the correct value, a pointer to .data : RAX = 000007FEDA0C9A20
but the next instruction, cdqe ("Convert dword (eax) to qword (rax)"), destroys the 4 higher bytes, and now RAX = FFFFFFFFDA0C9A20. The third instruction then stores the truncated value on the stack.
After that, late_nnstring() tries to dereference the corrupted pointer and crashes...
Do you know why VS inserts this cdqe instruction?
All these functions are in .c files under the same project.
I've implemented a workaround, using strdup() to return low-memory addresses, but it's not safe (maybe the heap can use memory above 4G?), and there may be some other cases I did not find while testing that will crash when using the library.
I published the files here : https://gitlab.com/data-public/wingraphviz
especially :
caller at https://gitlab.com/data-public/wingraphviz/blob/97085eeb6e9356c7784965c5a43757d8db3fec41/dependencies/graphviz-1.8.10/dotneato/common/emit.c#L842
getDefaultFont at https://gitlab.com/data-public/wingraphviz/blob/97085eeb6e9356c7784965c5a43757d8db3fec41/dependencies/graphviz-1.8.10/dotneato/common/utils.c#L111
constant defines at https://gitlab.com/data-public/wingraphviz/blob/97085eeb6e9356c7784965c5a43757d8db3fec41/dependencies/graphviz-1.8.10/dotneato/common/const.h#L49
Your links require an account I don't have.
You likely failed to include the header declaring that function, or messed up the header order. Here's more info on why the C compiler inserts cdqe.
P.S. This is a great example of why you should read, and fix, compiler warnings.
Update: If you have a circular dependency problem and can't just include utils.h, a quick workaround is to declare const char * getDefaultFont(); in emit.c before you call that function.
We received a homework assignment in which we need to take an ELF file and print its sections' names.
We are supposed to do all that using only the data we receive directly from the ELF header,
meaning we can't use any "high level" procedures; we need to go directly to the data we need.
So, I'm trying to print the first section's name. I know the names are supposed to be in the string table. This is what I have so far:
I'm getting the start of the ELF file using mmap...
elfhead =(Elf32_Ehdr *) mmap...
I'm getting the section offset using the members in the ELF header
sectionoffset = elfhead->e_shoff
then
section = (Elf32_Shdr*)((char*)elfhead + sectionoffset)
nameoffset = section->sh_name
stringoffset = elfhead->e_shstrndx;
To be clear:
- in elfhead I have the ELF header
- in section I have the section header
- in stringoffset I have the index into the section table where the string table is supposed to be
- in nameoffset I have the index into the string table where the first section name is supposed to be
How do I go to the first name and print it, given the code above?
Well, first off you need access to the section header string table. Since the ELF header is the first thing in the file, elfhead is also the base address of the mapping, and offsets must be applied to it as byte offsets (hence the char* cast):
char* stringTable = (char*)elfhead + (section + elfhead->e_shstrndx)->sh_offset;
Once you have that, all you really have to do is print the first one using the nameoffset you already obtained, like so.
char* name = stringTable + nameoffset;
printf("%s\n",name);
FYI, printing the rest of the names would be a simple loop:
for (i = 0; i < elfhead->e_shnum; i++) {
    char* name = stringTable + section->sh_name;
    printf("%s\n", name);
    section++;
}
I'm not sure if the title correctly reflects my question.
I have a library implemented in C for lua provided to me by my employer.
They have it reading a bunch of data out of a modbus device such that:
readFunc(Address, numReads)
will start at Address and read numReads registers. Currently this returns data in the following way:
A, B, C, D = readFunc(1234, 4)
However, we need to do 32+ reads at a time for some of our functions and I really don't want to have reply1, reply2... reply32+ listed in my code every time I do this.
Ideally, I would like to do something like:
array_of_awesome_data = {}
array_of_awesome_data = readFunc(1234, 32)
where array_of_awesome_data[1] would correspond to A in the way we do it now.
In the current C code I was given, each data is returned in a loop:
lua_pushinteger(L, retData);
How would I go about adjusting a C-implemented Lua library so that the Lua function returns an array?
Note: a loop of multiple reads is too inefficient on our device, so we need to do one big read. I don't know enough of the details to justify why, but it is what I was told.
In Lua, you can receive a list returned from a function using table.pack, e.g.:
array_of_awesome_data = table.pack(readFunc(1234, 32))
Or in C, if you want to return a table instead of a list of results, you need to first push a table onto the stack, and then push each item and add it to the table. It would look something like the following:
num_results = 32; /* set this dynamically */
lua_createtable(L, num_results, 0);
for (i = 0; i < num_results; i++) {
    lua_pushinteger(L, retData[i]);
    lua_rawseti(L, -2, i + 1); /* Lua indices start at 1 */
}
return 1; /* the table is the single value returned to Lua */