How to parse PE exported functions in C

I want to list all exported functions from a PE (Portable Executable) file.
Here is some code:
PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)PE_Header;
PIMAGE_NT_HEADERS32 ntHeader = (PIMAGE_NT_HEADERS32)(PE_Header + dos_header->e_lfanew);
// how to go on?
The following line gives me an access violation because the VirtualAddress member is too big:
printf("Export Table %s\n", PE_Header + ntHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
I don't know how to go on and list all exported functions.
Can you provide some working sample code?
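An access violation on that line is what you would expect if PE_Header points at the raw file contents rather than at an image mapped by the loader. VirtualAddress is an RVA, so in a raw file it first has to be translated to a file offset through the section table. Below is a minimal sketch of that translation. The section_t type and the rva_to_offset name are made up for illustration; section_t only mirrors the three IMAGE_SECTION_HEADER fields the lookup needs (VirtualAddress, SizeOfRawData, PointerToRawData).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IMAGE_SECTION_HEADER fields used here. */
typedef struct {
    uint32_t virtual_address;     /* RVA at which the section is mapped */
    uint32_t size_of_raw_data;    /* bytes the section occupies on disk */
    uint32_t pointer_to_raw_data; /* file offset of the section's data  */
} section_t;

/* Translate an RVA into a file offset.
 * Returns 0 if no section contains the RVA. */
uint32_t rva_to_offset(const section_t *sections, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva >= start && rva - start < sections[i].size_of_raw_data)
            return sections[i].pointer_to_raw_data + (rva - start);
    }
    return 0;
}
```

With the real headers, the section table starts at IMAGE_FIRST_SECTION(ntHeader) and has FileHeader.NumberOfSections entries. The translated export directory offset then gives you an IMAGE_EXPORT_DIRECTORY, whose AddressOfNames field is an RVA to an array of NumberOfNames name RVAs. Each of those needs the same translation before it can be printed as a string.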

Related

Create shared parameter file for C and Python

I need to create a parameter file that can be managed across a Python 3.7 and a C code base. This file needs to be modifiable by either the C or the Python program, with the changes taking effect in the other program (an update function will handle reading the updated file). It's best if the file is not human readable, as it contains information that is better left obfuscated.
**Is there a recommended method to do so?**
I could create separate Python and C files, but the set of parameters will change over time (for code maintenance), and the values would be changed by these programs. The list would also be very long. It would be a hassle to maintain two different files and update them over time. Also, the file may need to be exchanged between users, such that a version modified by the software run by user1 needs to be readable by the software run by user2. The idea is that other parts of both codes could access parts of the parameter list without knowing the full contents of the list.
To clarify the example, I could have a parameter.h file containing:
struct {
    double par1;
    int par2;
} par_list = { 1.1, 2 };
And I could have a parameter.py with:
class par_list:
    def __init__(self):
        self.par1 = float(1.1)
        self.par2 = int(2)
Then, by doing an import in Python or an #include in C, I could initialize the parameter list. But in this case the parameters are being read from different files.
I'm considering using some kind of binary file to keep the values, and create a script that writes both the Python and C code that reads and updates the values. I'm concerned because the binary file would need to be interchangeable between ARM architecture running Linux, and x86 architecture running Windows.
Here is an example working with numpy:
C code:
#include <stdio.h>
#include <stdint.h>
struct Struct_format{
    uint8_t the_unsigned_int8;
    int32_t the_signed_int32[2];
    double the_double;
};
typedef struct Struct_format upperStruct;
//Use separate file to define default value:
void printStruct(upperStruct test_struct){
    printf("test_struct.the_unsigned_int8 = %d\n", test_struct.the_unsigned_int8);
    printf("test_struct.the_signed_int32[0] = %d\n", test_struct.the_signed_int32[0]);
    printf("test_struct.the_signed_int32[1] = %d\n", test_struct.the_signed_int32[1]);
    printf("test_struct.the_double = %f\n", test_struct.the_double);
}
int main(void){
    //Define a "default" value:
    upperStruct fromC2Python = {4U,{-3,-1},2.1};
    printf("Printing fromC2Python\n");
    printStruct(fromC2Python);
    //Save this default in a file ("wb": binary mode matters on Windows):
    FILE * fid = fopen("fromC2Python.bin","wb");
    if (fid == NULL) return 1;
    fwrite(&fromC2Python, sizeof(fromC2Python), 1, fid);
    fclose(fid);
    //Now load the file created by Python:
    upperStruct fromPython2C;
    FILE * fid_py = fopen("fromPython2C.bin","rb");
    if (fid_py == NULL) return 1;
    if (fread(&fromPython2C, sizeof(fromPython2C), 1, fid_py) != 1) return 1;
    fclose(fid_py);
    printf("Printing fromPython2C\n");
    printStruct(fromPython2C);
    return 0;
}
Python code:
import numpy
datatype = numpy.dtype([('potato',
[('time', numpy.uint8),
('sec', numpy.int32, 2)]),
('temp', numpy.float64)],
align=True)
fromPython2C = numpy.array([((5, (-6, -7)), 61.55)], dtype=datatype)
print(fromPython2C)
fromPython2C.tofile("fromPython2C.bin", sep="")
fromC2Python = numpy.fromfile("fromC2Python.bin", dtype=datatype, count=-1, sep="")
print(fromC2Python)
print(fromC2Python['potato'])
print(fromC2Python['potato']['time'])
print(fromC2Python['temp'])
The idea is that numpy allows reading and writing structured binary files. Hence, it suffices to create the dtype specification with a text parser.
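On the ARM/Linux versus x86/Windows concern: dumping the struct with fwrite ties the file to one compiler's padding and one CPU's byte order. One way around that is to write each field explicitly in a fixed order and a fixed endianness. The sketch below does this for the same three fields. The names params_pack, params_unpack and PARAMS_WIRE_SIZE are invented for this example, and it assumes both platforms use 64-bit IEEE-754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Same fields as Struct_format above. */
struct params {
    uint8_t the_unsigned_int8;
    int32_t the_signed_int32[2];
    double  the_double;
};

#define PARAMS_WIRE_SIZE 17 /* 1 + 2*4 + 8 bytes, no padding */

/* Store the low nbytes of v, least significant byte first. */
static void put_le(unsigned char *out, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Read nbytes stored least significant byte first. */
static uint64_t get_le(const unsigned char *in, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

void params_pack(const struct params *p, unsigned char out[PARAMS_WIRE_SIZE])
{
    uint64_t bits;
    memcpy(&bits, &p->the_double, sizeof bits); /* assumes 64-bit IEEE-754 */
    out[0] = p->the_unsigned_int8;
    put_le(out + 1, (uint32_t)p->the_signed_int32[0], 4);
    put_le(out + 5, (uint32_t)p->the_signed_int32[1], 4);
    put_le(out + 9, bits, 8);
}

void params_unpack(struct params *p, const unsigned char in[PARAMS_WIRE_SIZE])
{
    uint64_t bits = get_le(in + 9, 8);
    p->the_unsigned_int8 = in[0];
    p->the_signed_int32[0] = (int32_t)(uint32_t)get_le(in + 1, 4);
    p->the_signed_int32[1] = (int32_t)(uint32_t)get_le(in + 5, 4);
    memcpy(&p->the_double, &bits, sizeof bits);
}
```

The 17-byte buffer can then be written and read with fwrite/fread on a file opened in binary mode. On the Python side the matching layout would be a packed little-endian description, for example the struct format string '<B2id', or a numpy dtype with '<u1', '<i4' and '<f8' fields and align left at its default of False.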

Using array in .h file

I am trying to learn to program in C (not C++!). I've read about external variables, which should (according to the writer) give nicer code. In order to use the external variables, I must #define them in the .h file before I can use them in the main.c file, using the extern keyword in front of the variable. I am trying to create an array in the .h file like this:
#define timeVals[4][2];
timeVals[0][0] = 7;
timeVals[0][1] = 45;
timeVals[1][0] = 8;
timeVals[1][1] = 15;
timeVals[2][0] = 9;
timeVals[2][1] = 30;
timeVals[3][0] = 10;
timeVals[3][1] = 25;
(It's a clock I'm trying to make, a simple console program.) The first column indicates hours and the second indicates minutes. In my main I have written
extern int timeVals[][];
but I get an error telling me "expected identifier or '(' before '[' token" and I can't see what the issue is. Any ideas or advice?
I am using the .h file to learn how to use external variables, so I can't move the values back into main.c.
First, this:
#define timeVals[4][2];
is a confusion: #define creates a preprocessor macro, not a variable. You mean this:
extern int timeVals[4][2];
Put that declaration in your .h file, then define the array in exactly one .c file, something like this:
int timeVals[4][2] = {
{ 1, 2 }, // ...
};
That's how you initialize the entire array (any unspecified elements will be zero).
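To make the split concrete, here is a sketch with the three pieces shown one after another. The file names in the comments are only suggestions, and minutes_since_midnight is a made-up helper that shows the array being used. Note also that extern int timeVals[][]; is not valid C, because only the first dimension of an array declaration may be left out.

```c
/* timevals.h (suggested name): declaration only, no storage is created. */
extern int timeVals[4][2];

/* timevals.c: the single definition, using the values from the question. */
int timeVals[4][2] = {
    {  7, 45 },
    {  8, 15 },
    {  9, 30 },
    { 10, 25 },
};

/* main.c would #include "timevals.h" and then use the array directly. */
int minutes_since_midnight(int row)
{
    return timeVals[row][0] * 60 + timeVals[row][1];
}
```

Assignments such as timeVals[0][0] = 7; are statements, so they can only appear inside a function. At file scope the values have to come from the initializer.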

Using the string table and printing sections names

We received a homework assignment in which we need to take an ELF file and print its sections' names.
We are supposed to do all that using only the data we receive directly from the ELF header, meaning we can't use any "high level" procedures: we need to go directly to the data we need.
So, I'm trying to print the first section's name. I know the names are supposed to be in the string table. This is what I have so far:
I'm getting the start of the ELF file using mmap...
elfhead =(Elf32_Ehdr *) mmap...
I'm getting the section offset using the members in the ELF header
sectionoffset = elfhead->e_shoff
then
section = (Elf32_Shdr*)((char*)elfhead + sectionoffset)
nameoffset = section->sh_name
stringoffset = elfhead->e_shstrndx;
To be clear:
in elfhead I have the ELF header
in section I have the section header
in stringoffset I have the index inside the section table where the string table is supposed to be
in nameoffset I have the index in the string table where the first section name is supposed to be.
How do I go to the first name and print it, given the code above?
Well, first off you need access to the section header string table. Since the ELF header is the first thing in the file, every offset is relative to elfhead, and the addition has to be done on a byte pointer:
char* stringTable = (char*)elfhead + (section + stringoffset)->sh_offset;
Once you have that, all you really have to do is print the first one using the nameoffset you already obtained, like so:
char* name = stringTable + nameoffset;
printf("%s\n",name);
FYI, printing the rest of the names would be a simple loop (section must still point at the first section header when it starts):
for(i=0;i<elfhead->e_shnum;i++){
    char* name = stringTable + section->sh_name;
    printf("%s\n",name);
    section++;
}
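Putting those pieces together, a lookup by section index could look roughly like the following. This is a sketch, not a hardened parser: section_name is a name invented here, it assumes a 32-bit ELF image that is fully mapped at a suitably aligned address (as mmap gives you), and it does not validate the offsets it reads.

```c
#include <elf.h>
#include <stddef.h>

/* Return the name of section `index` in a 32-bit ELF image mapped at `base`,
 * or NULL if the index is out of range. */
const char *section_name(const unsigned char *base, unsigned index)
{
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)base;
    /* e_shoff is a byte offset, so add it to a byte pointer. */
    const Elf32_Shdr *sh = (const Elf32_Shdr *)(base + eh->e_shoff);
    /* e_shstrndx selects the section header of the name string table. */
    const char *strtab = (const char *)base + sh[eh->e_shstrndx].sh_offset;

    if (index >= eh->e_shnum)
        return NULL;
    /* sh_name is an offset into that string table. */
    return strtab + sh[index].sh_name;
}
```

Printing all names is then a loop from 0 up to e_shnum that passes each index to section_name.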

Import Table (IT) vs Import Address Table (IAT)

I've been trying to parse/display the information in the Import Address Table (IAT) of a process after it is loaded and running. I understand API calls in programs jump to the relevant point in the IAT, which then jumps to the actual function in the loaded DLL's.
Is it correct that the IAT can be found by reading the PE header and following the OptionalHeader.DataDirectory[1] pointer, to the array of IMAGE_IMPORT_DESCRIPTORs. Then following the FirstThunk pointers. Whereas the OriginalFirstThunk pointers here, will give you the original Import Table (IT)?
I have also tried following the OptionalHeader.DataDirectory[12] pointer in the PE header, but this was even less successful.
I've been testing this by trying to parse this structure for notepad.exe (32bit), using ReadProcessMemory from another process.
Here's the rough C pseudocode for what I'm doing:
char buf[128] = {0};
IMAGE_IMPORT_DESCRIPTOR import;
// get first import descriptor
readMemory(&import, procImgBase + DataDirectory[1].VirtualAddress, sizeof(IMAGE_IMPORT_DESCRIPTOR));
// get dll name
readMemory(buf, import.Name + procImgBase, 127);
printf("libname: %s\n", buf);
// get first function name
DWORD iltAddress = 0;
readMemory(&iltAddress, import.FirstThunk + procImgBase, 4);
readMemory(buf, iltAddress + procImgBase, 127);
printf("fname: %s\n", buf + 2); // <-- the +2 skips the 2-byte 'hint' of import lookup table entries
If, on the 3rd to last line, I replace FirstThunk with import.OriginalFirstThunk, it prints everything as expected. I must be missing something conceptually, so I was wondering if anyone could clarify what this is for me?
Many thanks!
It looks like you're heading in the right direction. Some notes:
The DataDirectory gives you an offset to an array of IMAGE_IMPORT_DESCRIPTOR which is terminated by an entry of all zeros.
There will be one IMAGE_IMPORT_DESCRIPTOR for each DLL that is imported.
The IMAGE_IMPORT_DESCRIPTOR has offsets to 2 arrays of IMAGE_THUNK_DATA, one that maintains offsets to the names of the imported functions (OriginalFirstThunk) and another that now has the actual addresses of the functions (FirstThunk).
Since your executable is running, the IAT should contain the actual address of the function rather than an RVA to a name entry.
You could do something like this instead:
DWORD rva_to_name = 0;
DWORD address_of_function = 0;
// get the RVA of the IMAGE_IMPORT_BY_NAME entry
readMemory(&rva_to_name, import.OriginalFirstThunk + procImgBase, 4);
// copy the name of the import, skipping the 2-byte hint
readMemory(buf, rva_to_name + procImgBase + 2, 127);
// get the actual address that was filled in by the loader
readMemory(&address_of_function, import.FirstThunk + procImgBase, 4);
printf("fname: %s address: %lX\n", buf, address_of_function);
Take a look at this article for some helpful details:
http://msdn.microsoft.com/en-us/magazine/cc301808.aspx
Eric gave a good answer, here are some additional clarifications:
I understand API calls in programs jump to the relevant point in the IAT, which then jumps to the actual function in the loaded DLL's.
The program uses a CALL PTR DS:[IAT-ADDRESS] that reads from an address in the IAT to determine where the imported function is at runtime.
Whereas the OriginalFirstThunk pointers here, will give you the original Import Table (IT)?
The OriginalFirstThunk pointers point you at the Import Lookup table (ILT). If you open up the binary on disk, the ILT and the IAT are identical; both contain RVA's to function name strings. Once the program has been loaded, the IAT's entries (in memory) are overwritten with the addresses of the imported functions.
In my experience, the best source of information on the import table and all of its attendant data structures is the PE specification itself. If you read patiently through the section on imports, all will be made clear.
http://msdn.microsoft.com/en-us/windows/hardware/gg463125

get function address from name [.debug_info ??]

I was trying to write a small debug utility, and for this I need to get the address of a function or global variable given its name. This is a built-in debug utility, which means that it will run from within the code to be debugged; in plain words, I cannot parse the executable file.
Now is there a well-known way to do that? The plan I have is to make the .debug_* sections be loaded into memory [which I plan to do by a cheap trick like this in the ld script]
.data : {
    *(.data)
    __sym_start = .;
    *(.debug_*)
    __sym_end = .;
}
Now I have to parse the section to get the information I need, but I am not sure whether this is doable or whether there are issues with it; this is all just theory. It also seems like too much work :-) Is there a simpler way? Or if someone can tell me upfront why my scheme will not work, that will also be helpful.
Thanks in Advance,
Alex.
If you are running under a system with dlopen(3) and dlsym(3) (like Linux) you should be able to:
char thing_string[] = "thing_you_want_to_look_up";
void * handle = dlopen(NULL, RTLD_LAZY | RTLD_NOLOAD);
// you could do RTLD_NOW as well. shouldn't matter
if (!handle) {
fprintf(stderr, "Dynamic linking on main module : %s\n", dlerror() );
exit(1);
}
void * addr = dlsym(handle, thing_string);
fprintf(stderr, "%s is at %p\n", thing_string, addr);
I don't know the best way to do this for other systems, and this probably won't work for static variables and functions. C++ symbol names will be mangled, if you are interested in working with them.
To expand this to work for shared libraries you could probably get the names of the currently loaded libraries from /proc/self/maps and then pass the library file names into dlopen, though this could fail if the library has been renamed or deleted.
There are probably several other much better ways to go about this.
Edit: without using dlopen
/* name_addr.h */
struct name_addr {
    const char * sym_name;
    const void * sym_addr;
};
typedef struct name_addr name_addr_t;
const void * sym_lookup(const char * name);
extern const name_addr_t name_addr_table[];
extern const unsigned name_addr_table_size;
/* name_addr_table.c */
#include "name_addr.h"
#define PREMEMBER( X ) extern const void * X
/* &X is the symbol's address; the declared type of X is never used */
#define REMEMBER( X ) { .sym_name = #X , .sym_addr = (const void *) &X }
PREMEMBER(strcmp);
PREMEMBER(printf);
PREMEMBER(main);
PREMEMBER(memcmp);
PREMEMBER(bsearch);
/* sym_lookup is already declared in name_addr.h */
/* ... */
const name_addr_t name_addr_table[] =
{
    /* You could do a #include here that included the list, which would allow you
     * to have an empty list by default without regenerating the entire file, as
     * long as your compiler only warns about missing include targets.
     */
    REMEMBER(strcmp),
    REMEMBER(printf),
    REMEMBER(main),
    REMEMBER(memcmp),
    REMEMBER(bsearch),
    REMEMBER(sym_lookup),
    /* ... */
};
const unsigned name_addr_table_size = sizeof(name_addr_table)/sizeof(name_addr_t);
/* name_addr_code.c */
#include "name_addr.h"
#include <string.h>
const void * sym_lookup(const char * name) {
    unsigned to_go = name_addr_table_size;
    const name_addr_t *na = name_addr_table;
    while(to_go) {
        if ( !strcmp(name, na->sym_name) ) {
            return na->sym_addr;
        }
        na++;
        to_go--;
    }
    /* set errno here if you are using errno */
    return NULL; /* Or some other illegal value */
}
If you do it this way the linker will take care of filling in the addresses for you after everything has been laid out. If you include header files for all of the symbols that you are listing in your table then you will not get warnings when you compile the table file, but it will be much easier just to have them all be extern void * and let the compiler warn you about all of them (which it probably will, but not necessarily).
You will also probably want to sort your symbols by name such that you can use a binary search of the list rather than iterate through it.
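As a rough illustration of the sorted variant, the standard bsearch function can do the lookup. The table contents and the names sorted_table, cmp_name and sym_lookup_sorted are invented for this sketch. It also relies on converting function pointers to const void *, which POSIX systems support but ISO C does not strictly guarantee.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *sym_name;
    const void *sym_addr;
} name_addr_t;

/* Hypothetical table; entries must be sorted by name in strcmp order. */
static const name_addr_t sorted_table[] = {
    { "bsearch", (const void *)bsearch },
    { "memcmp",  (const void *)memcmp  },
    { "strcmp",  (const void *)strcmp  },
};

/* bsearch passes the key first and a table element second. */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const name_addr_t *)elem)->sym_name);
}

/* Return the recorded address for name, or NULL if it is not in the table. */
const void *sym_lookup_sorted(const char *name)
{
    const name_addr_t *hit = bsearch(name, sorted_table,
                                     sizeof sorted_table / sizeof sorted_table[0],
                                     sizeof sorted_table[0], cmp_name);
    return hit ? hit->sym_addr : NULL;
}
```

If the table is generated by a script, sorting it there (for example by piping the symbol list through sort with the C locale) keeps the order consistent with strcmp.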
You should note that if you have members in the table which are not otherwise referenced by the program (like if you had an entry for sqrt in the table, but didn't call it) the linker will then want (need) to link those functions into your image. This can make the image grow considerably.
Also, if you were taking advantage of global optimizations having this table will likely make those less effective since the compiler will think that all of the functions listed could be accessed via pointer from this list and that it cannot see all of the call points.
Putting static functions in this list is not straightforward. You could do this by changing the table to dynamic and doing it at run time from a function in each module, or possibly by generating a new section in your object file that the table lives in. If you are using gcc:
#define SECTION_REMEMBER(X) \
static const name_addr_t _name_addr##X \
__attribute__((section("sym_lookup_table"), used)) = \
{ .sym_name = #X , .sym_addr = (const void *) X }
And tack a list of these onto the end of each .c file with all of the symbols that you want to remember from that file. This will require linker work so that the linker will know what to do with these members, but then you can iterate over the list by looking at the begin and end of the section that it resides in (I don't know exactly how to do this, but I know it can be done and isn't TOO difficult). This will make having a sorted list more difficult, though. Also, I'm not entirely certain initializing the .sym_name to a string literal's address would not result in cramming the string into this section, but I don't think it would. If it did then this would break things.
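For the begin and end of the section, one option I am aware of is that GNU ld (and linkers compatible with it) defines __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier. The sketch below assumes gcc and such a linker. The names section_sym_lookup and hidden_helper are made up for the example, and the macro is a variant of the one above using positional initializers.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *sym_name;
    const void *sym_addr;
} name_addr_t;

/* One table entry per remembered symbol, emitted into a custom section. */
#define SECTION_REMEMBER(X) \
    static const name_addr_t name_addr_##X \
    __attribute__((section("sym_lookup_table"), used)) = \
    { #X, (const void *)X }

/* Provided by the linker (assumption: GNU ld or compatible). */
extern const name_addr_t __start_sym_lookup_table[];
extern const name_addr_t __stop_sym_lookup_table[];

/* Linear search over every entry that any object file put in the section. */
const void *section_sym_lookup(const char *name)
{
    const name_addr_t *na;
    for (na = __start_sym_lookup_table; na < __stop_sym_lookup_table; na++)
        if (strcmp(name, na->sym_name) == 0)
            return na->sym_addr;
    return NULL;
}

/* Example: a static function that no other file could refer to by name. */
static int hidden_helper(void) { return 42; }
SECTION_REMEMBER(hidden_helper);
```

The entries end up in link order rather than sorted, so a binary search would need the section to be sorted at start-up or the lookup to stay linear as shown.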
You can still use objdump to get a list of the symbols that the object file (probably ELF) contains, then filter this for the symbols you are interested in, and then regenerate the table file with the table's members listed.
