My question is whether I should use the Windows API instead of a Standard Library function when I'm trying to get the most speed out of my program.
I assume the answer isn't the same for every call. Specifically, I'm curious about stat() vs. dwFileAttributes, for example if I wanted to find out whether a file is a directory (assuming file_name is a string containing the full path to the file):
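WIN32_FIND_DATA fileData;
/* TEXT() only works on string literals, so file_name is passed directly (it must be a TCHAR string). */
HANDLE hSearch = FindFirstFile(file_name, &fileData);
int isDir = 0;
if (hSearch != INVALID_HANDLE_VALUE) {
    isDir = (fileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
    FindClose(hSearch);
}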
vs.
struct stat info;
/* info is only valid when stat() returns 0 */
int isDir = (stat(file_name, &info) == 0) && S_ISDIR(info.st_mode);
If anyone knows, or can elaborate on what the speed difference between these libraries generally is (if any) I'd appreciate it.
Not an answer regarding speed, but @The Corn Inspector, the MSVC CRT source code is available. If you look at an older version (before .NET and the common UCRT) and find the stat() function, it is indeed a wrapper around the same OS call.
int __cdecl _tstat (
REG1 const _TSCHAR *name,
REG2 struct _stat *buf
)
{ // stuff omitted for clarity
_TSCHAR * path;
_TSCHAR pathbuf[ _MAX_PATH ];
int drive; /* A: = 1, B: = 2, etc. */
HANDLE findhandle;
WIN32_FIND_DATA findbuf;
/* Call Find Match File */
findhandle = FindFirstFile((_TSCHAR *)name, &findbuf);
Of course there is additional code for mapping structures, etc. It looks like it also does some time conversion:
SYSTEMTIME SystemTime;
FILETIME LocalFTime;
if ( !FileTimeToLocalFileTime( &findbuf.ftLastWriteTime,
&LocalFTime ) ||
!FileTimeToSystemTime( &LocalFTime, &SystemTime ) )
{
So theoretically it could be slower, but probably by so little that it makes no practical difference in the context of a complete, complex program. If you are calling stat() a million times and worrying about milliseconds, who knows. Profile it.
I'm working on reporting some information gleaned from native system APIs. (I know this is bad.... but I'm getting information that I can't get otherwise, and I have little issue with having to update my app if/when that time comes around.)
The native API returns native pathnames, as seen by ob, e.g. \SystemRoot\System32\Ntoskrnl.exe, or \??\C:\Program Files\VMWare Workstation\vstor-ws60.sys.
I can replace common prefixes, e.g.
std::wstring NtPathToWin32Path( std::wstring ntPath )
{
if (boost::starts_with(ntPath, L"\\\\?\\"))
{
ntPath.erase(ntPath.begin(), ntPath.begin() + 4);
return ntPath;
}
if (boost::starts_with(ntPath, L"\\??\\"))
{
ntPath.erase(ntPath.begin(), ntPath.begin() + 4);
}
if (boost::starts_with(ntPath, L"\\"))
{
ntPath.erase(ntPath.begin(), ntPath.begin() + 1);
}
if (boost::istarts_with(ntPath, L"globalroot\\"))
{
ntPath.erase(ntPath.begin(), ntPath.begin() + 11);
}
if (boost::istarts_with(ntPath, L"systemroot"))
{
ntPath.replace(ntPath.begin(), ntPath.begin() + 10, GetWindowsPath());
}
if (boost::istarts_with(ntPath, L"windows"))
{
ntPath.replace(ntPath.begin(), ntPath.begin() + 7, GetWindowsPath());
}
return ntPath;
}
TEST(Win32Path, NtPathDoubleQuestions)
{
ASSERT_EQ(L"C:\\Example", NtPathToWin32Path(L"\\??\\C:\\Example"));
}
TEST(Win32Path, NtPathUncBegin)
{
ASSERT_EQ(L"C:\\Example", NtPathToWin32Path(L"\\\\?\\C:\\Example"));
}
TEST(Win32Path, NtPathWindowsStart)
{
ASSERT_EQ(GetCombinedPath(GetWindowsPath(), L"Hello\\World"), NtPathToWin32Path(L"\\Windows\\Hello\\World"));
}
TEST(Win32Path, NtPathSystemrootStart)
{
ASSERT_EQ(GetCombinedPath(GetWindowsPath(), L"Hello\\World"), NtPathToWin32Path(L"\\SystemRoot\\Hello\\World"));
}
TEST(Win32Path, NtPathGlobalRootSystemRoot)
{
ASSERT_EQ(GetCombinedPath(GetWindowsPath(), L"Hello\\World"), NtPathToWin32Path(L"\\globalroot\\SystemRoot\\Hello\\World"));
}
but I'd be very surprised if there weren't some API, native or otherwise, that converts these into Win32 path names. Does such an API exist?
We do this in production code. As far as I know there is no API (public or private) that handles this. We just do some string comparisons with a few prefixes and it works for us.
Apparently there is a function named RtlNtPathNameToDosPathName() in ntdll.dll (introduced with XP?), but I have no idea what it does; I would guess it has more to do with stuff like \Device\Harddisk0, though.
I'm not sure there is really a need for such a function, though. Win32 passes paths (in the sense of CreateFile, etc) to NT; NT doesn't pass paths to Win32. So ntdll.dll doesn't really have a need to go from NT paths to Win32 paths. In the rare case where some NT query function returns a full path, any conversion function could be internal to the Win32 dll (e.g. not exported). I don't even know if they bother, as stuff like GetModuleFileName() will just return whatever path was used to load the image. I guess this is just a leaky abstraction.
Here's something you could try. First use NtCreateFile to open the file, volume etc. for reading. Then use the returned HANDLE to get the full path as described here.
This is a bit late, but I will still post my answer since even today this is a very good question!
I will share one of my functions, tested and in use, for converting an NT path to a DOS path. In my case I also had to convert from ANSI to Unicode, so as a small bonus you can see how that is done.
All this code can be used in User Mode, so we need to first prepare some things.
Definitions & Structures:
typedef NTSTATUS(WINAPI* pRtlAnsiStringToUnicodeString)(PUNICODE_STRING, PANSI_STRING, BOOL);
typedef struct _RTL_BUFFER {
PUCHAR Buffer;
PUCHAR StaticBuffer;
SIZE_T Size;
SIZE_T StaticSize;
SIZE_T ReservedForAllocatedSize; // for future doubling
PVOID ReservedForIMalloc; // for future pluggable growth
} RTL_BUFFER, * PRTL_BUFFER;
typedef struct _RTL_UNICODE_STRING_BUFFER {
UNICODE_STRING String;
RTL_BUFFER ByteBuffer;
UCHAR MinimumStaticBufferForTerminalNul[sizeof(WCHAR)];
} RTL_UNICODE_STRING_BUFFER, * PRTL_UNICODE_STRING_BUFFER;
#define RTL_NT_PATH_NAME_TO_DOS_PATH_NAME_AMBIGUOUS (0x00000001)
#define RTL_NT_PATH_NAME_TO_DOS_PATH_NAME_UNC (0x00000002)
#define RTL_NT_PATH_NAME_TO_DOS_PATH_NAME_DRIVE (0x00000003)
#define RTL_NT_PATH_NAME_TO_DOS_PATH_NAME_ALREADY_DOS (0x00000004)
typedef NTSTATUS(WINAPI* pRtlNtPathNameToDosPathName)(__in ULONG Flags, __inout PRTL_UNICODE_STRING_BUFFER Path, __out_opt PULONG Disposition, __inout_opt PWSTR* FilePart);
#define RTL_DUPLICATE_UNICODE_STRING_NULL_TERMINATE (0x00000001)
#define RTL_DUPLICATE_UNICODE_STRING_ALLOCATE_NULL_STRING (0x00000002)
#define RTL_DUPSTR_ADD_NULL RTL_DUPLICATE_UNICODE_STRING_NULL_TERMINATE
#define RTL_DUPSTR_ALLOC_NULL RTL_DUPLICATE_UNICODE_STRING_ALLOCATE_NULL_STRING
typedef NTSTATUS(WINAPI* pRtlDuplicateUnicodeString)(_In_ ULONG Flags, _In_ PUNICODE_STRING StringIn, _Out_ PUNICODE_STRING StringOut);
Importing functions:
pRtlAnsiStringToUnicodeString MyRtlAnsiStringToUnicodeString;
pRtlNtPathNameToDosPathName MyRtlNtPathNameToDosPathName;
pRtlDuplicateUnicodeString MyRtlDuplicateUnicodeString;
MyRtlAnsiStringToUnicodeString = (pRtlAnsiStringToUnicodeString)GetProcAddress(GetModuleHandle("ntdll.dll"), "RtlAnsiStringToUnicodeString");
MyRtlNtPathNameToDosPathName = (pRtlNtPathNameToDosPathName)GetProcAddress(GetModuleHandle("ntdll.dll"), "RtlNtPathNameToDosPathName");
MyRtlDuplicateUnicodeString = (pRtlDuplicateUnicodeString)GetProcAddress(GetModuleHandle("ntdll.dll"), "RtlDuplicateUnicodeString");
Helper function:
NTSTATUS NtPathNameToDosPathName(PUNICODE_STRING DosPath, PUNICODE_STRING NtPath)
{
NTSTATUS Status;
ULONG_PTR BufferSize;
PWSTR Buffer;
RTL_UNICODE_STRING_BUFFER UnicodeBuffer;
BufferSize = NtPath->MaximumLength + MAX_PATH * sizeof(WCHAR);
Buffer = (PWSTR)_alloca(BufferSize);
ZeroMemory(&UnicodeBuffer, sizeof(UnicodeBuffer));
UnicodeBuffer.String = *NtPath;
UnicodeBuffer.String.Buffer = Buffer;
UnicodeBuffer.String.MaximumLength = (USHORT)BufferSize;
UnicodeBuffer.ByteBuffer.Buffer = (PUCHAR)Buffer;
UnicodeBuffer.ByteBuffer.Size = BufferSize;
CopyMemory(Buffer, NtPath->Buffer, NtPath->Length);
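Status = MyRtlNtPathNameToDosPathName(0, &UnicodeBuffer, NULL, NULL);
if (Status < 0) /* same test as NT_SUCCESS(): negative NTSTATUS values are failures */
    return Status;
return MyRtlDuplicateUnicodeString(RTL_DUPSTR_ADD_NULL, &UnicodeBuffer.String, DosPath);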
}
Function usage:
UNICODE_STRING us;
UNICODE_STRING DosPath;
ANSI_STRING as;
as.Buffer = (char*)malloc(strlen(NT_PATH_FILE_OR_DIR) + 1);
strcpy(as.Buffer, NT_PATH_FILE_OR_DIR);
as.Length = as.MaximumLength = us.MaximumLength = us.Length = strlen(NT_PATH_FILE_OR_DIR);
MyRtlAnsiStringToUnicodeString(&us, &as, TRUE);
NtPathNameToDosPathName(&DosPath, &us);
As mentioned, in my case I needed to convert from ANSI to Unicode. This might not apply to your case, so you can remove that step.
The same approach can be used to write custom functions that convert paths as needed.
Check this out for getting the canonical pathname in Win32. It may be helpful for you:
http://pdh11.blogspot.com/2009/05/pathcanonicalize-versus-what-it-says-on.html
See my answer to this question.
You'd need to first get a handle to the file at that path, and then get the Win32 path for the handle.
I wrote a function that converts different types of NT device names (filenames, COM ports, network paths, etc.) into a DOS path.
There are two functions. One converts a handle into an NT path and the other one converts this NT path into a DOS path.
Have a look here:
How to get name associated with open HANDLE
// "\Device\HarddiskVolume3" (Harddisk Drive)
// "\Device\HarddiskVolume3\Temp" (Harddisk Directory)
// "\Device\HarddiskVolume3\Temp\transparent.jpeg" (Harddisk File)
// "\Device\Harddisk1\DP(1)0-0+6\foto.jpg" (USB stick)
// "\Device\TrueCryptVolumeP\Data\Passwords.txt" (Truecrypt Volume)
// "\Device\Floppy0\Autoexec.bat" (Floppy disk)
// "\Device\CdRom1\VIDEO_TS\VTS_01_0.VOB" (DVD drive)
// "\Device\Serial1" (real COM port)
// "\Device\USBSER000" (virtual COM port)
// "\Device\Mup\ComputerName\C$\Boot.ini" (network drive share, Windows 7)
// "\Device\LanmanRedirector\ComputerName\C$\Boot.ini" (network drive share, Windows XP)
// "\Device\LanmanRedirector\ComputerName\Shares\Dance.m3u" (network folder share, Windows XP)
// "\Device\Afd" (internet socket)
// "\Device\Console000F" (unique name for any Console handle)
// "\Device\NamedPipe\Pipename" (named pipe)
// "\BaseNamedObjects\Objectname" (named mutex, named event, named semaphore)
// "\REGISTRY\MACHINE\SOFTWARE\Classes\.txt" (HKEY_CLASSES_ROOT\.txt)
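The second of the two steps described above, turning an NT device path into a DOS path, comes down to replacing a device prefix with a drive letter. The sketch below is not the linked code: the function name nt_device_path_to_dos and the hard-coded table are hypothetical. On a real system the device/drive pairs would have to be discovered at run time (for example by asking QueryDosDevice about each drive letter), and NT names compare case-insensitively, which this simplified version ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct device_map {
    const char *device;   /* NT device name */
    const char *drive;    /* DOS drive it is mounted as */
};

/* Hypothetical mapping, hard-coded for illustration only. */
static const struct device_map k_devices[] = {
    { "\\Device\\HarddiskVolume3", "C:" },
    { "\\Device\\CdRom1",          "D:" },
};

/* Writes the DOS form of nt_path into out. Returns 1 on success,
 * 0 if no device prefix matches or the buffer is too small. */
int nt_device_path_to_dos(const char *nt_path, char *out, size_t out_size)
{
    assert(nt_path != NULL && out != NULL);
    for (size_t i = 0; i < sizeof k_devices / sizeof k_devices[0]; ++i) {
        size_t n = strlen(k_devices[i].device);
        /* The prefix must end at a separator or at the end of the string,
         * so that HarddiskVolume3 does not match HarddiskVolume30. */
        if (strncmp(nt_path, k_devices[i].device, n) == 0 &&
            (nt_path[n] == '\\' || nt_path[n] == '\0')) {
            int written = snprintf(out, out_size, "%s%s",
                                   k_devices[i].drive, nt_path + n);
            return written >= 0 && (size_t)written < out_size;
        }
    }
    return 0;
}
```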
As they say, you learn coding techniques from others' code. I've been trying to understand a couple of free stacks, and they all have one thing in common: structures of function pointers. I have the following questions about this architecture.
Is there any specific reason behind such an architecture?
Does function call via function pointer help in any optimization?
Example:
void do_Command1(void)
{
// Do something
}
void do_Command2(void)
{
// Do something
}
Option 1: Direct execution of above functions
void do_Func(void)
{
do_Command1();
do_Command2();
}
Option 2: Indirect execution of above functions via function pointers
// Create structure for function pointers
typedef struct
{
void (*pDo_Command1)(void);
void (*pDo_Command2)(void);
}EXECUTE_FUNC_STRUCT;
// Update structure instance with functions address
EXECUTE_FUNC_STRUCT ExecFunc = {
do_Command1,
do_Command2,
};
void do_Func(void)
{
EXECUTE_FUNC_STRUCT *pExecFunc; // Create structure pointer
pExecFunc = &ExecFunc; // Assign structure instance address to the structure pointer
pExecFunc->pDo_Command1(); // Execute command 1 function via structure pointer
pExecFunc->pDo_Command2(); // Execute command 2 function via structure pointer
}
While Option 1 is easy to understand and implement, why do we need to use Option 2?
Option 1 doesn't allow you to change the behavior without changing the code - it will always execute the same functions in the same order every time the program is executed. Which, sometimes, is the right answer.
Option 2 gives you the flexibility to execute different functions, or to execute do_Command2 before do_Command1, based on decisions made at runtime (say after reading a configuration file, or based on the result of another operation, etc.).
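To make the runtime-ordering point concrete, here is a minimal sketch. The `reverse` flag is an assumption standing in for something read from a configuration file, and command_fn, build_sequence, run_sequence and the trace buffer are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*command_fn)(void);

static char g_trace[16];   /* records which commands ran; for demonstration only */

static void do_Command1(void) { strcat(g_trace, "1"); }
static void do_Command2(void) { strcat(g_trace, "2"); }

/* Fills in the command sequence at run time. `reverse` stands in for a
 * setting that would normally come from a configuration file. */
void build_sequence(command_fn seq[2], int reverse)
{
    assert(seq != NULL);
    seq[0] = reverse ? do_Command2 : do_Command1;
    seq[1] = reverse ? do_Command1 : do_Command2;
}

/* Runs the two commands in the order stored in seq and returns the trace. */
const char *run_sequence(command_fn const seq[2])
{
    assert(seq != NULL);
    g_trace[0] = '\0';
    for (int i = 0; i < 2; ++i)
        seq[i]();
    return g_trace;
}
```

The calling code never changes; only the contents of the table do, which is exactly what direct calls cannot offer.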
Real-world example from personal experience - I was working on an application that would read data files generated from Labview-driven instruments and load them into a database. There were four different instruments, and for each instrument there were two types of files, one for calibration and the other containing actual data. The file naming convention was such that I could select the parsing routine based on the file name. Now, I could have written my code such that:
void parse ( const char *fileName )
{
if ( fileTypeIs( fileName, "GRA" ) && fileExtIs( fileName, "DAT" ) )
parseGraDat( fileName );
else if ( fileTypeIs( fileName, "GRA" ) && fileExtIs ( fileName, "CAL" ) )
parseGraCal( fileName );
else if ( fileTypeIs( fileName, "SON" ) && fileExtIs ( fileName, "DAT" ) )
parseSonDat( fileName );
// etc.
}
and that would have worked just fine. However, at the time, there was a possibility that new instruments would be added later and that there may be additional file types for the instruments. So, I decided that instead of a long if-else chain, I would use a lookup table. That way, if I did have to add new parsing routines, all I had to do was write the new routine and add an entry for it to the lookup table - I didn't have to modify any of the main program logic. The table looked something like this:
struct lut {
const char *type;
const char *ext;
void (*parseFunc)( const char * );
} LUT[] = { {"GRA", "DAT", parseGraDat },
{"GRA", "CAL", parseGraCal },
{"SON", "DAT", parseSonDat },
{"SON", "CAL", parseSonCal },
// etc.
};
Then I had a function that would take the file name, search the lookup table, and return the appropriate parsing function (or NULL if the filename wasn't recognized):
void (*parse)(const char *) = findParseFunc( LUT, fileName );
if ( parse )
parse( fileName );
else
log( ERROR, "No parsing function for %s", fileName );
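The lookup function itself is not shown in this answer, so the following is only a guess at its shape. It assumes a hypothetical naming convention in which the file name starts with the instrument type and ends with ".EXT" (for example GRA_0001.CAL); find_parse_func, lut_entry, demo_lut and the stub parsers are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*parse_fn)(const char *);

struct lut_entry {
    const char *type;
    const char *ext;
    parse_fn    parseFunc;
};

/* Stub parsers that only record which one ran; for demonstration only. */
static int g_last_parser;
static void parseGraDat(const char *f) { (void)f; g_last_parser = 1; }
static void parseGraCal(const char *f) { (void)f; g_last_parser = 2; }

static const struct lut_entry demo_lut[] = {
    { "GRA", "DAT", parseGraDat },
    { "GRA", "CAL", parseGraCal },
};

/* Returns the parser whose type prefixes fileName and whose ext matches the
 * text after the last '.', or NULL if no table entry matches. */
parse_fn find_parse_func(const struct lut_entry *lut, size_t count,
                         const char *fileName)
{
    const char *dot;
    assert(lut != NULL && fileName != NULL);
    dot = strrchr(fileName, '.');
    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i) {
        if (strncmp(fileName, lut[i].type, strlen(lut[i].type)) == 0 &&
            strcmp(dot + 1, lut[i].ext) == 0)
            return lut[i].parseFunc;
    }
    return NULL;
}
```

Adding a new instrument then means adding one row to the table and one parsing routine; the search loop stays untouched.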
Again, there's no reason I couldn't have used the if-else chain, and in retrospect it's probably what I should have done for that particular app [1]. But it's a really powerful technique for writing code that needs to be flexible and responsive.
[1] I suffer from a tendency towards premature generalization - I'm writing code to solve what I think will be issues five years from now instead of the issue today, and I wind up with code that tends to be more complex than necessary.
Best explained via Example.
Example 1:
Let's say you want to implement a Shape class with a draw() method; then you would need a function pointer in order to do that.
struct Shape {
void (*draw)(struct Shape*);
};
void draw(struct Shape* s) {
s->draw(s);
}
void draw_rect(struct Shape *s) {}
void draw_ellipse(struct Shape *s) {}
int main()
{
struct Shape rect = { .draw = draw_rect };
struct Shape ellipse = { .draw = draw_ellipse };
struct Shape *shapes[] = { &rect, &ellipse };
for (int i=0; i < 2; ++i)
draw(shapes[i]);
}
Example 2:
FILE *file = fopen(...);
FILE *mem = fmemopen(...); /* POSIX */
Without function pointers, there would be no way to implement a common interface for file and memory streams.
Addendum
Well, there is another way. Based on the Shape example:
enum ShapeId {
SHAPE_RECT,
SHAPE_ELLIPSE
};
struct Shape {
enum ShapeId id;
};
void draw(struct Shape *s)
{
switch (s->id) {
case SHAPE_RECT: draw_rect(s); break;
case SHAPE_ELLIPSE: draw_ellipse(s); break;
}
}
The advantage of this second approach could be that the compiler can inline the functions, in which case you avoid the overhead of a function call.
"Everything in computer science can be solved with one more level of indirection."
The struct-of-function-pointers "pattern", let's call it, permits runtime choices. SQLite uses it all over the place, for example, for portability. If you provide a "file system" meeting its required semantics, then you can run SQLite on it, with Posix nowhere in sight.
GnuCOBOL uses the same idea for indexed files. Cobol defines ISAM semantics, whereby a program can read a record from a file by specifying a key. The underlying name-value store can be provided by several (configurable) libraries, which all provide the same functionality, but use different names for their "read a record" function. By wrapping these up as function pointers, the Cobol runtime support library can use any of those key-value systems, or even more than one at the same time (for different files, of course).
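A toy version of that arrangement might look like the sketch below. It does not reproduce SQLite's or GnuCOBOL's actual interfaces: kv_backend, the two backends and select_backend are hypothetical names, and each "library" is reduced to a single function to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The common interface: each backend fills in the same slot,
 * whatever its own function happens to be called. */
struct kv_backend {
    const char *name;
    const char *(*read_record)(const char *key);   /* NULL if key is absent */
};

/* Two hypothetical libraries with different names for "read a record". */
static const char *alpha_fetch(const char *key)
{
    return strcmp(key, "id") == 0 ? "alpha-42" : NULL;
}

static const char *beta_lookup(const char *key)
{
    return strcmp(key, "id") == 0 ? "beta-42" : NULL;
}

static const struct kv_backend backends[] = {
    { "alpha", alpha_fetch },
    { "beta",  beta_lookup },
};

/* Picks a backend by name at run time; returns NULL if it is unknown. */
const struct kv_backend *select_backend(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; ++i)
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    return NULL;
}
```

Code that holds a `const struct kv_backend *` can serve different files from different backends at the same time, because every call goes through the pointer rather than through a fixed function name.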
What is the purpose of initializing a handle to an object through a pointer to a pointer rather than through a plain pointer? For example, in the following code:
FT_Library library;
FT_Error error = FT_Init_FreeType( &library );
where
typedef struct FT_LibraryRec_ *FT_Library;
so &library is the address of an FT_LibraryRec_ handle and has type struct FT_LibraryRec_**
It's a way to emulate pass by reference in C, which otherwise only has pass by value.
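A small sketch of the difference, using a made-up handle type (widget, widget_rec and both init functions are hypothetical and unrelated to FreeType):

```c
#include <assert.h>
#include <stddef.h>

typedef struct widget_rec *widget;   /* the handle is itself a pointer, like FT_Library */

static struct widget_rec { int id; } the_widget = { 7 };

/* Receives a copy of the caller's handle, so the assignment is lost on return. */
void init_by_value(widget w)
{
    w = &the_widget;
    (void)w;
}

/* Receives the address of the caller's handle and can therefore store into it.
 * Returns 0 on success, mirroring the error-code-plus-out-parameter style. */
int init_by_pointer(widget *out)
{
    assert(out != NULL);
    *out = &the_widget;
    return 0;
}

int widget_id(widget w)
{
    return w->id;
}
```

After init_by_value(w) the caller's w is unchanged; only init_by_pointer(&w) can hand a handle back while leaving the return value free for the error code.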
The 'C' library function FT_Init_FreeType has two outputs, the error code and/or the library handle (which is a pointer).
In C++ we'd more naturally either:
return an object which encapsulated the success or failure of the call and the library handle, or
return one output - the library handle, and throw an exception on failure.
C APIs are generally not implemented this way.
It is not unusual for a C Library function to return a success code, and to be passed the addresses of in/out variables to be conditionally mutated, as per the case above.
This approach hides the implementation. It speeds up compilation of your code. It allows the library to change its data structures without breaking existing code that uses them. Finally, it makes sure the address of that object never changes, and that you don’t copy these objects.
Here’s how the version with a single pointer might be implemented:
struct FT_Struct
{
// Some fields/properties go here, e.g.
int field1;
char* field2;
};
FT_Error Init( FT_Struct* p )
{
p->field1 = 11;
p->field2 = static_cast<char*>( malloc( 100 ) ); // malloc returns void*, which C++ will not convert implicitly
if( nullptr == p->field2 )
return E_OUTOFMEMORY;
return S_OK;
}
Or C++ equivalent, without any pointers:
class FT_Struct
{
int field1;
std::vector<char> field2;
public:
FT_Struct() :
field1( 11 )
{
field2.resize( 100 );
}
};
As a user of the library, you have to include the struct/class FT_Struct definition. Libraries can be very complex, so this will slow down compilation of your code.
If the library is dynamic, i.e. *.dll on Windows, *.so on Linux or *.dylib on OS X, and you upgrade it to a new version that changes the memory layout of the struct/class, old applications will crash.
Because of the way C++ works, objects are passed by value, i.e. you normally expect them to be movable and copyable, which is not necessarily what the library author wants to support.
Now consider the following function instead:
FT_Error Init( FT_Struct** pp )
{
try
{
*pp = new FT_Struct();
return S_OK;
}
catch( std::exception& ex )
{
return E_FAIL;
}
}
As a user of the library, you no longer need to know what’s inside FT_Struct or even what size it is. You don’t need to #include the implementation details, i.e. compilation will be faster.
This plays nicely with dynamic libraries: the library author can change the memory layout however they please, and as long as the C API is stable, old apps will continue to work.
The API guarantees you won’t copy or move the values, because you can’t copy structures of unknown size.
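The same idea in plain C could be sketched as follows. The counter type and its three functions are hypothetical; in a real library the first part would live in the public header and the second part in a private .c file, and they are shown in one block here only so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what the public header would contain --- */
typedef struct counter_rec *counter;   /* opaque handle: callers never see the fields */
int  counter_create(counter *out);     /* returns 0 on success, nonzero on failure */
int  counter_next(counter c);
void counter_destroy(counter c);

/* --- private to the library's own source file --- */
struct counter_rec {
    int value;                         /* layout can change without breaking callers */
};

int counter_create(counter *out)
{
    assert(out != NULL);
    *out = malloc(sizeof **out);
    if (*out == NULL)
        return 1;
    (*out)->value = 0;
    return 0;
}

int counter_next(counter c)
{
    assert(c != NULL);
    return ++c->value;
}

void counter_destroy(counter c)
{
    free(c);
}
```

Because callers only ever hold the pointer, adding a field to counter_rec in a later version changes nothing on their side.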
I'm working on native call bindings for a virtual machine, and one of the features is to be able to look up standard libc functions by name at runtime. On windows this becomes a bit of a hassle because I need to get a handle to the msvcrt module that's currently loaded in the process. Normally this is msvcrt.dll, but it could be other variants as well (msvcr100.dll, etc) and a call to GetModuleHandle("msvcrt") could fail if a variant with a different name is used.
What I would like to be able to do is a reverse lookup, take a function pointer from libc (which I have in abundance) and get a handle to the module that provides it. Basically, something like this:
HMODULE hlibc = ReverseGetModuleHandle(fprintf); // Any func from libc should do the trick
void *vfunc = (void *)GetProcAddress(hlibc, "fprintf"); // GetProcAddress also needs the name to look up
Is there such a thing in the win32 API, without descending into a manual walk of process handles and symbol tables? Conversely, if I am over-thinking the problem, is there an easier way to look up a libc function by name on win32?
The documented way of obtaining the module handle is by using GetModuleHandleEx.
HMODULE hModule = NULL;
if(GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, // behave like GetModuleHandle
(LPCTSTR)address, &hModule))
{
// hModule should now refer to the module containing the target address.
}
MEMORY_BASIC_INFORMATION mbi;
HMODULE mod;
if (VirtualQuery( vfunc, &mbi, sizeof(mbi) ))
{
mod = (HMODULE)mbi.AllocationBase;
}
Unfortunately you will have to walk through modules as you feared. It's not too bad though. Here is the idea, some code written in notepad:
MODULEENTRY32 me = {0};
HANDLE hSnapshot = CreateToolhelp32Snapshot( TH32CS_SNAPMODULE, 0 );
me.dwSize = sizeof me;
if( hSnapshot != INVALID_HANDLE_VALUE && Module32First( hSnapshot, &me ) ) {
do {
if( me.modBaseAddr <= (BYTE *)funcPtr &&
( me.modBaseAddr + me.modBaseSize ) > (BYTE *)funcPtr ) {
... // me.hModule is the module that contains funcPtr
break;
}
} while( Module32Next( hSnapshot, &me ) );
CloseHandle( hSnapshot );
}
I was trying to write a small debug utility and for this I need to get the function/global variable address given its name. This is built-in debug utility, which means that the debug utility will run from within the code to be debugged or in plain words I cannot parse the executable file.
Now, is there a well-known way to do that? My plan is to have the .debug_* sections loaded into memory [which I plan to do with a cheap trick like this in the ld script]
.data : {
*(.data)
__sym_start = .;
*(.debug_*)
__sym_end = .;
}
Now I have to parse the section to get the information I need, but I am not sure whether this is doable or whether there are issues with it; this is all just theory. It also seems like too much work :-) Is there a simpler way? Or if someone can tell me up front why my scheme will not work, that will also be helpful.
Thanks in Advance,
Alex.
If you are running under a system with dlopen(3) and dlsym(3) (like Linux) you should be able to:
char thing_string[] = "thing_you_want_to_look_up";
void * handle = dlopen(NULL, RTLD_LAZY | RTLD_NOLOAD);
// you could do RTLD_NOW as well. shouldn't matter
if (!handle) {
fprintf(stderr, "Dynamic linking on main module : %s\n", dlerror() );
exit(1);
}
void * addr = dlsym(handle, thing_string);
fprintf(stderr, "%s is at %p\n", thing_string, addr);
I don't know the best way to do this for other systems, and this probably won't work for static variables and functions. C++ symbol names will be mangled, if you are interested in working with them.
To expand this to work for shared libraries you could probably get the names of the currently loaded libraries from /proc/self/maps and then pass the library file names into dlopen, though this could fail if the library has been renamed or deleted.
There are probably several other much better ways to go about this.
Edit: without using dlopen
/* name_addr.h */
struct name_addr {
const char * sym_name;
const void * sym_addr;
};
typedef struct name_addr name_addr_t;
void * sym_lookup(const char * name);
extern const name_addr_t name_addr_table[];
extern const unsigned name_addr_table_size;
/* name_addr_table.c */
#include "name_addr.h"
#define PREMEMBER( X ) extern const void * X
/* X is declared above as an object, so its address (&X) is the symbol's address */
#define REMEMBER( X ) { .sym_name = #X , .sym_addr = (const void *) &X }
PREMEMBER(strcmp);
PREMEMBER(printf);
PREMEMBER(main);
PREMEMBER(memcmp);
PREMEMBER(bsearch);
PREMEMBER(sym_lookup);
/* ... */
const name_addr_t name_addr_table[] =
{
/* You could do a #include here that included the list, which would allow you
* to have an empty list by default without regenerating the entire file, as
* long as your compiler only warns about missing include targets.
*/
REMEMBER(strcmp),
REMEMBER(printf),
REMEMBER(main),
REMEMBER(memcmp),
REMEMBER(bsearch),
REMEMBER(sym_lookup),
/* ... */
};
const unsigned name_addr_table_size = sizeof(name_addr_table)/sizeof(name_addr_t);
/* name_addr_code.c */
#include "name_addr.h"
#include <string.h>
void * sym_lookup(const char * name) {
unsigned to_go = name_addr_table_size;
const name_addr_t *na = name_addr_table;
/* Linear scan over every table entry */
while(to_go) {
if ( !strcmp(name, na->sym_name) ) {
return (void *) na->sym_addr; /* cast drops the const qualifier */
}
na++;
to_go--;
}
/* set errno here if you are using errno */
return NULL; /* Or some other illegal value */
}
If you do it this way the linker will take care of filling in the addresses for you after everything has been laid out. If you include header files for all of the symbols that you are listing in your table then you will not get warnings when you compile the table file, but it will be much easier just to have them all be extern void * and let the compiler warn you about all of them (which it probably will, but not necessarily).
You will also probably want to sort your symbols by name such that you can use a binary search of the list rather than iterate through it.
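The sorted-table variant could be sketched like this. It is a standalone illustration with its own hypothetical names (name_fn, sort_table, lookup_sorted) and a generic function-pointer type instead of void *; the table is sorted once with qsort() here, whereas a generated table file would more likely be emitted already sorted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*generic_fn)(void);

struct name_fn {
    const char *name;
    generic_fn  fn;
};

static void alpha(void) {}
static void beta(void) {}
static void gamma_fn(void) {}

/* Deliberately unsorted to show that sort_table() is what enables bsearch(). */
static struct name_fn table[] = {
    { "gamma", gamma_fn },
    { "alpha", alpha },
    { "beta",  beta },
};
static const size_t table_len = sizeof table / sizeof table[0];

/* Orders entries by symbol name; shared by qsort() and bsearch(). */
static int cmp_entries(const void *a, const void *b)
{
    return strcmp(((const struct name_fn *)a)->name,
                  ((const struct name_fn *)b)->name);
}

/* Must be called once before lookup_sorted(). */
void sort_table(void)
{
    qsort(table, table_len, sizeof table[0], cmp_entries);
}

/* Binary search by name; returns NULL if the name is not in the table. */
generic_fn lookup_sorted(const char *name)
{
    struct name_fn key = { name, NULL };
    const struct name_fn *hit;
    assert(name != NULL);
    hit = bsearch(&key, table, table_len, sizeof table[0], cmp_entries);
    return hit ? hit->fn : NULL;
}
```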
You should note that if you have members in the table which are not otherwise referenced by the program (like if you had an entry for sqrt in the table, but didn't call it) the linker will then want (need) to link those functions into your image. This can make the image grow considerably.
Also, if you were taking advantage of global optimizations having this table will likely make those less effective since the compiler will think that all of the functions listed could be accessed via pointer from this list and that it cannot see all of the call points.
Putting static functions in this list is not straightforward. You could do this by changing the table to dynamic and doing it at run time from a function in each module, or possibly by generating a new section in your object file that the table lives in. If you are using gcc:
#define SECTION_REMEMBER(X) \
static const name_addr_t _name_addr##X \
__attribute__((section("sym_lookup_table"))) = \
{.sym_name= #X , .sym_addr = (void *) X }
And tack a list of these onto the end of each .c file with all of the symbols that you want to remember from that file. This will require linker work so that the linker will know what to do with these members, but then you can iterate over the list by looking at the begin and end of the section that it resides in (I don't know exactly how to do this, but I know it can be done and isn't TOO difficult). This will make having a sorted list more difficult, though. Also, I'm not entirely certain initializing the .sym_name to a string literal's address would not result in cramming the string into this section, but I don't think it would. If it did then this would break things.
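For the "begin and end of the section" part, one possibility on ELF targets is to rely on the GNU linker, which defines __start_NAME and __stop_NAME symbols for any output section whose name is a valid C identifier. The sketch below assumes GCC or Clang with GNU ld or a compatible linker; sym_entry, section_lookup and the two example variables are hypothetical, and it stores addresses of variables rather than functions to keep the initializers strictly portable. It is a guess at how the scheme could work, not a tested recipe for every toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sym_entry {
    const char *name;
    const void *addr;
};

/* Places one entry per symbol into the custom section. `used` keeps the
 * compiler from dropping entries that nothing references by name. */
#define SECTION_REMEMBER(X) \
    static const struct sym_entry sym_entry_##X \
    __attribute__((used, section("sym_lookup_table"))) = { #X, &X }

int first_value = 1;
int second_value = 2;
SECTION_REMEMBER(first_value);
SECTION_REMEMBER(second_value);

/* Defined by the linker (assumption: GNU ld or compatible, ELF output). */
extern const struct sym_entry __start_sym_lookup_table[];
extern const struct sym_entry __stop_sym_lookup_table[];

/* Walks every entry the linker gathered into the section. */
const void *section_lookup(const char *name)
{
    const struct sym_entry *e;
    assert(name != NULL);
    for (e = __start_sym_lookup_table; e < __stop_sym_lookup_table; ++e)
        if (strcmp(e->name, name) == 0)
            return e->addr;
    return NULL;
}
```

The entries appear in link order rather than sorted order, so this version searches linearly.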
You can still use objdump to get a list of the symbols that the object file (probably ELF) contains, filter it for the symbols you are interested in, and then regenerate the table file with the table's members listed.