Why doesn't LuaJIT's FFI module require declared calling conventions? - ffi

This is something I've been curious about for a while: how does LuaJIT's FFI module manage to use the correct calling convention when invoking external native functions, without the convention ever being declared in the user's prototypes?
I tried reading through the source code to figure this out on my own, but finding what I was looking for proved to be too difficult, so any help would be appreciated.
Edit
In order to verify that calling conventions are auto-determined when not declared, I wrote the following 32-bit test DLL to be compiled with MSVC's C compiler:
// Use multibyte characters for our default char type
#define _MBCS 1
// Speed up build process with minimal headers.
#define WIN32_LEAN_AND_MEAN
#define VC_EXTRALEAN
// System includes
#include <windows.h>
#include <stdio.h>
#define CALLCONV_TEST(CCONV) \
int __##CCONV test_##CCONV(int arg1, float arg2, const char* arg3) \
{ \
return CALLCONV_WORK(arg1, arg2, arg3); \
__pragma(comment(linker, "/EXPORT:" __FUNCTION__ "=" __FUNCDNAME__ )) \
}
#define CALLCONV_WORK(arg1,arg2,arg3) \
test_calls_work(__FUNCTION__, arg1, arg2, arg3, __COUNTER__);
static int test_calls_work(const char* funcname, int arg1, float arg2, const char* arg3, int retcode)
{
printf("[%s call]\n", funcname);
printf(" arg1 => %d\n", arg1);
printf(" arg2 => %f\n", arg2);
printf(" arg3 => \"%s\"\n", arg3);
printf(" <= return %d\n", retcode);
return retcode;
}
CALLCONV_TEST(cdecl) // => int __cdecl test_cdecl(int arg1, float arg2, const char* arg3);
CALLCONV_TEST(stdcall) // => int __stdcall test_stdcall(int arg1, float arg2, const char* arg3);
CALLCONV_TEST(fastcall) // => int __fastcall test_fastcall(int arg1, float arg2, const char* arg3);
BOOL WINAPI DllMain(HINSTANCE hInstance, DWORD dwReason, LPVOID lpReserved)
{
if(dwReason == DLL_PROCESS_ATTACH) {
DisableThreadLibraryCalls(hInstance);
}
return TRUE;
}
I then wrote a Lua script that calls the exported functions through the ffi module:
local ffi = require('ffi')
local testdll = ffi.load('ljffi-test.dll')
ffi.cdef[[
int test_cdecl(int arg1, float arg2, const char* arg3);
int test_stdcall(int arg1, float arg2, const char* arg3);
int test_fastcall(int arg1, float arg2, const char* arg3);
]]
local function run_tests(arg1, arg2, arg3)
local function cconv_test(name)
local funcname = 'test_' .. name
local handler = testdll[funcname]
local ret = tonumber(handler(arg1, arg2, arg3))
print(string.format(' => got %d\n', ret))
end
cconv_test('cdecl')
cconv_test('stdcall')
cconv_test('fastcall')
end
run_tests(3, 1.33, 'string value')
After compiling the DLL and running the script, I received the following output:
[test_cdecl call]
arg1 => 3
arg2 => 1.330000
arg3 => "string value"
<= return 0
=> got 0
[test_stdcall call]
arg1 => 3
arg2 => 1.330000
arg3 => "string value"
<= return 1
=> got 1
[test_fastcall call]
arg1 => 0
arg2 => 0.000000
arg3 => "(null)"
<= return 2
=> got 2
As you can see, the ffi module correctly resolves the __cdecl and __stdcall calling conventions, but appears to have called the __fastcall function incorrectly.
Lastly, I've included dumpbin's output to show that all functions are being exported with undecorated names.
> dumpbin.exe /EXPORTS ljffi-test.dll
Microsoft (R) COFF/PE Dumper Version 10.00.40219.01
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file ljffi-test.dll
File Type: DLL
Section contains the following exports for ljffi-test.dll
00000000 characteristics
548838D4 time date stamp Wed Dec 10 04:13:08 2014
0.00 version
1 ordinal base
3 number of functions
3 number of names
ordinal hint RVA name
1 0 00001000 test_cdecl
2 1 000010C0 test_fastcall
3 2 00001060 test_stdcall
Summary
1000 .data
1000 .rdata
1000 .reloc
1000 .text
Edit 2
Just to clarify: calling conventions are only really relevant for 32-bit Windows compilers, so that is the primary focus of this question. (Unless I'm mistaken, compilers targeting the Win64 platform only use the FASTCALL calling convention, and GCC uses the CDECL calling convention on all other platforms supported by LuaJIT.)
As far as I know, the only place to find information about functions exported from a PE file is the IMAGE_EXPORT_DIRECTORY, and if function names are exported without decorators, there is no information remaining that indicates the calling convention of a particular function.
Following that logic, the only remaining method I can think of for determining a function's calling convention is to analyze the assembly of the exported function, and determine the convention based on the stack usage. That seems like a bit much, though, when I consider the differences produced by different compilers and optimization levels.

Calling convention is platform-dependent.
Usually there is a platform default, and you may specify others explicitly.
From http://luajit.org/ext_ffi_semantics.html:
The C parser complies to the C99 language standard plus the following extensions:
...
GCC __attribute__ with the following attributes: aligned, packed, mode, vector_size, cdecl, fastcall, stdcall, thiscall.
...
MSVC __cdecl, __fastcall, __stdcall, __thiscall, __ptr32, __ptr64, ...
The most interesting case is Win32. There, the calling convention may be encoded in the exported symbol's name decoration (see the Win32 calling convention decoration rules).
LuaJIT has code to recognize such decorations.
Also, LuaJIT by default uses the __stdcall calling convention for the WinAPI DLLs kernel32.dll, user32.dll and gdi32.dll.
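Since the test DLL above exports undecorated names, there is no decoration for LuaJIT to detect, so the fallback is to annotate the declarations yourself. A minimal sketch, assuming the same test DLL as above (these declarations would replace the ones inside the script's ffi.cdef[[ ... ]] block):

// Explicit calling-convention annotations, accepted by LuaJIT's C parser
// per the documentation quoted above. With these in place the __fastcall
// case should be invoked correctly too.
int __cdecl    test_cdecl   (int arg1, float arg2, const char* arg3);
int __stdcall  test_stdcall (int arg1, float arg2, const char* arg3);
int __fastcall test_fastcall(int arg1, float arg2, const char* arg3);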

Related

EDK2 Shell Application using Variable Arguments built with GCC causes Page Fault when run

I am having issues using variable arguments under EDK2 (an x64 shell application) when building on a Linux host with GCC. The program builds, but when executed it causes a page fault at the point where VA_ARG() is executed.
The same code when built under a Windows host with VS2015 works without issue.
This seems related to GCC bug 50818 but I can find no solution.
#include <Uefi.h>
#include <Library/UefiLib.h>
#include <Library/PrintLib.h>
#include <Library/ShellCEntryLib.h>
VOID PrintInts(UINTN n, ...)
{
VA_LIST vl;
VA_START(vl, n);
Print(L"Printing integers:");
for (UINTN i=0; i<n; i++) {
UINTN val = 0;
val = VA_ARG(vl, UINTN);
Print(L" [%d]", val);
}
VA_END(vl);
Print(L"\n");
}
INTN EFIAPI ShellAppMain(IN UINTN Argc, IN CHAR16 **Argv)
{
UINTN a = 3;
UINTN b = 10;
UINTN c = 9;
PrintInts(3, a, b, c);
return 0;
}
I have found a fix, which is to declare the function with the EFIAPI qualifier, i.e.
VOID EFIAPI PrintInts(UINTN n, ...)
From this link:
When creating a 32-bit UEFI application, EFIAPI is empty; GCC will compile the "efi_main" function using the standard C calling convention. When creating a 64-bit UEFI application, EFIAPI expands to "__attribute__((ms_abi))" and GCC will compile the "efi_main" function using Microsoft's x64 calling convention, as specified by UEFI. Only functions that will be called directly from UEFI (including main, but also callbacks) need to use the UEFI calling convention.
Also, it only seems to be an issue with GCC; if I use Clang I do not need to specify EFIAPI.
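For what it's worth, the behaviour can be reproduced outside EDK2. The sketch below is not EDK2 code (the DEMO_EFIAPI and demo_va_* names are made up for illustration); it shows what the EFIAPI fix amounts to on a GCC x86-64 build per the explanation quoted above: the variadic function is given the Microsoft ABI with __attribute__((ms_abi)), and the va_list handling then has to use GCC's matching __builtin_ms_va_* builtins, which is roughly what EDK2's VA_LIST/VA_START/VA_ARG macros do for GCC/X64 as far as I can tell.

#include <stdio.h>

#if defined(__GNUC__) && defined(__x86_64__)
/* Same idea as EDK2's EFIAPI on a GCC X64 build: use the Microsoft ABI. */
#define DEMO_EFIAPI __attribute__((ms_abi))
typedef __builtin_ms_va_list demo_va_list;
#define demo_va_start(ap, last) __builtin_ms_va_start(ap, last)
#define demo_va_arg(ap, type)   __builtin_ms_va_arg(ap, type)
#define demo_va_end(ap)         __builtin_ms_va_end(ap)
#else
#include <stdarg.h>
#define DEMO_EFIAPI
typedef va_list demo_va_list;
#define demo_va_start(ap, last) va_start(ap, last)
#define demo_va_arg(ap, type)   va_arg(ap, type)
#define demo_va_end(ap)         va_end(ap)
#endif

/* The calling convention of the function and of the va_* machinery match. */
static void DEMO_EFIAPI print_ints(unsigned n, ...)
{
    demo_va_list ap;
    demo_va_start(ap, n);
    for (unsigned i = 0; i < n; i++)
        printf(" %lu", demo_va_arg(ap, unsigned long));
    demo_va_end(ap);
    printf("\n");
}

int main(void)
{
    print_ints(3, 1UL, 2UL, 3UL);
    return 0;
}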

Figure out function parameter count at compile time

I have a C library (with C headers) which exists in two different versions.
One of them has a function that looks like this:
int test(char * a, char * b, char * c, bool d, int e);
And the other version looks like this:
int test(char * a, char * b, char * c, bool d)
(where e is not given as a function parameter but is hard-coded in the function itself).
Neither the library nor its headers provide any way to check the library version, so I can't just use an #if or #ifdef to check a version number.
Is there any way I can write a C program that can be compiled with both versions of this library, depending on which one is installed when the program is compiled? That way, contributors who want to compile my program are free to use either version of the library.
So, to clarify, I'm looking for something like this (or similar):
#if HAS_ARGUMENT_COUNT(test, 5)
test("a", "b", "c", true, 20);
#elif HAS_ARGUMENT_COUNT(test, 4)
test("a", "b", "c", true);
#else
#error "wrong argument count"
#endif
Is there any way to do that in C? I was unable to figure out a way.
The library would be libogc ( https://github.com/devkitPro/libogc ) which changed its definition of if_config a while ago, and I'd like to make my program work with both the old and the new version. I was unable to find any version identifier in the library. At the moment I'm using a modified version of GCC 8.3.
This should be done at the configure stage, using an Autoconf (or CMake, or whatever) test step -- basically, attempting to compile a small program which uses the five-parameter signature, and seeing if it compiles successfully -- to determine which version of the library is in use. That can be used to set a preprocessor macro which you can use in an #if block in your code.
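For illustration, here is a sketch of the probe such a check could try to compile; the header name and the HAVE_TEST_5ARGS macro are placeholders, not anything defined by libogc. Wrap it in AC_COMPILE_IFELSE (Autoconf) or check_c_source_compiles (CMake) and define the macro when the compile succeeds:

/* conftest.c -- configure-time probe (sketch).
 * Replace the include below with the library header that actually declares
 * test(); this translation unit only compiles against the five-parameter
 * version, because calling a four-parameter prototype with five arguments
 * is a hard compile error. */
#include <stdbool.h>
#include "the_library_header.h"   /* placeholder name */

int main(void)
{
    return test("a", "b", "c", true, 20);
}

The program's own code can then choose between the two calls with #ifdef HAVE_TEST_5ARGS instead of the hypothetical HAS_ARGUMENT_COUNT.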
I think there's no way to do this at the preprocessing stage (at least not without some external scripts). On the other hand, there is a way to detect a function's signature at compile time if you're using C11: _Generic. But remember: you can't use this in a macro like #if, because primary expressions aren't evaluated at the preprocessing stage, so you can't dynamically choose to call the function with signature 1 or 2 at that stage.
#define WEIRD_LIB_FUNC_TYPE(T) _Generic(&(T), \
int (*)(char *, char *, char *, bool, int): 1, \
int (*)(char *, char *, char *, bool): 2, \
default: 0)
printf("test's signature: %d\n", WEIRD_LIB_FUNC_TYPE(test));
// will print 1 if 'test' expects the extra argument, or 2 otherwise
I'm sorry if this does not answer your question. If you really can't detect the version from the "stock" library header file, there are workarounds where you can #ifdef something that's only present in a specific version of that library.
This is just a horrible library design.
Update: after reading the comments, I should clarify for future readers that it isn't possible in the preprocessing stage but it is possible at compile time still. You'd just have to conditionally cast the function call based on my snippet above.
typedef int (*TYPE_A)(char *, char *, char *, bool, int);
typedef int (*TYPE_B)(char *, char *, char *, bool);
int newtest(char *a, char *b, char *c, bool d, int e) {
void (*func)(void) = (void (*)(void))&test;
if (_Generic(&test, TYPE_A: 1, TYPE_B: 2, default: 0) == 1) {
return ((TYPE_A)func)(a, b, c, d, e);
}
return ((TYPE_B)func)(a, b, c, d);
}
This indeed works although it might be controversial to cast a function this way. The upside is, as #pizzapants184 said, the condition will be optimized away because the _Generic call will be evaluated at compile-time.
I don't see any way to do that with standard C. If you are compiling with gcc, a very ugly way is to use gcc's -aux-info output in a pre-build step and pass the number of parameters in with -D:
#!/bin/sh
gcc -aux-info output.info demo.c
COUNT=`grep "extern int foo" output.info | tr -dc "," | wc -m`
rm output.info
gcc -o demo demo.c -DCOUNT="$COUNT + 1"
./demo
This snippet
#include <stdio.h>
int foo(int a, int b, int c);
#ifndef COUNT
#define COUNT 0
#endif
int main(void)
{
printf("foo has %d parameters\n", COUNT);
return 0;
}
outputs
foo has 3 parameters
Attempting to support compiling code with multiple versions of a static library serves no useful purpose. Update your code to use the latest release and stop making life more difficult than it needs to be.
In Dennis Ritchie's original C language, a function could be passed any number of arguments, regardless of the number of parameters it expected, provided that the function didn't access any parameters beyond those that were passed to it. Even on platforms whose normal calling convention wouldn't be able to accommodate this flexibility, C compilers would generally use a different calling convention that could support it, unless functions were marked with qualifiers like pascal to indicate that they should use the ordinary calling convention.
Thus, something like the following would have had fully defined behavior in Ritchie's original C language:
int addTwoOrThree(count, x, y, z)
int count, x, y, z;
{
if (count == 3)
return x+y+z;
else
return x+y;
}
int test()
{
return addTwoOrThree(2, 10,20) + addTwoOrThree(3, 1,2,3);
}
Because there are some platforms where it would be impractical to support such flexibility by default, the C Standard does not require that compilers meaningfully process any calls to functions which have more or fewer arguments than expected, except that functions which have been declared with a ... parameter will "expect" any number of arguments that is at least as large as the number of actual specified parameters. It is thus rare for code to be written that would exploit the flexibility that was present in Ritchie's language. Nonetheless, many implementations will still accept code written to support that pattern if the function being called is in a separate compilation unit from the callers, and it is declared but not prototyped within the compilation units that call it.
You don't.
The tools you're working with are statically linked and don't support versioning.
You can get around it using all kinds of tricks that have been mentioned, but at the end of the day they are ugly patchwork for something that makes no sense in this context (toolkit/code environment).
You design your code for the version of the toolkit you have installed; it's a hard requirement. I also don't understand why you would want to design your GameCube/Wii code to allow building against different versions.
The toolkit is constantly changing to fix bugs, wrong assumptions, etc.
If you want your code to use an old version that potentially has bugs or does things wrong, that is on you.
I think you should realize what kind of botch work you're dealing with if you need or want to do this with a constantly evolving toolkit.
I also suspect (because I know you and your relationship with devkitPro) that you are asking this because you have an older version installed and your CI builds won't work because they use a newer version (from Docker). It's either that, or you have multiple versions installed on your machine for a different project you build (but won't update its source for some odd reason).
If your compiler is a recent GCC, e.g. some GCC 10 in November 2020, you might write your own GCC plugin to check the signature in your header files (and emit appropriate and related C preprocessor #define-s and/or #ifdef, à la GNU autoconf). Your plugin could (for example) fill some sqlite database and you would later generate some #include-d header file.
You then would set up your build automation (e.g. your Makefile) to use that GCC plugin and the data it has computed when needed.
For a single function, such an approach is overkill.
For some large project, it could make sense, in particular if you also decide to also code some project-specific coding rules validator in your GCC plugin.
Writing a GCC plugin could take weeks of your time, and you may need to patch your plugin source code when you would switch to a future GCC 11.
See also this draft report and the European CHARIOT and DECODER projects (funding the work described in that report).
BTW, you might ask the authors of that library to add some versioning metadata. Inspiration might come from libonion or Glib or libgccjit.
BTW, as rightly commented in this issue, you should not use an unmaintained old version of some opensource library. Use the one that is worked on.
I'd like to make my program work with both the old and the new version.
Why?
Making your program work with the old (unmaintained) version of libogc adds burden to both you and them. I don't understand why you would depend upon some old unmaintained library if you can avoid doing that.
PS. You could of course write a plugin for GCC 8. I do recommend switching to GCC 10: it did improve.
I'm not sure this solves your specific problem, or helps you at all, but here's a preprocessor contraption, due to Laurent Deniau, that counts the number of arguments passed to a function at compile time.
Meaning, something like args_count(a,b,c) evaluates (at compile time) to the literal constant 3, and something like args_count(__VA_ARGS__) (within a variadic macro) evaluates (at compile time) to the number of arguments passed to the macro.
This allows you, for instance, to call variadic functions without specifying the number of arguments, because the preprocessor does it for you.
So, if you have a variadic function
void function_backend(int N, ...){
// do stuff
}
where you (typically) HAVE to pass the number of arguments N, you can automate that process by writing a "frontend" variadic macro
#define function_frontend(...) function_backend(args_count(__VA_ARGS__), __VA_ARGS__)
And now you call function_frontend() with as many arguments as you want:
I made a YouTube tutorial about this.
#include <stdint.h>
#include <stdarg.h>
#include <stdio.h>
#define m_args_idim__get_arg100( \
arg00,arg01,arg02,arg03,arg04,arg05,arg06,arg07,arg08,arg09,arg0a,arg0b,arg0c,arg0d,arg0e,arg0f, \
arg10,arg11,arg12,arg13,arg14,arg15,arg16,arg17,arg18,arg19,arg1a,arg1b,arg1c,arg1d,arg1e,arg1f, \
arg20,arg21,arg22,arg23,arg24,arg25,arg26,arg27,arg28,arg29,arg2a,arg2b,arg2c,arg2d,arg2e,arg2f, \
arg30,arg31,arg32,arg33,arg34,arg35,arg36,arg37,arg38,arg39,arg3a,arg3b,arg3c,arg3d,arg3e,arg3f, \
arg40,arg41,arg42,arg43,arg44,arg45,arg46,arg47,arg48,arg49,arg4a,arg4b,arg4c,arg4d,arg4e,arg4f, \
arg50,arg51,arg52,arg53,arg54,arg55,arg56,arg57,arg58,arg59,arg5a,arg5b,arg5c,arg5d,arg5e,arg5f, \
arg60,arg61,arg62,arg63,arg64,arg65,arg66,arg67,arg68,arg69,arg6a,arg6b,arg6c,arg6d,arg6e,arg6f, \
arg70,arg71,arg72,arg73,arg74,arg75,arg76,arg77,arg78,arg79,arg7a,arg7b,arg7c,arg7d,arg7e,arg7f, \
arg80,arg81,arg82,arg83,arg84,arg85,arg86,arg87,arg88,arg89,arg8a,arg8b,arg8c,arg8d,arg8e,arg8f, \
arg90,arg91,arg92,arg93,arg94,arg95,arg96,arg97,arg98,arg99,arg9a,arg9b,arg9c,arg9d,arg9e,arg9f, \
arga0,arga1,arga2,arga3,arga4,arga5,arga6,arga7,arga8,arga9,argaa,argab,argac,argad,argae,argaf, \
argb0,argb1,argb2,argb3,argb4,argb5,argb6,argb7,argb8,argb9,argba,argbb,argbc,argbd,argbe,argbf, \
argc0,argc1,argc2,argc3,argc4,argc5,argc6,argc7,argc8,argc9,argca,argcb,argcc,argcd,argce,argcf, \
argd0,argd1,argd2,argd3,argd4,argd5,argd6,argd7,argd8,argd9,argda,argdb,argdc,argdd,argde,argdf, \
arge0,arge1,arge2,arge3,arge4,arge5,arge6,arge7,arge8,arge9,argea,argeb,argec,arged,argee,argef, \
argf0,argf1,argf2,argf3,argf4,argf5,argf6,argf7,argf8,argf9,argfa,argfb,argfc,argfd,argfe,argff, \
arg100, ...) arg100
#define m_args_idim(...) m_args_idim__get_arg100(, ##__VA_ARGS__, \
0xff,0xfe,0xfd,0xfc,0xfb,0xfa,0xf9,0xf8,0xf7,0xf6,0xf5,0xf4,0xf3,0xf2,0xf1,0xf0, \
0xef,0xee,0xed,0xec,0xeb,0xea,0xe9,0xe8,0xe7,0xe6,0xe5,0xe4,0xe3,0xe2,0xe1,0xe0, \
0xdf,0xde,0xdd,0xdc,0xdb,0xda,0xd9,0xd8,0xd7,0xd6,0xd5,0xd4,0xd3,0xd2,0xd1,0xd0, \
0xcf,0xce,0xcd,0xcc,0xcb,0xca,0xc9,0xc8,0xc7,0xc6,0xc5,0xc4,0xc3,0xc2,0xc1,0xc0, \
0xbf,0xbe,0xbd,0xbc,0xbb,0xba,0xb9,0xb8,0xb7,0xb6,0xb5,0xb4,0xb3,0xb2,0xb1,0xb0, \
0xaf,0xae,0xad,0xac,0xab,0xaa,0xa9,0xa8,0xa7,0xa6,0xa5,0xa4,0xa3,0xa2,0xa1,0xa0, \
0x9f,0x9e,0x9d,0x9c,0x9b,0x9a,0x99,0x98,0x97,0x96,0x95,0x94,0x93,0x92,0x91,0x90, \
0x8f,0x8e,0x8d,0x8c,0x8b,0x8a,0x89,0x88,0x87,0x86,0x85,0x84,0x83,0x82,0x81,0x80, \
0x7f,0x7e,0x7d,0x7c,0x7b,0x7a,0x79,0x78,0x77,0x76,0x75,0x74,0x73,0x72,0x71,0x70, \
0x6f,0x6e,0x6d,0x6c,0x6b,0x6a,0x69,0x68,0x67,0x66,0x65,0x64,0x63,0x62,0x61,0x60, \
0x5f,0x5e,0x5d,0x5c,0x5b,0x5a,0x59,0x58,0x57,0x56,0x55,0x54,0x53,0x52,0x51,0x50, \
0x4f,0x4e,0x4d,0x4c,0x4b,0x4a,0x49,0x48,0x47,0x46,0x45,0x44,0x43,0x42,0x41,0x40, \
0x3f,0x3e,0x3d,0x3c,0x3b,0x3a,0x39,0x38,0x37,0x36,0x35,0x34,0x33,0x32,0x31,0x30, \
0x2f,0x2e,0x2d,0x2c,0x2b,0x2a,0x29,0x28,0x27,0x26,0x25,0x24,0x23,0x22,0x21,0x20, \
0x1f,0x1e,0x1d,0x1c,0x1b,0x1a,0x19,0x18,0x17,0x16,0x15,0x14,0x13,0x12,0x11,0x10, \
0x0f,0x0e,0x0d,0x0c,0x0b,0x0a,0x09,0x08,0x07,0x06,0x05,0x04,0x03,0x02,0x01,0x00, \
)
typedef struct{
int32_t x0,x1;
}ivec2;
int32_t max0__ivec2(int32_t nelems, ...){ // The largest component 0 in a list of 2D integer vectors
int32_t max = ~(1ll<<31) + 1; // Assuming two's complement
va_list args;
va_start(args, nelems);
for(int i=0; i<nelems; ++i){
ivec2 a = va_arg(args, ivec2);
max = max > a.x0 ? max : a.x0;
}
va_end(args);
return max;
}
#define max0_ivec2(...) max0__ivec2(m_args_idim(__VA_ARGS__), __VA_ARGS__)
int main(){
int32_t max = max0_ivec2(((ivec2){0,1}), ((ivec2){2,3}), ((ivec2){4,5}), ((ivec2){6,7}));
printf("%d\n", max);
}

Get names and addresses of exported functions in linux

I am able to get a list of exported function names and pointers from an executable in Windows by using the PIMAGE_DOS_HEADER API (example).
What is the equivalent API for Linux?
For context I am creating unit test executables and I am exporting functions starting with the name "test_" and I want the executable to just spin through and execute all of the test functions when run.
Example pseudo-code:
int main(int argc, char** argv)
{
auto run = new_trun();
auto module = dlopen(NULL);
auto exports = get_exports(module); // <- how do I do this on unix?
for( auto i = 0; i < exports->length; i++)
{
auto export = exports[i];
if(strncmp("test_", export->name, strlen("test_")) == 0)
{
tcase_add(run, export->name, export->func);
}
}
return trun_run(run);
}
EDIT:
I was able to find what I was after using the top answer from this question:
List all the functions/symbols on the fly in C?
Additionally I had to use the gnu_hashtab_symbol_count function from Nominal Animal's answer below to handle the DT_GNU_HASH instead of the DT_HASH.
My final test main function looks like this:
int main(int argc, char** argv)
{
vector<string> symbols;
dl_iterate_phdr(retrieve_symbolnames, &symbols);
TRun run;
auto handle = dlopen(NULL, RTLD_LOCAL | RTLD_LAZY);
for(auto i = symbols.begin(); i != symbols.end(); i++)
{
auto name = *i;
auto func = (testfunc)dlsym(handle, name.c_str());
TCase tcase;
tcase.name = string(name);
tcase.func = func;
run.test_cases.push_back(tcase);
}
return trun_run(&run);
}
I then define tests in the module like this:
// test.h
#define START_TEST(name) extern "C" EXPORT TResult test_##name () {
#define END_TEST return tresult_success(); }
// foo.cc
START_TEST(foo_bar)
{
assert_pending();
}
END_TEST
Which produces output that looks like this:
test_foo_bar: pending
1 pending
0 succeeded
1 total
I do get quite annoyed when I see questions asking how to do something in operating system X the way you would do it in Y.
In most cases it is not a useful approach, because each operating system (family) tends to have its own approach to issues, so trying to apply in Y something that works in X is like stuffing a cube into a round hole.
Please note: the text here is intended as harsh, not condescending; my command of the English language is not as good as I'd like. Harshness combined with actual help and pointers to known working solutions seems to work best in overcoming nontechnical limitations, in my experience.
In Linux, a test environment should use something like
LC_ALL=C LANG=C readelf -s FILE
to list all the symbols in FILE. readelf is part of the binutils package, and is installed if you intend to build new binaries on the system. This leads to portable, robust code. Do not forget that Linux encompasses multiple hardware architectures that do have real differences.
To build binaries in Linux, you normally use some of the tools provided in binutils. If binutils provided a library, or there was an ELF library based on the code used in binutils, it would be much better to use that, rather than parse the output of the human utilities. However, there is no such library (the libbfd library binutils uses internally is not ELF-specific). The libelf library (http://www.mr511.de/software/english.html) is good, but it is completely separate work by chiefly a single author. Bugs in it have been reported to binutils, which is unproductive, as the two are not related. Simply put, there are no guarantees that it handles the ELF files on a given architecture the same way binutils does. Therefore, for robustness and reliability, you'll definitely want to use binutils.
If you have a test application, it should use a script, say /usr/lib/yourapp/list-test-functions, to list the test-related functions:
#!/bin/bash
export LC_ALL=C LANG=C
for file in "$#" ; do
readelf -s "$file" | while read num value size type bind vix index name dummy ; do
[ "$type" = "FUNC" ] || continue
[ "$bind" = "GLOBAL" ] || continue
[ "$num" = "$[$num]" ] || continue
[ "$index" = "$[$index]" ] || continue
case "$name" in
test_*) printf '%s\n' "$name"
;;
esac
done
done
This way, if there is an architecture that has quirks (in the binutils' readelf output format in particular), you only need to modify the script. Modifying such a simple script is not difficult, and it is easy to verify the script works correctly -- just compare the raw readelf output to the script output; anybody can do that.
A subroutine that constructs a pipe, fork()s a child process, executes the script in the child process, and uses e.g. getline() in the parent process to read the list of names, is quite simple and extremely robust. Since this is also the one fragile spot, we've made it very easy to fix any quirks or problems here by using that external script (that is customizable/extensible to cover those quirks, and easy to debug).
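For completeness, here is a sketch of that parent-side helper, using popen() instead of an explicit pipe()/fork()/exec() to keep it short; the script path is the example path used above, and only minimal shell quoting of the file name is done.

#define _GNU_SOURCE      /* for getline() */
#include <stdio.h>
#include <stdlib.h>

/* Runs the helper script on 'binary' and calls 'callback' once per
   test function name; a nonzero return from the callback stops early. */
static int for_each_test_function(const char *binary,
                                  int (*callback)(const char *name, void *custom),
                                  void *custom)
{
    char command[4096];
    snprintf(command, sizeof command,
             "/usr/lib/yourapp/list-test-functions '%s'", binary);
    FILE *in = popen(command, "r");
    if (!in)
        return -1;
    char *line = NULL;
    size_t size = 0;
    ssize_t len;
    while ((len = getline(&line, &size, in)) > 0) {
        if (line[len - 1] == '\n')
            line[len - 1] = '\0';         /* strip the trailing newline */
        if (callback(line, custom))
            break;
    }
    free(line);
    return pclose(in);
}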
Remember, if binutils itself has bugs (other than output formatting bugs), any binaries built will almost certainly exhibit those same bugs also.
Being a Microsoft-oriented person, you probably will have trouble grasping the benefits of such a modular approach. (It is not specific to Microsoft, but to any single-vendor controlled ecosystem where the vendor-pushed approach is via overarching frameworks and black boxes with clean but very limited interfaces. I think of it as the framework limitation, or vendor-enforced walled garden, or prison garden. Looks good, but getting out is difficult. For a description and history of the modular approach I'm trying to describe, see for example the Unix philosophy article at Wikipedia.)
The following shows that your approach is indeed possible in Linux, too -- although clunky and fragile; this stuff is intended to be done using the standard tools instead. It's just not the right approach in general.
The interface, symbols.h, is easiest to implement using a callback function that gets called for each symbol found:
#ifndef SYMBOLS_H
#ifndef _GNU_SOURCE
#error You must define _GNU_SOURCE!
#endif
#define SYMBOLS_H
#include <stdlib.h>
typedef enum {
LOCAL_SYMBOL = 1,
GLOBAL_SYMBOL = 2,
WEAK_SYMBOL = 3,
} symbol_bind;
typedef enum {
FUNC_SYMBOL = 4,
OBJECT_SYMBOL = 5,
COMMON_SYMBOL = 6,
THREAD_SYMBOL = 7,
} symbol_type;
int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom),
void *custom);
#endif /* SYMBOLS_H */
The ELF symbol binding and type macros are word-size specific, so to avoid the hassle, I declared the enum types above. I omitted some uninteresting types (STT_NOTYPE, STT_SECTION, STT_FILE), however.
The implementation, symbols.c:
#define _GNU_SOURCE
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <stdio.h>
#include <fnmatch.h>
#include <dlfcn.h>
#include <link.h>
#include <errno.h>
#include "symbols.h"
#define UINTS_PER_WORD (__WORDSIZE / (CHAR_BIT * sizeof (unsigned int)))
static ElfW(Word) gnu_hashtab_symbol_count(const unsigned int *const table)
{
const unsigned int *const bucket = table + 4 + table[2] * (unsigned int)(UINTS_PER_WORD);
unsigned int b = table[0];
unsigned int max = 0U;
while (b-->0U)
if (bucket[b] > max)
max = bucket[b];
return (ElfW(Word))max;
}
static symbol_bind elf_symbol_binding(const unsigned char st_info)
{
#if __WORDSIZE == 32
switch (ELF32_ST_BIND(st_info)) {
#elif __WORDSIZE == 64
switch (ELF64_ST_BIND(st_info)) {
#else
switch (ELF_ST_BIND(st_info)) {
#endif
case STB_LOCAL: return LOCAL_SYMBOL;
case STB_GLOBAL: return GLOBAL_SYMBOL;
case STB_WEAK: return WEAK_SYMBOL;
default: return 0;
}
}
static symbol_type elf_symbol_type(const unsigned char st_info)
{
#if __WORDSIZE == 32
switch (ELF32_ST_TYPE(st_info)) {
#elif __WORDSIZE == 64
switch (ELF64_ST_TYPE(st_info)) {
#else
switch (ELF_ST_TYPE(st_info)) {
#endif
case STT_OBJECT: return OBJECT_SYMBOL;
case STT_FUNC: return FUNC_SYMBOL;
case STT_COMMON: return COMMON_SYMBOL;
case STT_TLS: return THREAD_SYMBOL;
default: return 0;
}
}
static void *dynamic_pointer(const ElfW(Addr) addr,
const ElfW(Addr) base, const ElfW(Phdr) *const header, const ElfW(Half) headers)
{
if (addr) {
ElfW(Half) h;
for (h = 0; h < headers; h++)
if (header[h].p_type == PT_LOAD)
if (addr >= base + header[h].p_vaddr &&
addr < base + header[h].p_vaddr + header[h].p_memsz)
return (void *)addr;
}
return NULL;
}
struct phdr_iterator_data {
int (*callback)(const char *libpath, const char *libname,
const char *objname, const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom);
void *custom;
};
static int iterate_phdr(struct dl_phdr_info *info, size_t size, void *dataref)
{
struct phdr_iterator_data *const data = dataref;
const ElfW(Addr) base = info->dlpi_addr;
const ElfW(Phdr) *const header = info->dlpi_phdr;
const ElfW(Half) headers = info->dlpi_phnum;
const char *libpath, *libname;
ElfW(Half) h;
if (!data->callback)
return 0;
if (info->dlpi_name && info->dlpi_name[0])
libpath = info->dlpi_name;
else
libpath = "";
libname = strrchr(libpath, '/');
if (libname && libname[0] == '/' && libname[1])
libname++;
else
libname = libpath;
for (h = 0; h < headers; h++)
if (header[h].p_type == PT_DYNAMIC) {
const ElfW(Dyn) *entry = (const ElfW(Dyn) *)(base + header[h].p_vaddr);
const ElfW(Word) *hashtab;
const ElfW(Sym) *symtab = NULL;
const char *strtab = NULL;
ElfW(Word) symbol_count = 0;
for (; entry->d_tag != DT_NULL; entry++)
switch (entry->d_tag) {
case DT_HASH:
hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
if (hashtab)
symbol_count = hashtab[1];
break;
case DT_GNU_HASH:
hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
if (hashtab) {
ElfW(Word) count = gnu_hashtab_symbol_count(hashtab);
if (count > symbol_count)
symbol_count = count;
}
break;
case DT_STRTAB:
strtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
break;
case DT_SYMTAB:
symtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
break;
}
if (symtab && strtab && symbol_count > 0) {
ElfW(Word) s;
for (s = 0; s < symbol_count; s++) {
const char *name;
void *const ptr = dynamic_pointer(base + symtab[s].st_value, base, header, headers);
symbol_bind bind;
symbol_type type;
int result;
if (!ptr)
continue;
type = elf_symbol_type(symtab[s].st_info);
bind = elf_symbol_binding(symtab[s].st_info);
if (symtab[s].st_name)
name = strtab + symtab[s].st_name;
else
name = "";
result = data->callback(libpath, libname, name, ptr, symtab[s].st_size, bind, type, data->custom);
if (result)
return result;
}
}
}
return 0;
}
int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom),
void *custom)
{
struct phdr_iterator_data data;
if (!callback)
return errno = EINVAL;
data.callback = callback;
data.custom = custom;
return errno = dl_iterate_phdr(iterate_phdr, &data);
}
When compiling the above, remember to link against the dl library.
You may find the gnu_hashtab_symbol_count() function above interesting; the format of the table is not well documented anywhere that I can find. This is tested to work on both i386 and x86-64 architectures, but it should be vetted against the GNU sources before relying on it in production code. Again, the better option is to just use those tools directly via a helper script, as they will be installed on any development machine.
Technically, a DT_GNU_HASH table tells us the first dynamic symbol, and the highest index in any hash bucket tells us the last dynamic symbol, but since the entries in the DT_SYMTAB symbol table always begin at 0 (actually, the 0 entry is "none"), I only consider the upper limit.
To match library and function names, I recommend using strncmp() for a prefix match for libraries (match at the start of the library name, up to the first .). Of course, you can use fnmatch() if you prefer glob patterns, or regcomp()+regexec() if you prefer regular expressions (they are built-in to the GNU C library, no external libraries are needed).
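As an example of that kind of matching, here is a sketch of a callback for the symbols() interface above that keeps only global functions in the main program itself (reported with an empty library path, as in the sample output further down) whose names begin with "test_"; the actual test registration is left out:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include "symbols.h"

static int find_tests(const char *libpath, const char *libname, const char *objname,
                      const void *addr, const size_t size,
                      const symbol_bind binding, const symbol_type type,
                      void *custom)
{
    (void)libname; (void)size; (void)custom;
    if (libpath[0] != '\0')                   /* empty path = the executable itself */
        return 0;
    if (type != FUNC_SYMBOL || binding != GLOBAL_SYMBOL)
        return 0;
    if (strncmp(objname, "test_", 5) != 0)    /* prefix match on the name */
        return 0;
    printf("test function %s at %p\n", objname, addr);
    return 0;                                 /* nonzero would stop the iteration */
}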
Here is an example program, example.c, that just prints out all the symbols:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include "symbols.h"
static int my_func(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom __attribute__((unused)))
{
printf("%s (%s):", libpath, libname);
if (*objname)
printf(" %s:", objname);
else
printf(" unnamed");
if (size > 0)
printf(" %zu-byte", size);
if (binding == LOCAL_SYMBOL)
printf(" local");
else
if (binding == GLOBAL_SYMBOL)
printf(" global");
else
if (binding == WEAK_SYMBOL)
printf(" weak");
if (type == FUNC_SYMBOL)
printf(" function");
else
if (type == OBJECT_SYMBOL || type == COMMON_SYMBOL)
printf(" variable");
else
if (type == THREAD_SYMBOL)
printf(" thread-local variable");
printf(" at %p\n", addr);
fflush(stdout);
return 0;
}
int main(int argc, char *argv[])
{
int arg;
for (arg = 1; arg < argc; arg++) {
void *handle = dlopen(argv[arg], RTLD_NOW);
if (!handle) {
fprintf(stderr, "%s: %s.\n", argv[arg], dlerror());
return EXIT_FAILURE;
}
fprintf(stderr, "%s: Loaded.\n", argv[arg]);
}
fflush(stderr);
if (symbols(my_func, NULL))
return EXIT_FAILURE;
return EXIT_SUCCESS;
}
To compile and run the above, use for example
gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 example.o symbols.o -ldl -o example
./example | less
To see the symbols in the program itself, use the -rdynamic flag at link time to add all symbols to the dynamic symbol table:
gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 -rdynamic example.o symbols.o -ldl -o example
./example | less
On my system, the latter prints out
(): stdout: 8-byte global variable at 0x602080
(): _edata: global at 0x602078
(): __data_start: global at 0x602068
(): data_start: weak at 0x602068
(): symbols: 70-byte global function at 0x401080
(): _IO_stdin_used: 4-byte global variable at 0x401150
(): __libc_csu_init: 101-byte global function at 0x4010d0
(): _start: global function at 0x400a57
(): __bss_start: global at 0x602078
(): main: 167-byte global function at 0x4009b0
(): _init: global function at 0x4008d8
(): stderr: 8-byte global variable at 0x602088
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097da0
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): __asprintf: global function at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): free: global function at 0x7fc652097000
...
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): dlvsym: 118-byte weak function at 0x7fc6520981f0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cf14a0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc65208c740
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __libc_enable_secure: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __tls_get_addr: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global_ro: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_find_dso_for_object: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_starting_up: weak at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_argv: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): putwchar: 292-byte global function at 0x7fc651d4a210
...
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): vwarn: 224-byte global function at 0x7fc651dc8ef0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): wcpcpy: 39-byte weak function at 0x7fc651d75900
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229bae0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_get_tls_static_info: 21-byte global function at 0x7fc6522adaa0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_PRIVATE: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.3: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.4: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): free: 42-byte weak function at 0x7fc6522b2c40
...
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): malloc: 13-byte weak function at 0x7fc6522b2bf0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_allocate_tls_init: 557-byte global function at 0x7fc6522adc00
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _rtld_global_ro: 304-byte global variable at 0x7fc6524bdcc0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): __libc_enable_secure: 4-byte global variable at 0x7fc6524bde68
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_rtld_di_serinfo: 1620-byte global function at 0x7fc6522a4710
I used ... to mark where I removed lots of lines.
Questions?
To get a list of exported symbols from a shared library (a .so) under Linux, there are two ways: the easy one and a slightly harder one.
The easy one is to use the console tools already available: objdump (included in GNU binutils):
$ objdump -T /usr/lib/libid3tag.so.0
00009c15 g DF .text 0000012e Base id3_tag_findframe
00003fac g DF .text 00000053 Base id3_ucs4_utf16duplicate
00008288 g DF .text 000001f2 Base id3_frame_new
00007b73 g DF .text 000003c5 Base id3_compat_fixup
...
The slightly harder way is to use libelf and write a C/C++ program to list the symbols yourself. Have a look at the elfutils package, which is also built from the libelf source. There is a program called eu-readelf (the elfutils version of readelf, not to be confused with the binutils readelf). eu-readelf -s $LIB lists exported symbols using libelf, so you should be able to use that as a starting point.
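If you go the libelf route, a minimal sketch (assuming the elfutils libelf/gelf headers are installed; compile with -lelf) that lists the function names in a file's dynamic symbol table looks roughly like this:

#include <fcntl.h>
#include <gelf.h>
#include <libelf.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s file.so\n", argv[0]);
        return 1;
    }
    if (elf_version(EV_CURRENT) == EV_NONE)
        return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return 1;
    Elf *elf = elf_begin(fd, ELF_C_READ, NULL);
    if (!elf)
        return 1;
    Elf_Scn *scn = NULL;
    while ((scn = elf_nextscn(elf, scn)) != NULL) {
        GElf_Shdr shdr;
        if (gelf_getshdr(scn, &shdr) != &shdr || shdr.sh_type != SHT_DYNSYM)
            continue;                         /* only interested in .dynsym */
        Elf_Data *data = elf_getdata(scn, NULL);
        if (!data || shdr.sh_entsize == 0)
            continue;
        size_t count = shdr.sh_size / shdr.sh_entsize;
        for (size_t i = 0; i < count; i++) {
            GElf_Sym sym;
            if (gelf_getsym(data, (int)i, &sym) != &sym)
                continue;
            if (GELF_ST_TYPE(sym.st_info) != STT_FUNC || sym.st_value == 0)
                continue;                     /* skip non-functions and undefined entries */
            const char *name = elf_strptr(elf, shdr.sh_link, sym.st_name);
            if (name && *name)
                printf("%s\n", name);
        }
    }
    elf_end(elf);
    close(fd);
    return 0;
}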

C: const initializer and debugging symbols

In code reviews I ask for option (1) below to be used as it results in a symbol being created (for debugging) whereas (2) and (3) do not appear to do so at least for gcc and icc. However (1) is not a true const and cannot be used on all compilers as an array size. Is there a better option that includes debug symbols and is truly const for C?
Symbols:
gcc f.c -ggdb3 -g ; nm -a a.out | grep _sym
0000000100000f3c s _symA
0000000100000f3c - 04 0000 STSYM _symA
Code:
static const int symA = 1; // 1
#define symB 2 // 2
enum { symC = 3 }; // 3
GDB output:
(gdb) p symA
$1 = 1
(gdb) p symB
No symbol "symB" in current context.
(gdb) p symC
No symbol "symC" in current context.
And for completeness, the source:
#include <stdio.h>
static const int symA = 1;
#define symB 2
enum { symC = 3 };
int main (int argc, char *argv[])
{
printf("symA %d symB %d symC %d\n", symA, symB, symC);
return (0);
}
The -ggdb3 option should be giving you macro debugging information. But this is a different kind of debugging information (it has to be different - it tells the debugger how to expand the macro, possibly including arguments and the # and ## operators) so you can't see it with nm.
If your goal is to have something that shows up in nm, then I guess you can't use a macro. But that's a silly goal; you should want to have something that actually works in a debugger, right? Try print symC in gdb and see if it works.
Since macros can be redefined, gdb requires the program to be stopped at a location where the macro existed so it can find the correct definition. In this program:
#include <stdio.h>
int main(void)
{
#define X 1
printf("%d\n", X);
#undef X
printf("---\n");
#define X 2
printf("%d\n", X);
}
If you break on the first printf and print X you'll get the 1; next to the second printf and gdb will tell you that there is no X; next again and it will show the 2.
Also the gdb command info macro foo can be useful, if foo is a macro that takes arguments and you want to see its definition rather than expand it with a specific set of arguments. And if a macro expands to something that's not an expression, gdb can't print it so info macro is the only thing you can do with it.
For better inspection of the raw debugging information, try objdump -W instead of nm.
However (1) is not a true const and cannot be used on all compilers as an array size.
This can be used as an array size (at block scope, as a variable-length array) on all compilers that support C99 and later (gcc, clang). For others (like MSVC) you have only the last two options.
Using option (3) is preferred over (2). Enums are different from #define constants: they are real symbols known to the compiler and visible in the debugger, and they follow the language's scope rules, unlike #define constants.
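To make the array-size difference concrete, here is a small sketch (C99 or later assumed):

#include <stdio.h>

static const int symA = 1;   /* (1): has a debug symbol, but is not an integer
                                constant expression in C                       */
enum { symC = 3 };           /* (3): a true constant expression               */

static int fixed[symC];      /* OK anywhere: enum constant                    */
/* static int bad[symA]; */  /* error at file scope: symA is not a constant
                                expression in C (it would be in C++)          */

int main(void)
{
    int vla[symA];           /* accepted at block scope as a C99 VLA          */
    (void)vla;
    printf("symA=%d symC=%d sizeof(fixed)=%zu\n", symA, symC, sizeof fixed);
    return 0;
}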

Using ellipsis in cuda device function

I am trying to port some C code to a CUDA kernel. The code I am porting makes heavy use of ellipses (variadic functions). When I try to use an ellipsis in a device function like below, I get an error saying that an ellipsis is not allowed in device functions.
__device__ int add(int a, ...){}
However, CUDA supports using printf in both host and device functions, and uses an ellipsis in its own code, as below in common_functions.h.
extern "C"
{
extern _CRTIMP __host__ __device__ __device_builtin__ __cudart_builtin__ int __cdecl printf(const char*, ...);
extern _CRTIMP __host__ __device__ __device_builtin__ __cudart_builtin__ int __cdecl fprintf(FILE*, const char*, ...);
extern _CRTIMP __host__ __device__ __cudart_builtin__ void* __cdecl malloc(size_t) __THROW;
extern _CRTIMP __host__ __device__ __cudart_builtin__ void __cdecl free(void*) __THROW;
}
Is there a way to use ellipsis in a device function?
I would not like to hard code a max number of parameters and then change all the calls.
I also would not like to code a custom variadic function method.
I also tried creating a PTX file that I could use to replace the ellipsis usage, since the PTX ISA documentation appears to have facilities for handling variable parameters. (Note that the documentation says it does not support them, and then provides a paragraph with supporting functions and examples; perhaps there is a typo?) I got a simple PTX file all the way through the process described below, but got stuck on the executable question in the last comment. I plan to read the nvcc compiler documentation to try to understand that.
How can I call a ptx function from CUDA C?
I am using a GTX660 which I believe is level 3.0 and cuda 5.0 toolkit on Ubuntu 12.04.
Update regarding the "magic" referred to below:
It looks to me like the compiler must be doing something special to restrict ellipsis usage while still supporting printf. When I call printf as below:
printf("The result = %i from adding %i numbers.", result, 2);
I was surprised to find this in the ptx:
.extern .func (.param .b32 func_retval0) vprintf
(
.param .b64 vprintf_param_0,
.param .b64 vprintf_param_1
)
and later
add.u64 %rd2, %SP, 0;
st.u32 [%SP+0], %r5;
mov.u32 %r6, 2;
st.u32 [%SP+4], %r6;
// Callseq Start 1
{
.reg .b32 temp_param_reg;
.param .b64 param0;
st.param.b64 [param0+0], %rd1;
.param .b64 param1;
st.param.b64 [param1+0], %rd2;
.param .b32 retval0;
call.uni (retval0),
vprintf,
(
param0,
param1
);
It appears to me that the compiler accepts the ellipsis for printf but then swaps in a call to vprintf and builds the argument list manually on the fly. (va_list is a valid type in device functions.)
As @JaredHoberock stated (I think he will not mind if I create an answer):
__device__ functions cannot have ellipsis parameters; that is why you are receiving the compiler error message.
The built-in printf function is a special case, and does not indicate general support for ellipsis.
There are some alternatives that could be mentioned, but none that I am aware of allow truly general variable arguments support. For example, as Jared stated you could simply define a number of parameters, some/most of which have default values specified, so they do not need to be passed explicitly.
You could also play games with templating as is done in the cuPrintf sample code to try and simulate variable arguments, but this will also not be arbitrarily extensible, I don't think.
