I'm writing a small DLL in C using mingw-w64, which should be callable by VB.net programs. The only exports are functions whose parameters and return types are primitive types.
Should I use __stdcall on the dllexport functions or not?
When searching the web I have seen examples both with and without it. There is discussion of how it affects name decoration but no advice as to whether this is a good thing or not, and what the impact is on my DLL's usability.
There is really no good reason to use nondefault calling convention/ABI (__stdcall or otherwise) anywhere except when you need to call an existing interface defined that way. It's just gratuitous ugliness. Where it's done all over existing Windows stuff, it's legacy cargo culting from the Win16 era, and has no actual rationale behind it.
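For what it's worth, a plain export with the default calling convention might look like this (a minimal sketch; the function name and build command are illustrative, not from the question - the only real requirement is that the VB.NET DllImport declaration states the same convention, e.g. CallingConvention.Cdecl, and on x64 the distinction disappears anyway):

/* mydll.c - build with: gcc -shared -o mydll.dll mydll.c */
#include <stdint.h>

/* default (__cdecl) calling convention; no __stdcall needed */
__declspec(dllexport) int32_t add_numbers(int32_t a, int32_t b)
{
    return a + b;
}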
error LNK2019: unresolved external symbol __imp_yourexternFunc
I have an external C DLL function called "output", which is similar to printf:
output( format , va_args);
In the *.h files it is declared as:
__declspec( dllexport ) void output( LPCTSTR format, ... );
or
__declspec( dllimport ) void output( LPCTSTR format, ... );
(For the *.h includes there is a macro that selects between export/import based on usage.)
In my Rust module I declare it extern as:
#[link(name="aDLL", kind="dylib")]
extern {
fn output( format:LPCTSTR, ...);
}
The DUMPBIN output for this function is as follows:
31 ?output@@YAXPEBDZZ (void __cdecl output(char const *,...))
But when I attempt to link this, the Rust linker is prepending __imp_ to the function name:
second_rust_lib_v0.second_rust_lib_v0.ay01u8ua-cgu.6.rcgu.o : error LNK2019: unresolved external symbol __imp_output referenced in function print_something
On Windows, linking against a DLL goes through an import ("trampoline") library (a .lib file) which generates the right bindings. The convention for these is to prefix the function names with __imp_ (there is a related C++ answer).
There is an open issue that explains some of the difficulties creating and linking rust dlls under windows.
Here are the relevant bits:
If you start developing on Windows, Rust will produce a mylib.dll and mylib.dll.lib. To use this lib again from Rust you will have to specify #[link(name = "mylib.dll")], thus giving the impression that the full file name has to be specified. On Mac, however, #[link(name = "libmylib.dylib"] will fail (likewise Linux).
If you start developing on Mac and Linux, #[link(name = "mylib")] just works, giving you the impression Rust handles the name resolution (fully) automatically like other platforms that just require the base name.
In fact, the correct way to cross platform link against a dylib produced by Rust seems to be:
#[cfg_attr(all(target_os = "windows", target_env = "msvc"), link(name = "dylib.dll"))]
#[cfg_attr(not(all(target_os = "windows", target_env = "msvc")), link(name = "dylib"))]
extern "C" {}
As with your previous question, you continue to ignore how compilers and linkers work. The two concepts you need to wrap your head around are these:
LPCTSTR is not a type. It is a generic-text alias that, depending on the UNICODE preprocessor macro, resolves to char const*, wchar_t const*, or __wchar_t const* if you are particularly unlucky. Either way, once the compiler is done, LPCTSTR is gone. Forever. It will not ever show up as a type, even when using C++ name decoration.
It is not a type, don't use it in places where only types are allowed.
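Roughly speaking, the generic-text machinery in the SDK headers boils down to something like this (a simplified sketch; the real headers go through LPCSTR/LPCWSTR):

#ifdef UNICODE
typedef const wchar_t *LPCTSTR;   /* "wide" build */
#else
typedef const char    *LPCTSTR;   /* "ANSI" build */
#endif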
Compilers support different types of language linkage for external symbols. While you insist to have a C DLL, you are in fact using C++ linkage. This is evidenced by the symbol assigned to the exported function. While C++ linkage is great in that it allows type information to be encoded in the decorated names, the name decoration scheme isn't standardized in any way, and varies widely across compilers and platforms. As such, it is useless when the goal is cross language interoperability (or any interoperability).
As explained in my previous answer, you will need to get rid of the LPCTSTR in your C (or C++) interface. That's non-negotiable. It must go, and unwittingly you have done that already. Since DUMPBIN understands MSVC's C++ name decoration scheme, it was able to turn this symbol
?output@@YAXPEBDZZ
into this code
void __cdecl output(char const *,...)
All type information is encoded in the decorated name, including the calling convention used. Take special note that the first formal parameter is of type char const *. That's fixed, set in stone, compiled into the DLL. There is no going back and changing your mind, so make sure your clients can't either.
You MUST change the signature of your C or C++ function. Pick either char const* or wchar_t const*. When it comes to strings in Rust on Windows there is no good option. Picking either one is the best you have.
The other issue you are going up against is insisting on having Rust come to terms with C++' language linkage. That isn't going to be an option until Standard C++ has formally standardized C++ language linkage. In statistics, this is called the "Impossible Event", so don't sink any more time into something that's not going to get you anywhere.
Instead, instruct your C or C++ library to export symbols using C language linkage by prepending an extern "C" specifier. While not formally specified either, most tools agree on a sufficiently large set of rules to be usable. Whether you like it or not, extern "C" is the only option we have when making compiled C or C++ code available to other languages (or C and C++, for that matter).
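A sketch of what that looks like in the shared header (the BUILDING_MYDLL macro name is made up here; use whatever export/import macro the project already has):

#ifdef BUILDING_MYDLL
#define MYDLL_API __declspec(dllexport)
#else
#define MYDLL_API __declspec(dllimport)
#endif

#ifdef __cplusplus
extern "C" {
#endif

MYDLL_API void output(const char *format, ...);   /* char const*, not LPCTSTR */

#ifdef __cplusplus
}
#endif

With this, DUMPBIN shows a plain, undecorated "output", which an extern block in Rust can name directly.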
If for whatever reason you cannot use C language linkage (and frankly, since you are compiling C code I don't see a valid reason for that being the case) you could export from a DLL using a DEF file, giving you control over the names of the exported symbols. I don't see much benefit in using C++ language linkage, then throwing out all the benefits and pretending to the linker that this were C language linkage. I mean, why not just have the compiler do all that work instead?
Of course, if you are this desperate to avoid the solution, you could also follow the approach from your proposed answer, so long as you understand why it works, when it stops working, and which new error mode you've introduced.
It works, in part by tricking the compiler, and in part by coincidence. The link_name = "?output@@YAXPEBDZZ" attribute tells the compiler to stop massaging the import symbol and instead use the provided name when requesting the linker to resolve symbols. This works by coincidence because Rust defaults to __cdecl which happens to be the calling convention for all variadic functions in C. Most functions in the Windows API use __stdcall, though. Now ironically, had you used C linkage instead, you would have lost all type information, but retained the calling convention in the name decoration. A mismatch between calling conventions would have thus been caught during linking. Another opportunity missed, oh well.
It stops working when you recompile your C DLL and define UNICODE or _UNICODE, because now the symbol has a different name, due to different types. It will also stop working when Microsoft ever decide to change their (undocumented) name decoration scheme. And it will certainly stop working when using a different C++ compiler.
The Rust implementation introduced a new error mode. Presumably, LPCTSTR is a type alias, gated by some sort of configuration. This allows clients to select, whether they want an output that accepts a *const u8 or *const u16. The library, though, is compiled to accept char const* only. Another mismatch opportunity introduced needlessly. There is no place for generic-text mappings in Windows code, and hasn't been for decades.
As always, a few words of caution: Trying to introduce Rust into a business that's squarely footed on C and C++ requires careful consideration. Someone doing that will need to be intimately familiar with C++ compilers, linkers, and Rust. I feel that you are struggling with all three of those, and fear that you are ultimately going to provide a disservice.
Consider whether you should be bringing someone in that is sufficiently experienced. You can either thank me later for the advice, or pay me to fill in that role.
This is not my ideal answer, but it is how I solve the problem.
What I'm still looking for is a way to get the Microsoft linker (I believe) to output full verbosity in the Rust build, as it can when doing C++ builds. There are options to the build that might trigger this, but I haven't found them yet. That, plus an explanation of this name mangling in maybe 80% less text than I have written here, would be an ideal answer, I think.
The users.rust-lang.org user chrefr helped by asking some clarifying questions which jogged my brain. He mentioned that "name mangling schema is unspecified in C++", which was my aha moment.
I was trying to force Rust to make the linker look for my external output() API function, expecting it to look for the mangled name, as the native API call I am accessing was not declared with "cdecl" to prevent name mangling.
I simply forced Rust to use the mangled name I found with dumpbin.exe (code below).
What I was hoping for as an answer was a way to get linker.exe to output all the symbols it is looking for, which would have been "output", as the compiler error stated. I was thinking it was looking for a mangled name and wanted to compare the two mangled names by getting the Microsoft linker to output what it was attempting to match.
So my solution was to use the dumpbin munged name in my #[link] directive:
//#[link(name="myNativeLib")]
//#[link(name="myNativeLib", kind="dylib")] // prepends _imp to symbol below
#[link(name="myNativeLib", kind="static")] // I'm linking with a DLL
extern {
//#[link_name = "output"]
#[link_name = "?output##YAXPEBDZZ"] // Name found via DUMPBIN.exe /Exports
fn output( format:LPCTSTR, ...);
}
Although I have access to the sources of myNativeLib, these are not distributed and are not going to change. The *.lib and *.exp are only available internally, so long term I will need a solution for binding to these modules that relies only on the *.dll being present. That suggests I might need to dynamically load the DLL instead of doing what I consider "implicit" linking of the DLL, as I suspect Rust looks only at the *.lib module to resolve the symbols. I need a kind="dylibOnly" for Windows DLLs that are distributed without *.lib and *.exp modules.
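For reference, explicit ("run-time") linking on Windows needs only the DLL and no import library. A minimal C sketch of the idea, reusing the names from this question (error handling kept short; the same Win32 calls are reachable from Rust):

#include <windows.h>
#include <stdio.h>

typedef void (__cdecl *output_fn)(const char *format, ...);

int main(void)
{
    HMODULE lib = LoadLibraryA("myNativeLib.dll");
    if (!lib) { fprintf(stderr, "LoadLibrary failed: %lu\n", GetLastError()); return 1; }

    /* GetProcAddress takes the exported (possibly decorated) name as a plain string */
    output_fn output = (output_fn)GetProcAddress(lib, "?output@@YAXPEBDZZ");
    if (output)
        output("hello from %s\n", "explicit linking");

    FreeLibrary(lib);
    return 0;
}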
But for the moment I was able to get all my link issues resolved.
I can now call my Rust DLL from a VS2019 Platform Toolset V142 "main", and the Rust DLL can call a C DLL function "output", and the data goes to the proprietary stream that the native "output" function was designed to send data to.
There were several hoops involved, but generally cargo/rustc/cbindgen worked well for this newbie. Now I'm trying to consider any compute-intensive task where multithreading is being avoided in C that could be safely implemented in Rust, which could then be benchmarked to show that all this pain was worthwhile.
The premise: I'm writing a plug-in DLL which conforms to an industry standard interface / function signature. This will be used in at least two different software packages used internally at my company, both of which have some example skeleton code or empty shells of this particular interface. One vendor authors their example in C/C++, the other in Fortran.
Ideally I'd like to just have to write and maintain this library code in one language and not duplicate it (especially as I'm only just now getting some comfort level in various flavors of C, but haven't touched Fortran).
I've emailed off to both our vendors to see if there's anything specific their solvers need when they import this DLL, but this has made me curious at a more fundamental level. If I compile a DLL with an exposed method void foo(int bar) in both C and Fortran... by the time it's down to x86 machine instructions, does it make any difference in how that method is called by program "X"? I've gathered so far that if I were to do C++ I'd need the extern "C" bit to avoid "mangling" - is there anything else I should be aware of?
It matters. The exported function must use a specific calling convention, there are several incompatible ones in common use in 32-bit code. The calling convention dictates where the function arguments are stored, in what order they are passed and how they are removed again. As well as how the function return value is passed back.
And the name of the function matters, exported function names are often decorated with extra characters. Which is what extern "C" is all about, it suppresses the name mangling that a C++ compiler uses to prevent overloaded functions from having the same exported name. So the name is one that the linker for a C compiler can recognize.
The way a C compiler makes function calls is pretty much the standard if you interop with code written in other languages. Any modern Fortran compiler will support declarations to make them compatible with a C program. And surely this is something that's already used by whatever software vendor you are working with that provides an add-on that was written in Fortran. And the other way around, as long as you provide functions that can be used by a C compiler then the Fortran programmer has a good chance at being able to call it.
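For the void foo(int bar) from the question, the C side of such a plug-in export might look like this (a sketch only; __stdcall is just an example - use whatever convention the solver's plug-in specification demands, and note that on x64 there is effectively a single convention):

#ifdef __cplusplus
extern "C"      /* no C++ name mangling */
#endif
__declspec(dllexport) void __stdcall foo(int bar)
{
    (void)bar;  /* plug-in work goes here */
}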
Yes, it has been discussed here many, many times. Study the answers and questions under this tag: https://stackoverflow.com/questions/tagged/fortran-iso-c-binding
The equivalent of extern "C" in Fortran is bind(C). The equivalence of the data types is handled using the intrinsic module iso_c_binding.
Also be sure to use the same calling conventions. If you do not specify anything manually, the default is usually the same for both. On Linux this is a non-issue.
extern "C" is used in C++ code. So if you DLL is written in C++, you mustn't pass any C++ objects (classes).
If you stick with C types, you need to make sure the function passes parameters in a single way e.g. use C's default of _cdecl. Not sure what Fortran uses.
I know there are at least three popular methods to call the same function with multiple names. I haven't actually heard of someone using the fourth method for this purpose.
1). Could use #defines:
int my_function (int);
#define my_func my_function
OR
#define my_func(a) my_function(a)
2). Embedded function calls are another possibility:
int my_func(int a) {
    return my_function(a);
}
3). Use a weak alias in the linker:
int my_func(int a) __attribute__((weak, alias("my_function")));
4). Function pointers:
int (* const my_func)(int) = my_function;
The reason I need multiple names is for a mathematical library that has multiple implementations of the same method.
For example, I need an efficient method to calculate the square root of a scalar floating point number. So I could just use math.h's sqrt(). This is not very efficient. So I write one or two other methods, such as one using Newton's Method. The problem is each technique is better on certain processors (in my case microcontrollers). So I want the compilation process to choose the best method.
I think this means it would be best to use either the macros or the weak alias since those techniques could easily be grouped in a few #ifdef statements in the header files. This simplifies maintenance (relatively). It is also possible to do using the function pointers, but it would have to be in the source file with extern declarations of the general functions in the header file.
Which do you think is the better method?
Edit:
From the proposed solutions, there appears to be two important questions that I did not address.
Q. Are the users working primarily in C/C++?
A. All known development will be in C/C++ or assembly. I am designing this library for my own personal use, mostly for work on bare-metal projects. There will be either no operating system or only minimal operating-system features. There is a remote possibility of using this in full-blown operating systems, which would require consideration of language bindings. Since this is for personal growth, it would be advantageous to learn library development on popular embedded operating systems.
Q. Are the users going to need/want an exposed library?
A. So far, yes. Since it is just me, I want to make direct modifications for each processor I use after testing. This is where the test suite would be useful. So an exposed library would help somewhat. Additionally, each "optimal implementation" of a particular function may have failing conditions. At that point, it has to be decided who fixes the problem: the user or the library designer. A user would need an exposed library to work around failing conditions. I am both the "user" and the "library designer". It would almost be better to allow for both. Then non-realtime applications could let the library solve all of the stability problems as they come up, but real-time applications would be empowered to weigh algorithm speed/space against algorithm stability.
Another alternative would be to move the functionality into a separately compiled library optimised for each different architecture and then just link to this library during compilation. This would allow the project code to remain unchanged.
Depending on the intended audience for your library, I suggest you choose between two alternatives:
If the consumer of your library is guaranteed to be Cish, use #define sqrt newton_sqrt for optimal readability
If some consumers of your library are not of the C variety (think bindings to Delphi, .NET, whatever), try to avoid consumer-visible #defines. This is a major PITA for bindings, as macros are not visible in the binary - embedded function calls are the most binding-friendly.
What you can do is this. In the header file (.h):
int function(void);
In the source file (.c):
static int function_implementation_a(void);
static int function_implementation_b(void);
static int function_implementation_c(void);
#if ARCH == ARCH_A
int function(void)
{
    return function_implementation_a();
}
#elif ARCH == ARCH_B
int function(void)
{
    return function_implementation_b();
}
#else
int function(void)
{
    return function_implementation_c();
}
#endif // ARCH
Static functions called once are often inlined by the implementation. This is the case, for example, with GCC, where -finline-functions-called-once is enabled at -O1 and above. Static functions that are never called are also usually left out of the final binary.
Note that I don't put the #if and #else inside a single function body because I find the code more readable when the #if directives are outside the function bodies.
Note that this approach works best with embedded code, where libraries are usually distributed in source form.
I usually like to solve this with a single declaration in a header file with a different source file for each architecture/processor-type. Then I just have the build system (usually GNU make) choose the right source file.
I usually split the source tree into separate directories for common code and for target-specific code. For instance, my current project has a toplevel directory Project1 and underneath it are include, common, arm, and host directories. For arm and host, the Makefile looks for source in the proper directory based on the target.
I think this makes it easier to navigate the code since I don't have to look up weak symbols or preprocessor definitions to see what functions are actually getting called. It also avoids the ugliness of function wrappers and the potential performance hit of function pointers.
You could create a test suite for all the algorithms and run it on the target to determine which perform best, then have the test suite automatically generate the necessary linker aliases (method 3).
Beyond that, a simple #define (method 1) is probably the simplest, and will not add any potential overhead. It does, however, expose to the library user that there might be multiple implementations, which may be undesirable.
Personally, since only one implementation of each function is likely to be optimal on any specific target, I'd use the test suite to determine the required version for each target, and build a separate library for each target containing only that version of each function, under the correct function name directly.
One aspect where C shows its age is the encapsulation of code. Many modern languages have classes, namespaces, packages... much more convenient ways to organize code than a simple #include.
Since C is still the main language for many huge projects, how do you overcome its limitations?
I suppose that one main factor must be lots of discipline. I would like to know what you do to handle large quantities of C code, and which authors or books you can recommend.
Separate the code into functional units.
Build those units of code into individual libraries.
Use hidden symbols within libraries to reduce namespace conflicts.
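For the "hidden symbols" point, a small sketch (GCC/Clang visibility attribute; all names are illustrative):

static int parse_header(const char *buf);                   /* private to this file */
__attribute__((visibility("hidden"))) int log_init(void);   /* private to the library */
int mylib_open(const char *path);                            /* the public interface */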
Think of open source code. There is a huge amount of C code in the Linux kernel, the GNU C library, the X Window system and the Gnome desktop project. Yet, it all works together. This is because most of the code does not see any of the other code. It only communicates by well-defined interfaces. Do the same in any large project.
Some people don't like it, but I am an advocate of organizing my structs and associated functions together as if they were a class where the this pointer is passed explicitly, combined with a consistent naming convention to make the namespace explicit. A header would look something like this:
typedef struct foo {
    int x;
    double y;
} FOO_T;

FOO_T * foo_new(void);
int foo_set_x(FOO_T * self, int arg1);
int foo_do_bar(FOO_T * self, int arg1);
FOO_T * foo_delete(FOO_T * self);
In the implementation, all the "private" functions would be static. The downside of this is that you can't actually enforce that the user not go and muck with the members of the struct. That's just life in C. I find, though, that this style makes for nicely reusable C types.
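A matching implementation file might look like this (a sketch assuming the header above lives in foo.h; the body details are made up):

#include <stdlib.h>
#include "foo.h"

/* "private" helpers are static, so they never leak into the global namespace */
static int foo_is_valid(FOO_T * self)
{
    return self != NULL;
}

FOO_T * foo_new(void)
{
    return calloc(1, sizeof(FOO_T));
}

int foo_set_x(FOO_T * self, int arg1)
{
    if (!foo_is_valid(self)) return -1;
    self->x = arg1;
    return 0;
}

int foo_do_bar(FOO_T * self, int arg1)
{
    return foo_is_valid(self) ? self->x + arg1 : -1;
}

FOO_T * foo_delete(FOO_T * self)
{
    free(self);
    return NULL;   /* callers assign the result back to their pointer */
}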
A good way to achieve some encapsulation is to declare a module's internal functions and variables as static.
As Andres says, static is your friend. But speaking of friends... if you want to be able to split a library into two files, then any symbols from one file that need to be visible in the other cannot be static.
Decide on some naming conventions: all non-static symbols from library foo start with foo_. And make sure they are always followed: it is precisely the symbols for which this seems constraining ("I have to call it foo_max?! But it's just max!") that will otherwise end up clashing.
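A tiny sketch of both points together (the library name and functions are made up):

/* foo.h - everything exported from library "foo" carries the foo_ prefix */
int foo_max(int a, int b);

/* foo.c - helpers needed only in this file stay static, so any name is safe */
static int bigger(int a, int b)
{
    return a > b ? a : b;
}

int foo_max(int a, int b)
{
    return bigger(a, b);
}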
As Zan says, a typical Linux distribution can be seen as a huge project written mostly in C. It works. There are interfaces, and large subprojects are implemented as separate processes. An implementation in separate processes helps for debugging, for testing, for code reuse, and it provides a second hierarchy in addition to the only one that exists at link level. When your project becomes large enough, it may start to make sense to put some of the functionality in separate processes. Something already as specialized as a C compiler is typically implemented as three processes: pre-processor, compiler, assembler.
If you can control the project (e.g. it is in-house, or you pay someone else to do it) you can simply set rules and use reviews and tools to enforce them. There is no real need for the language to do this; you can, for instance, demand that all functions usable outside a module (= a set of files, which don't even need to be a separate library) must be marked as such. In effect, you force the developers to think about the interfaces and stick with them.
If you really want to make the point, you could define macros to show this as well, e.g.
#define PUBLIC
#define PRIVATE static
or some such.
So you are right, discipline is the key here. It involves setting the rules AND making sure that they are followed.
How can one achieve late binding in the C language?
Late binding is not really a function of the C language itself, more something that your execution environment provides for you.
Many systems will provide deferred binding as a feature of the linker/loader and you can also use explicit calls such as dlopen (to open a shared library) and dlsym (to get the address of a symbol within that library so you can access it or call it).
The only semi-portable way of getting late binding with the C standard would be to use some trickery with system() and even that is at least partially implementation-specific.
If you're not so much talking about deferred binding but instead polymorphism, you can achieve that effect with function pointers. Basically, you create a struct which has all the data for a type along with function pointers for locating the methods for that type. Then, in the "constructor" (typically an init() function), you set the function pointers to the relevant functions for that type.
You still need to include all the code even if you don't use it but it is possible to get polymorphism that way.
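A compact sketch of that pattern (type and function names invented for illustration):

#include <stdio.h>

typedef struct shape {
    double (*area)(const struct shape *self);   /* "method" chosen at runtime */
    double w, h;
} shape;

static double rect_area(const shape *s) { return s->w * s->h; }
static double tri_area(const shape *s)  { return 0.5 * s->w * s->h; }

/* the "constructor" wires up the function pointers for this particular type */
static void shape_init(shape *s, double w, double h, int is_triangle)
{
    s->w = w;
    s->h = h;
    s->area = is_triangle ? tri_area : rect_area;
}

int main(void)
{
    shape s;
    shape_init(&s, 3.0, 4.0, 1);
    printf("area = %f\n", s.area(&s));   /* dispatch through the pointer */
    return 0;
}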
Symbol binding in C is always done at compile/link time, never at runtime.
Library binding, or dynamic linking as it's called, is done via dlopen() and dlsym() on *nix, and LoadLibrary() and GetProcAddress() on Windows.
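A minimal dlopen/dlsym sketch (POSIX; the library and symbol are just examples - build with gcc demo.c -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* look the symbol up by name at runtime */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("cos(0) = %f\n", cosine(0.0));

    dlclose(handle);
    return 0;
}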
How can one achieve late binding in the C language?
The closest would be through dynamic loading of a library (a DLL or shared object), such as with dlopen and dlsym on Linux. Otherwise, it is not directly available in C.
Use Objective-C or Lua. Both are late-bound languages which can easily interface with C.
Of course you could implement your own name resolution scheme, but why re-invent the wheel?
Unfortunately you did not specify an OS. For Unix you can use shared libraries, or create a configurable (plugin) module structure. For details you may find the source code of the Apache 1.3 web server useful: http://httpd.apache.org/download.cgi
cppdev seems to be the only one to hit the spot with his/her remark.
Please, have a look at the definition itself. In a few words:
Late binding, or dynamic binding, is a computer programming mechanism in which the method being called upon an object is looked up by name at runtime.
All other answers just miss the main point, that is "look up by name".
The needed solution would be very similar to a lookup table of pointers to functions along with a function or two to select the right one by name (or even by signature). We call it a "hash table".
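A small sketch of that idea (a linear table rather than a real hash table; the names are invented):

#include <stdio.h>
#include <string.h>

static void say_hello(void) { puts("hello"); }
static void say_bye(void)   { puts("bye"); }

struct entry { const char *name; void (*fn)(void); };

static const struct entry table[] = {
    { "hello", say_hello },
    { "bye",   say_bye   },
};

/* "late binding": the function to run is looked up by name at runtime */
static void call_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].fn();
            return;
        }
    }
    printf("no binding for \"%s\"\n", name);
}

int main(void)
{
    call_by_name("hello");
    call_by_name("bye");
    return 0;
}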