How to detect the language that lib caller is using? - c

I am writing a static library in C++ and expect it to be used from either Fortran or C. Since Fortran indices start at 1 by default, I have to adjust the indices inside my library when it is called from Fortran, because I pass an array of indices to the library and they matter for further computation.
Of course, an intuitive way to solve this is to add an argument to the interface so that users can tell me which language they are using, but I don't think that is a cool way to do it.
So I wonder if there is any way to detect inside my library whether it is called from Fortran or C?
Thanks!

If you are just passing arrays and their lengths, there shouldn't be any issue. The problem is only if you pass an index, then you need to know what that index is relative to. (Which in Fortran could be any value, if the array is explicitly declared to start with an index other than one). If you have this case, my suggestion is to write glue routines for one of the languages that will convert the index values, then call the regular library routines. The problem with this solution is that it obligates the user of the "special" language to call the special glue routines; calling the regular routines is a mistake.
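A rough sketch of what such a glue routine could look like, assuming the Fortran side uses default 1-based indices. The names sum_selected and sum_selected_f are made up for illustration, and how the glue routine is exported to Fortran (symbol name, bind(C), etc.) is left out.

```c
#include <stdlib.h>

/* Regular library routine (hypothetical): idx[] holds 0-based indices. */
double sum_selected(const double *data, const int *idx, int n_idx)
{
    double total = 0.0;
    for (int i = 0; i < n_idx; ++i)
        total += data[idx[i]];
    return total;
}

/* Glue routine for Fortran callers (hypothetical). Scalars arrive by
 * reference and idx[] is assumed to be 1-based. The indices are copied and
 * shifted, then the regular routine is called.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int sum_selected_f(const double *data, const int *idx, const int *n_idx,
                   double *result)
{
    int n = *n_idx;
    int *zero_based = malloc((n > 0 ? (size_t)n : 1) * sizeof *zero_based);
    if (zero_based == NULL)
        return -1;

    for (int i = 0; i < n; ++i)
        zero_based[i] = idx[i] - 1;      /* 1-based -> 0-based */

    *result = sum_selected(data, zero_based, n);
    free(zero_based);
    return 0;
}
```

Copying the indices keeps the caller's array untouched; converting in place would save the allocation but would surprise a Fortran caller that reuses the array afterwards.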

Applications and libraries in any language will be built to target the same ABI. The ABI defines calling conventions and other details that make it possible for two functions built by different compilers (possibly for different languages) to call each other. There shouldn't be anything obviously different in the calls because great effort has gone into avoiding those differences.
You could look for out-of-band information like symbols provided by the FORTRAN compiler (or pulled in from its utility libraries). You would declare some symbol with the weak attribute and if it became valid you would know that there was some FORTRAN somewhere in the current executable image. However, you could not know if it was calling you directly or simply due to some other library pulled in.
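For what it's worth, the weak-symbol probe might look roughly like this with GCC or Clang on an ELF platform. It assumes the gfortran runtime, whose startup code calls _gfortran_set_args; other Fortran compilers use different symbols, and gfortran_runtime_present is a made-up name.

```c
#include <stddef.h>

/* Assumption: libgfortran exports _gfortran_set_args. Declaring it weak
 * means the link still succeeds when no Fortran runtime is present; in
 * that case the symbol's address is NULL. */
extern void _gfortran_set_args(int argc, char **argv) __attribute__((weak));

/* Returns 1 if the gfortran runtime is part of this executable image,
 * 0 otherwise. It says nothing about who is calling the library. */
int gfortran_runtime_present(void)
{
    return _gfortran_set_args != NULL;
}
```

As said above, a non-zero result only tells you that some Fortran code exists somewhere in the process, not that your caller is Fortran.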
The right solution appears to be using explicit wrappers to call C from FORTRAN: Calling a FORTRAN subroutine from C

It would be better if you set a start index. Fortran arrays do not necessarily have to start at 1. They can start at any number. The array may have been declared as
real, dimension(-20:20) :: neg
real, dimension(4:99) :: pos
So passing an index of 5 for neg would mean the 26th element and passing an index of 5 for pos would mean the 2nd element.
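On the library side, the start-index idea boils down to one subtraction. A minimal sketch, where to_offset is a hypothetical helper and the caller is assumed to pass the declared lower bound (0 from C, 1 from default Fortran arrays):

```c
#include <stddef.h>

/* Converts an index expressed relative to lower_bound into a 0-based
 * offset from the start of the array. */
ptrdiff_t to_offset(int index, int lower_bound)
{
    return (ptrdiff_t)index - lower_bound;
}
```

With the declarations above, to_offset(5, -20) gives 25 (the 26th element of neg) and to_offset(5, 4) gives 1 (the 2nd element of pos).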

Related

Why aren't binaries of different languages compatible with each other? How do you make them compatible?

A Swift app compiles its dynamic frameworks into binaries. And once something is a binary, it's no longer Swift/Ruby/Python, etc. It's machine code.
The same thing happens for a Python binary. So why isn't the machine code compatible out of the box?
Is it just that a simple mapping is required to bridge one language to the other?
For example, if I needed to use a binary created from Swift in a Python-based app, would I need to expose the Swift headers to Python for it to work? Or is something else required?
I assume you're talking about making calls in one language to a library compiled in a different language.
At the assembly language level, there are standards (ABI, for Application Binary Interface) that define how function parameters are passed in registers, how values are returned, the behavior of the stack, etc. ABIs are architecture and operating-system-dependent. Usually any function that is exported in a library will follow the ABI.
It is plain that ABIs basically expect a C language model for functions: a single return value, a well-defined data type for each function parameter as well as the return value, the possibility of using pointers, etc.
Problems start to arise once you move to a higher level language. C++ already introduces complications: whereas the name of a C function is the same in assembly (often a _ character is prepended), in C++ function names must encode data types due to the possibility of overloaded functions with the same name but different parameters. Thus, names must be mangled and demangled -- this is why a prototype for a C function must be declared as extern "C" in C++. Then there are issues of the classes (this pointer, vtables), namespaces and so on, which complicate matters further.
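The usual header idiom for this looks something like the sketch below. The function mylib_add is made up; the guard lets the same declaration be read by a C compiler (which ignores the extern "C" part) and by a C++ compiler (which then refers to the unmangled name).

```c
/* mylib.h (hypothetical): usable from both C and C++ */
#ifdef __cplusplus
extern "C" {            /* only seen by C++ compilers */
#endif

int mylib_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* mylib.c: compiled as C, so the exported symbol is plain mylib_add
 * (possibly with a leading underscore, depending on the platform). */
int mylib_add(int a, int b)
{
    return a + b;
}
```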
Then you have dynamically typed languages like Python. In truth, there is no such thing as dynamic typing at the assembly language level: the instruction encodings in machine language (i.e. binary codes as they're read by the CPU when executed) implicitly determine whether you're using an integer or floating-point or SIMD instruction (and the width of operands), which also determines which of the different register banks are accessed. Although the language makes dynamic typing transparent to you, at the assembly code level, the interpreter/JIT/compiler must resolve the types somehow, because ultimately the CPU must be told exactly what data type to operate on.
This is why you can't directly call a C function (or in general any library function) from Python -- unlike a pure Python function which can disregard the types of its parameters, library functions must know the exact types of each parameter and the return type. Thus, you must use something like ctypes for Python, explicitly specifying the types in question for each function that needs to be called -- in a way, this is similar to function prototypes usually found in C headers. It is possible to write functions in C that are directly callable from Python (and, in that case, essentially from Python alone), but you'll have to jump through a few hoops.
As for the particular language pairing you're interested in (Python/Swift), a cursory search came up with this thread in the Swift forums (this one, linked from there, may also be interesting). Reading the thread, there appear to be two feasible solutions at this time: first, use the @_cdecl attribute (which isn't officially supported) to make a C function, and then call it from Python using ctypes. But the second and apparently more promising one is to use the @objc attribute in Swift, and use PyObjC in Python. I assume this will allow using some of the higher-level features of Swift, at least those that intersect with what Objective-C offers.

Efficient calling of F95 in R: use .Fortran or .Call?

I am writing some R simulation code, but want to leverage Fortran's swift linear algebra libraries to replace the core iteration loops. So far I was looking primarily at the obvious option of using .Fortran to call linked F95 subroutines; I figured I should optimize memory use (I am passing very large arrays) and set DUP=FALSE, but then I read the warning in the manual about the dangers of this approach and of its deprecation in R 3.1.0 and disablement in R 3.2.0. Now the manual recommends switching to .Call, but this function offers no Fortran support natively.
My googling has yielded a stackoverflow question which explores an approach of linking the Fortran subroutine through C code and then calling it using .Call. This seems to me the kind of thing that could either work like a charm or a curse. Hence, my questions:
Aiming for speed and robustness, what are the risks and benefits of calling Fortran via .Fortran and via .Call?
Is there a more elegant/efficient way of using .Call to call Fortran subroutines?
Is there another option altogether?
Here are my thoughts on the situation:
.Call is the generally preferred interface. It provides you a pointer to the underlying R data object (a SEXP) directly, and so all of the memory management is then your decision to make. You can respect the NAMED field and duplicate the data if you want, or ignore it (if you know that you won't be modifying the data in place, or feel comfortable doing that for some other reason)
.Fortran tries to automagically provide the appropriate data types from an R SEXP object to a Fortran subroutine; however, its use is generally discouraged (for reasons not entirely clear to me, to be honest)
You should have some luck calling compiled Fortran code from C / C++ routines. Given a Fortran subroutine called fortran_subroutine, you should be able to provide a forward declaration in your C / C++ code as e.g. (note: you'll need a leading extern "C" for C++ code):
void fortran_subroutine_(<args>);
Note the trailing underscore on the function name -- this is just how Fortran compilers (that I am familiar with, e.g. gfortran) 'mangle' symbol names by default, and so the symbol that's made available will have that trailing underscore.
In addition, you'll need to make sure the <args> you choose map from the C types to the corresponding Fortran types. Fortunately, R-exts provides such a table.
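To make that concrete, here is a sketch of the C side for a made-up subroutine scale_vector(x, n, factor) with INTEGER n and DOUBLE PRECISION x(n), factor. It assumes gfortran's default mangling (lowercase name plus trailing underscore) and that every argument is passed by reference. A C stand-in body is included only so the sketch compiles without a Fortran compiler.

```c
/* Forward declaration of the Fortran subroutine as seen from C:
 * INTEGER maps to int, DOUBLE PRECISION maps to double, and all
 * arguments are pointers. */
void scale_vector_(double *x, const int *n, const double *factor);

/* Stand-in for the Fortran body (illustration only). In a real build this
 * definition is removed and the Fortran object file is linked instead. */
void scale_vector_(double *x, const int *n, const double *factor)
{
    for (int i = 0; i < *n; ++i)
        x[i] *= *factor;
}

/* Convenience wrapper (hypothetical) that takes scalars by value, which
 * is what C callers usually expect. */
void scale_vector(double *x, int n, double factor)
{
    scale_vector_(x, &n, &factor);
}
```

A .Call entry point would then pull the pointers out of the SEXP arguments and hand them to a wrapper like this.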
In the end, R's R CMD build would automatically facilitate the compilation + linking process for an R package. Because I am evidently a glutton for punishment, I've produced an example package which should provide enough information for you to get a sense of how the bindings work there.
3. Is there another option altogether? YES
The R package dotCall64 could be an interesting alternative. It provides .C64() which is an enhanced version of the Foreign Function Interface, i.e., .C() and .Fortran().
The interface .C64() can be used to interface both Fortran and C/C++ code. It
has a similar usage as .C() and .Fortran()
provides a mechanism to avoid unnecessary copies of read-only and write-only arguments
supports long vectors (vectors with more than 2^31-1 elements)
supports 64-bit integer type arguments
Hence, one can avoid unnecessary copies of read-only arguments while avoiding the .Call() interface in combination with a C wrapper function.
Some links:
The R package dotCall64: https://CRAN.R-project.org/package=dotCall64
Description of dotCall64 with examples: https://doi.org/10.1016/j.softx.2018.06.002
An illustration where dotCall64 is used to make the sparse matrix algebra R package spam compatible with huge sparse matrices (more than 2^31-1 non-zero elements): https://doi.org/10.1016/j.cageo.2016.11.015
I am one of the authors of dotCall64 and spam.

Should a Fortran-compiled and C-compiled DLL be able to import interchangeably? (x86 target)

The premise: I'm writing a plug-in DLL which conforms to an industry standard interface / function signature. This will be used in at least two different software packages used internally at my company, both of which have some example skeleton code or empty shells of this particular interface. One vendor authors their example in C/C++, the other in Fortran.
Ideally I'd like to just have to write and maintain this library code in one language and not duplicate it (especially as I'm only just now getting some comfort level in various flavors of C, but haven't touched Fortran).
I've emailed off to both our vendors to see if there's anything specific their solvers need when they import this DLL, but this has made me curious at a more fundamental level. If I compile a DLL with an exposed method void foo(int bar) in both C and Fortran... by the time it's down to x86 machine instructions, does it make any difference in how that method is called by program "X"? I've gathered so far that if I were to do C++ I'd need the extern "C" bit to avoid "mangling". Is there anything else I should be aware of?
It matters. The exported function must use a specific calling convention, and there are several incompatible ones in common use in 32-bit code. The calling convention dictates where the function arguments are stored, in what order they are passed, how they are removed again, and how the function return value is passed back.
The name of the function matters too: exported function names are often decorated with extra characters. That is what extern "C" is all about; it suppresses the name mangling that a C++ compiler uses to prevent overloaded functions from having the same exported name, so the name is one that the linker for a C compiler can recognize.
The way a C compiler makes function calls is pretty much the standard if you interop with code written in other languages. Any modern Fortran compiler will support declarations to make them compatible with a C program. And surely this is something that's already used by whatever software vendor you are working with that provides an add-on that was written in Fortran. And the other way around, as long as you provide functions that can be used by a C compiler then the Fortran programmer has a good chance at being able to call it.
Yes it has been discussed here many many times. Study answers and questions in this tag https://stackoverflow.com/questions/tagged/fortran-iso-c-binding .
The equivalent of extern "C" in fortran is bind(C). The equivalency of the datatypes is done using the intrinsic module iso_c_binding.
Also be sure to use the same calling conventions. If you do not specify anything manually, the default is usually the same for both. On Linux this is a non-issue.
extern "C" is used in C++ code. So if your DLL is written in C++, you mustn't pass any C++ objects (classes).
If you stick with C types, you need to make sure the function passes parameters in a single way e.g. use C's default of _cdecl. Not sure what Fortran uses.
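One way to pin the convention down in the source rather than rely on compiler defaults is a pair of macros, roughly as below. This is only a sketch: PLUGIN_EXPORT, PLUGIN_CALL and foo_last_bar are invented names, and __cdecl is an assumption here, since the interface standard you are implementing decides which convention is actually required.

```c
/* On Windows, mark the function as exported and state the calling
 * convention explicitly. Elsewhere the macros expand to nothing. */
#if defined(_WIN32)
#  define PLUGIN_EXPORT __declspec(dllexport)
#  define PLUGIN_CALL   __cdecl
#else
#  define PLUGIN_EXPORT
#  define PLUGIN_CALL
#endif

static int last_bar;    /* remembers the last argument, for illustration */

/* The exported entry point from the question: plain C types only. */
PLUGIN_EXPORT void PLUGIN_CALL foo(int bar)
{
    last_bar = bar;
}

/* Helper (hypothetical) so the effect of foo() can be observed. */
int foo_last_bar(void)
{
    return last_bar;
}
```

If the host expects a different convention (for example __stdcall), only the PLUGIN_CALL definition needs to change.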

Is it possible to LD_PRELOAD a function with different parameters?

Say I replace a function by creating a shared object and using LD_PRELOAD to load it first. Is it possible to have parameters to that function different from the one in original library?
For example, if I replace pthread_mutex_lock, such that instead of parameter pthread_mutex_t it takes pthread_my_mutex_t. Is it possible?
Secondly, besides function, is it possible to change structure declarations using LD_PRELOAD? For example, one may add one more field to a structure.
Although you can arrange to provide your modified pthread_mutex_lock() function, the code will have been compiled to call the standard function. This will lead to problems when the replacement is called with the parameters passed to the standard function. This is a polite way of saying:
Expect it to crash and burn
Any pre-loaded function must implement the same interface — same name, same arguments in, same values out — as the function it replaces. The internals can be implemented as differently as you need, but the interface must be the same.
Similarly with structures. The existing code was compiled to expect one size for the structure, with one specific layout. You might get away with adding an extra field at the end, but the non-substituted code will probably not work correctly. It will allocate space for the original size of structure, not the enhanced structure, etc. It will never access the extra element itself. It probably isn't quite impossible, but you must have designed the program to handle dynamically changing structure sizes, which places severe enough constraints on when you can do it that the answer "you can't" is probably apposite (and is certainly much simpler).
IMNSHO, the LD_PRELOAD mechanism is for dire emergencies (and is a temporary band-aid for a given problem). It is not a mechanism you should plan to use on anything remotely resembling a regular basis.
LD_PRELOAD does one thing, and one thing only. It arranges for a particular DSO file to be at the front of the list that ld.so uses to look up symbols. It has nothing to do with how the code uses a function or data item once found.
Anything you can do with LD_PRELOAD, you can simulate by just linking the replacement library with -l at the front of the list. If, on the other hand, you can't accomplish a task with that -l, you can't do it with LD_PRELOAD.
The effects of what you're describing are conceptually the same as the effects of providing a mismatching external function at normal link time: undefined behavior.
If you want to do this, rather than playing with fire, why don't you make your replacement function also take pthread_mutex_t * as its argument type, and then just convert the pointer to pthread_my_mutex_t * in the function body? Normally this conversion will take place only at the source level anyway; no code should be generated for it.
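A sketch of that idea follows. The layout of pthread_my_mutex_t is an assumption (the standard mutex as first member, plus a counter), and the function is called my_mutex_lock here so that it can be tried without interposing anything. In an actual preload library it would carry the name pthread_mutex_lock and would typically reach the original through dlsym(RTLD_NEXT, "pthread_mutex_lock").

```c
#include <pthread.h>

/* Hypothetical extended mutex. Because the standard mutex is the first
 * member, a pointer to base is also a valid pointer to the whole struct. */
typedef struct {
    pthread_mutex_t base;
    unsigned long   lock_count;
} pthread_my_mutex_t;

/* Keeps the standard parameter type and converts inside the body.
 * Only valid if m really points at the base member of a
 * pthread_my_mutex_t; a plain pthread_mutex_t is too small. */
int my_mutex_lock(pthread_mutex_t *m)
{
    pthread_my_mutex_t *my = (pthread_my_mutex_t *)m;
    int rc = pthread_mutex_lock(&my->base);
    if (rc == 0)
        my->lock_count++;       /* updated while the lock is held */
    return rc;
}
```

Note the size caveat from the other answer still applies: code that allocated an ordinary pthread_mutex_t has no room for the extra field, so this only works for mutexes you allocate yourself.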

How to implement standard C function extraction?

I have a "a pain in the a$$" task to extract/parse all standard C functions that were called in the main() function. Ex: printf, fseek, etc...
Currently, my only plan is to read each line inside main() and check whether it contains a standard C function by looking it up in a list of standard C functions that I will also be defining (#define CFUNCTIONS "printf...")
As you know, there are so many standard C functions that defining all of them would be very annoying.
Any idea how I can check whether a string is a standard C function?
If you have heard of cscope, try looking into the database it generates. There are instructions available at the cscope front end to list out all the functions that a given function has called.
If you look at the list of the calls from main(), you should be able to narrow down your work considerably.
If you have to parse by hand, I suggest starting with the included standard headers. They should give you a decent idea about which functions you could expect to see in main().
Either way, the work sounds non-trivial and interesting.
Parsing C source code seems simple at first blush, but as others have pointed out, the possibility of a programmer getting far off the leash by using #defines and #includes is rather common. Unless it is known that the specific program to be parsed is mild-mannered with respect to text substitution, the complexity of parsing arbitrary C source code is considerable.
Consider the less used, but far more effective tactic of parsing the object module. Compile the source module, but do not link it. To further simplify, reprocess the file containing main to remove all other functions, but leave declarations in their places.
Depending on the requirements, there are two ways to complete the task:
Write a program which opens the object module and iterates through the external reference symbol table. If the symbol matches one of the interesting function names, list it. Many platforms have library functions for parsing an object module.
Write a command file or script which uses the developer tools to examine object modules. For example, on Linux, the command nm lists external references with a U.
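Either variant ends up filtering symbol names against a list. As a sketch of the filtering step, the code below parses one line of nm output as printed for an ELF object file (undefined symbols look like "                 U printf") and looks the name up in a sorted table. The table is a deliberately tiny sample, and both function names are made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Tiny sample list; it must stay sorted for bsearch(). */
static const char *const std_funcs[] = {
    "fclose", "fopen", "fseek", "printf", "puts", "strlen"
};

static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 1 if name is in the sample list of standard functions. */
int is_standard_function(const char *name)
{
    return bsearch(name, std_funcs,
                   sizeof std_funcs / sizeof std_funcs[0],
                   sizeof std_funcs[0], cmp_name) != NULL;
}

/* Parses one line of nm output. If the line describes an undefined symbol
 * (type U), copies its name into out and returns 1; otherwise returns 0. */
int undefined_symbol(const char *line, char *out, size_t out_size)
{
    char type;
    char name[256];

    /* Undefined symbols have no address column, so the first
     * non-blank character on the line is the type letter. */
    if (sscanf(line, " %c %255s", &type, name) != 2 || type != 'U')
        return 0;
    if (strlen(name) >= out_size)
        return 0;
    strcpy(out, name);
    return 1;
}
```

Keep in mind that the object file reflects what the compiler emitted, not what the source says: GCC, for example, may replace a printf of a constant string ending in a newline with puts, and some platforms prefix symbol names with an underscore.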
The task may look simple at first, but in order to be really 100% sure you would need to parse the C file. It is not sufficient to just look for the name; you need to know the context as well, i.e. when to check the identifier. Only once you have determined that the identifier is a function can you check whether it is a standard C runtime function.
(plus I guess it makes the task more interesting :-)
I don't think there's any way around having to define a list of standard C functions to accomplish your task. But it's even more annoying than that -- consider macros,
for example:
#include <stdio.h>

#define OUTPUT(foo) printf("%s\n", foo)

int main(void)
{
    OUTPUT("Ha ha!\n");
    return 0;
}
So you'll probably want to run your code through the preprocessor before checking
which functions are called from main(). Then you might have cases like this:
some_func("This might look like a call to fclose(fp), but surprise!\n");
So you'll probably need a full-blown parser to do this rigorously, since string literals
may span multiple lines.
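Short of a full parser, a small scanner that at least skips string and character literals avoids the trap above. This is only a sketch: count_calls is a made-up name, the input is assumed to be already preprocessed (macros expanded, comments removed), and it cannot tell a call from a declaration.

```c
#include <ctype.h>
#include <string.h>

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Counts places where name appears as a whole identifier followed by '(',
 * ignoring everything inside string and character literals. */
int count_calls(const char *src, const char *name)
{
    size_t len = strlen(name);
    int count = 0;
    const char *p = src;

    while (*p) {
        if (*p == '"' || *p == '\'') {
            char quote = *p++;
            while (*p && *p != quote) {
                if (*p == '\\' && p[1])
                    p++;                 /* skip the escaped character */
                p++;
            }
            if (*p)
                p++;                     /* step over the closing quote */
        } else if (is_ident_char((unsigned char)*p)) {
            const char *start = p;
            while (is_ident_char((unsigned char)*p))
                p++;
            size_t id_len = (size_t)(p - start);
            const char *q = p;
            while (isspace((unsigned char)*q))
                q++;                     /* allow "name (" with spaces */
            if (id_len == len && strncmp(start, name, len) == 0 && *q == '(')
                count++;
        } else {
            p++;
        }
    }
    return count;
}
```

For the some_func line above, this reports zero calls to fclose and one call to some_func.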
I won't bring up trigraphs...that would just be pointless sadism. :-) Anyway, good luck, and happy coding!

Resources