cast pointer to functor, and call it - c

Can I do something like this:
typedef void (*functor)(void* param);
//machine code of function
char functionBody[] = {
0xff,0x43,0xBC,0xC0,0xDE,....
};
//cast pointer to function
functor myFunc = (functor)functionBody;
//call to functor
myFunc(param);

Formally, the C language doesn't allow conversions between function pointers and object pointers, so this can't be done.
However, many C implementations - perhaps even "most" - support this as an extension. Whether it works or not depends on things like memory permissions and cache coherency, which will change depending on your architecture and operating system.

It depends on memory protection.
On most popular desktop platforms, you will not be able to execute code in a data segment because of page permissions: data pages are normally marked non-executable.

On modern operating systems, it depends on whether the memory is marked executable or not. On POSIX systems, it may be possible to obtain executable memory using mmap. Keep in mind that even on a given CPU architecture, calling conventions may vary. For example, if the caller expects the callee to clear arguments off the stack, your code had better do that or it will crash on return. (Normally, C ABIs don't make this stupid requirement, but it's something to think about.)
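A minimal sketch of the mmap route, assuming Linux or a BSD (anonymous mappings, GCC or Clang) and an x86, x86-64 or AArch64 CPU. The instruction bytes simply return 42; the names `sample_code`, `int_fn` and `run_generated_code` are made up for illustration. It maps a page writable, copies the code in, switches the page to read+execute with mprotect, flushes the instruction cache, and only then calls it through a function pointer:

```c
/* Sketch only: assumes POSIX anonymous mmap (Linux, BSD) and an
   x86, x86-64 or AArch64 CPU with a GCC/Clang compiler. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1        /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <string.h>
#include <sys/mman.h>

#if defined(__x86_64__) || defined(__i386__)
/* mov eax, 42 ; ret */
static const unsigned char sample_code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
#elif defined(__aarch64__)
/* mov w0, #42 ; ret  (two little-endian 32-bit instructions) */
static const unsigned char sample_code[] = { 0x40, 0x05, 0x80, 0x52,
                                             0xC0, 0x03, 0x5F, 0xD6 };
#else
#  define NO_SAMPLE_CODE 1       /* no bytes provided for this CPU */
#endif

typedef int (*int_fn)(void);

/* Copies the bytes into a fresh page, makes it executable and calls it.
   Returns the generated function's result, or -1 if anything fails. */
int run_generated_code(void)
{
#ifdef NO_SAMPLE_CODE
    return -1;
#else
    const size_t len = 4096;     /* the kernel rounds this up to whole pages */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, sample_code, sizeof sample_code);

    /* Writable first, then executable: many systems refuse W+X pages. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Needed on AArch64, where instruction and data caches are not
       coherent; effectively a no-op on x86. GCC/Clang builtin. */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof sample_code);

    /* Object-to-function pointer conversion: not ISO C, but POSIX requires
       function pointers and void * to share a representation (for dlsym). */
    int_fn fn = (int_fn)mem;
    int result = fn();

    munmap(mem, len);
    return result;
#endif
}
```

Systems with a strict W^X policy (for example some SELinux configurations) may refuse the mprotect call, in which case this sketch returns -1 instead of running the code.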
Rather than trying to call your machine code directly as a C function pointer, it may be better to write an inline asm wrapper that calls it. This way, you have control over the calling convention.

It could work. Possibly using const char[] for the machine instructions is even better, because on many platforms static const storage is placed in the read-only program memory section.

Related

ABI of functions in system libraries

I'm generating machine code to call functions from existing system libraries. Most system libraries were written in C, so I'll take C as an example, but the question probably applies to any other language.
If I understand this answer correctly, C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics. For instance they can choose to pass a pointer for the returned value as an argument to obtain copy-elision.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
Is this a real concern in practice? Or is it safe to assume that all the functions with non-mangled names from system libraries always use the system's default calling convention?
What other assumptions or considerations can I make about the ABI/calling convention of functions with non-mangled names in system libraries?
C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics.
Well, yes and no. The ABI is often defined by the target system, in which case the compiler has to fall in line. In case there exists no ABI for the target system (often the case in microcontroller programming), the compiler is free to do as it pleases, essentially inventing the ABI.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
No, you can't, unless you know the target system and calling convention. Some systems have several "de facto" standards, such as __cdecl vs __stdcall on x86 Windows; see https://en.wikipedia.org/wiki/X86_calling_conventions
Is this a real concern in practice?
Not within a program written entirely in C. But it becomes a big problem in case the program links external libs such as Windows DLLs, possibly written in other languages. Then you have to use the right calling convention or the program will soon crash.
It's also a very real concern whenever you attempt to mix assembler and C for the given system - the C compiler will handle stacking according to the calling convention, but in the assembler part you have to write this manually. This can also affect the C code, if it is written with care to suit assembler. You'd then pick parameter and return types that are convenient to use.
If I understand this answer correctly, C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics. For instance they can choose to pass a pointer for the returned value as an argument to obtain copy-elision.
I don't see how you conclude that from the answer you referenced. Calling conventions are a characteristic of the function, as it appears in compiled form. The compiler can do all manner of tricks at the point of call, but changing or ignoring the calling conventions of the function implementation is not one of them. Where it is possible, copy elision for returned structure values (the subject of that answer) does not rely on any such thing.
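To make this concrete for one platform: on x86-64 System V, a struct too large to return in registers is always returned through a hidden pointer to caller-provided storage (passed in RDI, and handed back in RAX). That is part of the published ABI, so every compiler for the platform does it the same way. The sketch below shows a source-level function next to a hand-written equivalent of its lowered form; `struct big`, `make_big` and `make_big_lowered` are hypothetical names.

```c
#include <string.h>

/* 64 bytes on LP64 targets: too big for the two return registers
   (RAX:RDX) that x86-64 System V uses for small structs. */
struct big { long v[8]; };

/* What you write: an ordinary by-value return. */
struct big make_big(long seed)
{
    struct big b;
    for (int i = 0; i < 8; i++)
        b.v[i] = seed + i;
    return b;
}

/* Hand-written equivalent of how the ABI passes the result: the caller
   supplies the storage through a hidden first argument. Illustration
   only; the real lowering is done by the compiler, not in source. */
void make_big_lowered(struct big *ret, long seed)
{
    for (int i = 0; i < 8; i++)
        ret->v[i] = seed + i;
}
```

Copy elision then just means the caller may pass the address of the final destination object as that hidden pointer; the function itself is compiled the same way either way.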
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
Yes and no. The function signature alone does not convey anything about calling convention (with some caveats; see below), but libraries simply could not work if there were no way to know calling conventions. In practice, it is usually the case that calling convention (and ABI overall) is standardized on a per-platform basis.
Thus, for example, Linux implementations for x86_64 substantially all follow the same conventions. All the toolchains targeting that platform both use that convention for function calls and provide for functions to be called according to it. Compilers for Win64 likewise follow the appropriate (different) conventions.
Windows is in fact an interesting case, however, because historically, it has supported multiple calling conventions. In its case, there is a default convention, and different conventions can be specified in function declarations via extension keywords. The compiler knows which convention to use based on the function declaration.
Additionally, where it is not concerned about interoperability, compilers can do anything within their power. So, for example, when compiling a function with internal linkage, it could, in principle, use whatever calling convention it wants, as it is in full control of both the function and all callers (ignoring the possible effect of function pointers). This is not different in kind from compilers' ability to inline functions. As a practical matter, however, I would not expect compilers to use variant calling conventions under such circumstances, and I am not aware of any that do.
Is this a real concern in practice? Or is it safe to assume that all the functions with non-mangled names from system libraries always use the system's default calling convention?
Name mangling has nothing to do with it. That's part of a higher-level mapping of C++ (usually) semantics onto system-level, source-language-independent object-file formats.
Generally speaking, it is safe to assume that where the appropriate function declarations are in scope (from the library's header files, typically), the compiler will generate correct calls. This is an essential interoperability characteristic that is rarely violated in practice. It cannot be construed as a universal guarantee, but in practice, it is not something that you should worry about.
What other assumptions or considerations can I make about the ABI/calling convention of functions with non-mangled names in system libraries?
I'm unsure what kinds of assumptions you have in mind, and I suspect you're overcomplicating things. You make sure to include the header(s) from the relevant library that declare the functions you want to call. Having done so, you rely on your compiler to generate correct calls.

Why calling conventions aren't used in all C programs

I am new to programming. While reading Charles Petzold's book Programming Windows, I stumbled upon WINAPI (I was surprised to see another word before a function's name besides the return type) and found that it is a calling convention. To the best of my understanding, a calling convention describes how a function's arguments are pushed on the stack and how the return value is passed back. Why don't we use them in every C program? Are they exclusive to OS programming?
Calling conventions are typically tied to compiler, architecture and (when it comes to using system runtime libraries) the OS; they're not part of the C standard at all. In most cases, there's only one calling convention for a given architecture/compiler/OS combo, so you don't need to think about it; it just uses the only convention that OS supports.
The one place where it has mattered a lot in recent history was on 32-bit x86 systems, particularly on Windows. x86 had very few general-purpose registers, often fewer than a typical function needed for its arguments, and using them for argument passing meant the caller often had to save whatever they already held to the stack. So there were a lot of trade-offs involved in calling conventions (it's faster to pass arguments in registers, but only if the caller can spare the registers, or at least isn't forced into excessive spilling to the stack), and Windows went with "we'll use 'em all in different scenarios".
In modern usage on x86-64 (which is far less register-starved) and on non-x86 architectures (which usually had enough registers), most compilers/OSes stick with a single common calling convention, so again, you don't need to pay attention. It's a curiosity, not something you need to deal with personally unless you're hand-writing whole functions in assembly.
We DO use calling conventions in all C programs, but they are typically defaulted in the compiler settings and so do not have to be expressed explicitly in code, unless actually necessary (library interactions, etc).
Calling conventions are not part of the C language itself, but are handled by compiler vendors as extensions to the language.
Most C compilers for 32-bit x86 default to the __cdecl calling convention, but this can be changed by the compiler user if needed. Once upon a time (in 16-bit Windows), Windows APIs used __pascal, but for a very long time now __stdcall has been used instead. Hence the existence of the WINAPI preprocessor macro, so Microsoft could switch between them without requiring most existing code to be rewritten.
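A sketch of how a WINAPI-style macro lets a single declaration pick a convention per target. `MYAPI` and `add_ints` are hypothetical names; `__stdcall` is the MSVC keyword, and `__attribute__((stdcall))` is the GCC/Clang spelling for 32-bit x86. On other targets the macro expands to nothing and the platform's single default convention applies:

```c
/* Hypothetical MYAPI macro, modelled on the idea behind WINAPI. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define MYAPI __stdcall                  /* MSVC, 32-bit x86 */
#elif defined(__GNUC__) && defined(__i386__)
#  define MYAPI __attribute__((stdcall))   /* GCC/Clang, 32-bit x86 */
#else
#  define MYAPI                            /* one convention on this target */
#endif

/* The declaration (normally in a header) carries the convention, so
   every caller that sees it generates a matching call sequence. */
int MYAPI add_ints(int a, int b);

int MYAPI add_ints(int a, int b)
{
    return a + b;
}
```

Changing the convention later means editing only the macro, which is why most existing code that used WINAPI did not need to be rewritten.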

What is the calling convention that clang uses?

What is the default calling convention that the clang compiler uses? I noticed that when I return a pointer to a local array, the data is not lost:
#include <stdio.h>
char *retx(void) {
char buf[4] = "buf";
return buf;
}
int main(void) {
char *p1 = retx();
puts(p1);
return 0;
}
This is Undefined Behaviour. It might happen to work, or it might not, depending on what the compiler happened to choose when compiling for some specific target. It's literally undefined, not "guaranteed to break"; that's the entire point. Compilers can just completely ignore the possibility of UB when generating code, not using extra instructions to make sure UB breaks. (If you want that, compile with sanitizers: -fsanitize=undefined catches many kinds of UB, and for a dangling stack pointer like this one, use -fsanitize=address with stack-use-after-return detection enabled.)
Understanding exactly what happened requires looking at the asm, not just running it.
warning: address of stack memory associated with local variable 'buf' returned [-Wreturn-stack-address]
return buf;
^~~
Clang prints this warning even without -Wall enabled, precisely because it's not legal C, regardless of which asm calling convention you're targeting.
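For completeness, a sketch of the usual well-defined alternatives to returning a local array (the function names are illustrative): let the caller supply the buffer, use static storage, or allocate on the heap.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 1. The caller owns the storage; the function only fills it in. */
char *retx_into(char *out, size_t size)
{
    snprintf(out, size, "%s", "buf");
    return out;
}

/* 2. Static storage duration: valid for the whole program run, but every
      call returns the same buffer (not reentrant or thread-safe). */
char *retx_static(void)
{
    static char buf[4] = "buf";
    return buf;
}

/* 3. Heap storage: valid until the caller passes it to free(). */
char *retx_heap(void)
{
    char *buf = malloc(4);
    if (buf != NULL)
        memcpy(buf, "buf", 4);   /* 4 bytes: includes the terminating '\0' */
    return buf;
}
```

All three remain correct whatever calling convention the target uses, because none of them depends on the callee's stack frame outliving the call.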
Clang uses the C calling convention of the target it's compiling for (see footnote 1). Different OSes on the same ISA can have different conventions, although outside of x86 most ISAs only have one major calling convention. x86 has been around so long that the original calling conventions (stack args with no register args) turned out to be inefficient, so various 32-bit conventions evolved. And Microsoft chose a different 64-bit convention from everyone else. So there's x86-64 System V, Windows x64, i386 System V for 32-bit x86, AArch64's standard convention, PowerPC's standard convention, etc. etc.
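One way to see that clang simply follows the target is to ask the preprocessor. This sketch maps common predefined target macros to the conventions listed above; the mapping is a simplification (it ignores per-function attributes such as ms_abi), and the function name is hypothetical. Compiling the same source with different --target options gives different answers.

```c
/* Returns a rough name for the default C calling convention of the
   target this file was compiled for, based on predefined macros. */
const char *target_calling_convention(void)
{
#if defined(_WIN32) && (defined(_M_X64) || defined(__x86_64__))
    return "Windows x64";
#elif defined(__x86_64__)
    return "x86-64 System V";
#elif defined(_WIN32) && (defined(_M_IX86) || defined(__i386__))
    return "32-bit Windows (cdecl by default, others via keywords)";
#elif defined(__i386__)
    return "i386 System V";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64 procedure call standard (possibly a platform variant)";
#elif defined(__powerpc__) || defined(__powerpc64__)
    return "PowerPC ELF ABI";
#else
    return "not covered by this sketch";
#endif
}
```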
I have tested with clang several times and every time it displayed the string
The "decision" / "luck" of whether it "works" or not is made at compile time, not runtime. Compiling / running the same source multiple times with the same compiler tells you nothing.
Look at the generated asm to find out where char buf[4] ends up.
My guess: maybe you're on Windows x64. Happening to work is more plausible there than under most calling conventions, where you'd expect buf[4] to end up below the stack pointer in main, so the call to puts, and puts itself, would be very likely to overwrite it.
If you're on Windows x64 compiling with optimization disabled, retx()'s local char buf[4] might be placed in the shadow space it owns. The caller then calls puts() with the same stack alignment, so retx's shadow space becomes puts's shadow space.
And if puts happens not to write its shadow space, then the data in memory that retx stored is still there. e.g. maybe puts is a wrapper function that in turn calls another function, without initializing a bunch of locals for itself first. But not a tailcall, so it allocates new shadow space.
(But that's not what clang 8.0 does in practice with optimization disabled. Using __attribute__((ms_abi)) to get Windows x64 code-gen from Linux clang, it looks like buf[4] is placed below RSP and gets stepped on there: https://godbolt.org/z/2VszYg)
But it's also possible in stack-args conventions where padding is left to align the stack pointer by 16 before a call. (e.g. modern i386 System V on Linux for 32-bit x86). puts() has an arg but retx() doesn't, so maybe buf[4] ended up in memory that the caller "allocates" as padding before pushing a pointer arg for puts.
Of course that would be unsafe because the data would be temporarily below the stack pointer, in a calling convention with no red-zone. (Only a few ABIs / calling conventions have red zones: memory below the stack pointer that's guaranteed not to be clobbered asynchronously by signal handlers, exception handlers, or debuggers calling functions in the target process.)
I wondered if enabling optimization would make it inline and happen to work. But no, I tested that for Windows x64: https://godbolt.org/z/k3xGe4. clang and MSVC both optimize away any stores of "buf\0" into memory. Instead they just pass puts a pointer to some uninitialized stack memory.
Code that breaks with optimization enabled is almost always UB.
Footnote 1: Except for x86-64 System V, where clang uses an extra undocumented "feature" of the calling convention: narrow integer types passed as function args in registers are assumed to be sign- or zero-extended (according to their signedness) to 32 bits. gcc and clang both do this when calling, but ICC does not, so calling clang-compiled functions from ICC-compiled code can cause breakage. See Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?
Annex L of the C11 Draft N1570 recognizes some situations (i.e. "non-critical Undefined Behavior") where the Standard imposes no particular behavioral requirements but implementations that define __STDC_ANALYZABLE__ with a non-zero value should offer some guarantees, and other situations ("critical Undefined Behavior") where it would be common for implementations not to guarantee anything. Attempts to access objects past their lifetime would fall into the latter category.
While nothing would prevent an implementation from offering behavioral guarantees beyond what the Standard requires, even for Critical Undefined Behavior, and some tasks would require that implementations do so (e.g. many embedded systems tasks require that programs dereference pointers to addresses whose targets do not satisfy the definition of "objects"), accessing automatic variables past their lifetime is a behavior about which few implementations would offer any guarantees, beyond perhaps guaranteeing that reading an arbitrary RAM address will have no side-effects beyond yielding an Unspecified value.
Even implementations that guaranteed how automatic objects will be laid out on the stack seldom guaranteed that the storage that held them wouldn't be overwritten between the time a function returned and the next action by the caller. Unless interrupts were disabled, interrupt handling could overwrite any storage that had been used by automatic objects that were no longer in a live stack frame.
While many implementations can be configured to offer useful guarantees about the behavior of actions for which the Standard imposes no requirements, I can't think of any implementations that can be configured to offer sufficient guarantees to make the above code usable.

Is it possible to instruct C to not zero-initialize global arrays?

I'm writing an embedded application and almost all of my RAM is used by global byte-arrays. When my firmware boots it starts by overwriting the whole BSS section in RAM with zeroes, which is completely unnecessary in my case.
Is there some way I can instruct the compiler that it doesn't need to zero-initialize certain arrays? I know this can also be solved by declaring them as pointers, and using malloc(), but there are several reasons I want to avoid that.
The problem is that standard C enforces zero initialization of static objects. If the compiler skips it, it wouldn't conform to the C standard.
On embedded systems compilers there is usually a non-standard option "compact startup" or similar. When enabled, no initialization of static/global objects will occur at all, anywhere in the program. How to do this depends on your compiler, or in this case, on your gcc port.
If you mention which system you are using, someone might be able to provide a solution for that particular compiler port.
This means that any static/global (static storage duration) variable that you initialize explicitly will no longer be initialized. You will have to initialize it in runtime, that is, instead of static int x=1; you will have to write static int x; x=1;. It is rather common to write embedded C programs in this manner, to make them compatible with compilers where the static initialization is disabled.
It turned out that the linker-script included in my toolchain has a special "noinit" section.
/** Forces the compiler to not automatically zero the given global
    variable on startup, so that the current RAM contents are retained.
    Under most conditions this value will be random due to the
    behaviour of volatile memory once power is removed, but may be used in some specific
    circumstances, like the passing of values back after a system watchdog reset. */
__attribute__ ((section (".noinit")))
So all global variables marked with that attribute will not be zero-initialised during boot.
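A sketch of the watchdog-reset use case mentioned in that comment, assuming GCC or Clang with an ELF toolchain whose linker script provides a `.noinit` section like the one above. The names `NOINIT`, `RESET_MAGIC`, `reset_magic`, `boot_is_warm` and `warm_reset_count` are hypothetical. Because the contents are garbage after power-up, the code keeps a marker value to tell a warm reset from a cold start:

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__ELF__)
#  define NOINIT __attribute__((section(".noinit")))
#else
#  define NOINIT                 /* fallback: ordinary (zeroed) storage */
#endif

#define RESET_MAGIC 0xC0FFEE42u  /* hypothetical "contents are valid" marker */

/* Placed in .noinit, so startup code that only zeroes .bss leaves them
   alone: they survive a watchdog reset, but hold garbage after power-up. */
static uint32_t reset_magic NOINIT;
static uint32_t reset_count NOINIT;

/* Returns 1 after a warm reset (marker still intact), 0 after a cold start. */
int boot_is_warm(void)
{
    if (reset_magic == RESET_MAGIC) {
        reset_count++;
        return 1;
    }
    /* Cold start: RAM contents are random, so initialize by hand. */
    reset_magic = RESET_MAGIC;
    reset_count = 0;
    return 0;
}

uint32_t warm_reset_count(void)
{
    return reset_count;
}
```

On a hosted system the section is just ordinary zero-filled memory, so the first call reports a cold start; on the target, the linker script decides whether the startup code skips it.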
The C standard REQUIRES global data to be initialized to zero.
It is possible that SOME embedded system manufacturers provide a way to bypass this option, but there are certainly many typical applications that would simply fail if the "initialize to zero" wasn't done.
Some compilers also allow you to have further sections, which may have other characteristics than the 'bss' section.
The other alternative is of course to "make your own allocation". Since it's an embedded system, I suppose you have control over how the application and data is loaded into RAM, in particular, what addresses are used for that.
So, you could use a pointer, and simply use your own mechanism for assigning the pointer to a memory region that is reserved for whatever you need large arrays for. This avoids the rather complex usage of malloc, and it gives you a more or less permanent address, so you don't have to worry about trying to find where your data is later on. This will of course have a small effect on performance, since it adds another level of indirection, but in most cases that disappears as soon as the array is used as an argument to a function, as it decays to a pointer at that point anyway.
There are a few workarounds like:
Deleting the BSS section from the binary or setting its size to 0 or 1. This will not work if the loader must explicitly allocate memory for all sections. This will work if the loader simply copies data to the RAM.
Declaring your arrays as extern in C code and defining the symbols (along with their addresses) either in assembly code in separate assembly files or in the linker script. Again, if memory must be explicitly allocated, this won't work.
Patching or removing the relevant BSS-zeroing code either in the loader or in the startup code that is executed in your program before main().
All embedded compilers should allow a noinit segment. With the IAR AVR compiler the variables you don't want to be initialised are simply declared as follows:
__no_init uint16_t foo;
The most useful reason for this is to allow variables to maintain their values over a watchdog or brown-out reset, which of course doesn't happen in computer-based C programs, hence its omission from standard C.
Just search your compiler manual for "noinit" or something similar.
Are you sure the binary format actually includes a BSS section in the binary? In the binary formats I've worked with, BSS is simply an integer that tells the kernel/loader how much memory to allocate and zero out.
There definitely is no general way in C to get uninitialized global variables. This would be a function of your compiler/linker/runtime system and highly specific to that.
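With gcc, -fno-zero-initialized-in-bss (note that this only moves variables explicitly initialized to zero from .bss into .data, where they are still initialized at startup, so on its own it does not skip the initialization).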

Is ARPACK thread-safe?

Is it safe to use the ARPACK eigensolver from different threads at the same time from a program written in C? Or, if ARPACK itself is not thread-safe, is there an API-compatible thread-safe implementation out there? A quick Google search didn't turn up anything useful, but given the fact that ARPACK is used heavily in large scientific calculations, I'd find it highly surprising to be the first one who needs a thread-safe sparse eigensolver.
I'm not too familiar with Fortran, so I translated the ARPACK source code to C using f2c, and it seems that there are quite a few static variables. Basically, all the local variables in the translated routines seem to be static, implying that the library itself is not thread-safe.
Fortran 77 does not support recursion, and hence a standard conforming compiler can allocate all variables in the data section of the program; in principle, neither a stack nor a heap is needed [1].
It might be that this is what f2c is doing, and if so, it might be that it's the f2c step that makes the program non-thread-safe, rather than the program itself. Of course, as others have mentioned, check for COMMON blocks as well. EDIT: Also, check for explicit SAVE directives. SAVE means that the value of the variable should be retained between subsequent invocations of the procedure, similar to static in C. Now, allocating all procedure-local data in the data section makes all variables implicitly SAVE, and unfortunately, there is a lot of old code that assumes this even though it's not guaranteed by the Fortran standard. Such code, obviously, is not thread-safe. With regard to ARPACK specifically, I can't promise anything, but ARPACK is generally well regarded and widely used, so I'd be surprised if it suffered from these kinds of dusty-deck problems.
Most modern Fortran compilers do use stack allocation. You might have better luck compiling ARPACK with, say, gfortran and the -frecursive option.
EDIT:
[1] Not because it's more efficient, but because Fortran was originally designed before stacks and heaps were invented, and for some reason the standards committee wanted to retain the option to implement Fortran on hardware with neither stack nor heap support all the way up to Fortran 90. Actually, I'd guess that on today's heavily cache-dependent hardware, stacks are more efficient than accessing procedure-local data that is spread all over the data section.
I have converted ARPACK to C using f2c. Whenever you use f2c and you care about thread-safety you must use the -a switch. This makes local variables have automatic storage, i.e. be stack based locals rather than statics which is the default.
Even so, ARPACK itself is decidedly not threadsafe. It uses a lot of common blocks (i.e. global variables) to preserve state between different calls to its functions. If memory serves, it uses a reverse communication interface which tends to lead developers to using global variables. And of course ARPACK probably was written long before multi-threading was common.
I ended up re-working the converted C code to systematically remove all the global variables. I created a handful of C structs and gradually moved the global variables into these structs. Finally I passed pointers to these structs to each function that needed access to those variables. Although I could just have converted each global into a parameter wherever it was needed it was much cleaner to keep them all together, contained in structs.
Essentially the idea is to convert global variables into local variables.
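A sketch of that refactoring in C, with hypothetical names rather than ARPACK's actual routines. The "before" version keeps state in file-scope statics, which is roughly what f2c produces for COMMON blocks and SAVEd locals; the "after" version moves the same state into a struct owned by the caller, so each thread can work on its own instance:

```c
/* Before: state shared by every caller, like a COMMON block or a SAVEd
   local. Two threads calling this at once would corrupt each other. */
static int    g_iter;
static double g_last;

double step_global(double x)
{
    g_iter++;
    g_last = 0.5 * (g_last + x);
    return g_last;
}

/* After: the same state gathered into a struct passed by pointer. */
struct solver_state {
    int    iter;
    double last;
};

void solver_init(struct solver_state *s)
{
    s->iter = 0;
    s->last = 0.0;
}

double step(struct solver_state *s, double x)
{
    s->iter++;
    s->last = 0.5 * (s->last + x);
    return s->last;
}
```

Keeping related variables together in one struct, instead of turning each global into its own parameter, keeps the function signatures short as more state is moved out of globals.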
ARPACK uses BLAS, right? Then those libraries need to be thread safe too.
I believe checking with f2c might not be a bulletproof way of telling whether the Fortran code is thread safe; I would guess it also depends on the Fortran compiler and libraries.
I don't know what strategy f2c uses in translating Fortran. Since ARPACK is written in FORTRAN 77, the first thing to do is check for the presence of COMMON blocks. These are global variables, and if used, the code is most likely not thread safe. The ARPACK webpage, http://www.caam.rice.edu/software/ARPACK/, says that there is a parallel version -- it seems likely that that version is threadsafe.

Resources