Forcing 64-bit long doubles? - c

I'm building musl-libc statically for a project on an aarch64 (ARM 64-bit) platform. I'd like to avoid pulling in any soft floating-point support, such as GCC's soft-float library routines. However, these still appear in the library archives even when I use -mfloat-abi=hard. As best I can tell, this is because ARM 64-bit platforms define long double to be 128 bits.
Is there any way to change this behavior? For example, could I force long double to be defined as the same size as a double? I know this is allowed by the C standard, but I'm unsure if there's any way to force Clang (I'm specifically using Clang for this) to compile with such a definition.

I ultimately found a solution, though I can't necessarily recommend it for everyone. It will likely introduce bugs elsewhere, but it was good enough for what I needed. It also involves building Clang from scratch (thanks for the suggestion, @Art!). Furthermore, the project I'm working on uses LLVM/Clang 3.7.1, so I make no claims about other versions.
As near as I can tell, the definition of a long double for the AArch64 target occurs in clang/lib/Basic/Targets.cpp:
...
MaxAtomicInlineWidth = 128;
MaxAtomicPromoteWidth = 128;
LongDoubleWidth = LongDoubleAlign = SuitableAlign = 128;
LongDoubleFormat = &llvm::APFloat::IEEEquad;
// {} in inline assembly are neon specifiers, not assembly variant
// specifiers.
...
By changing the two LongDouble lines as follows, I removed all references to the soft-FP routines I mentioned in my question:
LongDoubleWidth = LongDoubleAlign = SuitableAlign = 64;
LongDoubleFormat = &llvm::APFloat::IEEEdouble;
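After rebuilding Clang with this change, a quick sanity check (just a scratch test program of mine, not part of musl or the benchmarks) is to compile something like the following with the patched compiler and confirm that long double has collapsed to 64 bits:

#include <stdio.h>

int main(void)
{
    /* With the patched compiler this should report 8 for both types
       and the static assertion should compile cleanly. */
    _Static_assert(sizeof(long double) == sizeof(double),
                   "long double is still wider than double");
    printf("sizeof(double)      = %zu\n", sizeof(double));
    printf("sizeof(long double) = %zu\n", sizeof(long double));
    return 0;
}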
My test programs -- SNU's version of the NAS Parallel Benchmarks -- still validate correctly, so I'm assuming I didn't break anything too badly. Still, this is a non-trivial modification, so I don't recommend it for most people (it could lead to breakage elsewhere).

I've had to do something similar before, fluffing around with types (particularly longs). Your best bet is to just replace the types by hand, as that's the simplest and most straightforward way to get what you want. You can try playing tricks with macros or massaging the compiler, but in my experience you just create more problems than you solve, and the result is usually a fragile solution that breaks later on.
Luckily, the sources you are working with look well maintained, and the changes you are looking for are rather blanket. You can knock this out quite simply. Assuming you are running a Unix-like system, you can run the following command from the base directory of musl:
$ grep -Rl 'long double' * | xargs -tn1 sed -i '' -e 's/long double/double/g'
This command:
Looks for the string long double in all files, recursively, and returns the filenames which contain that string.
The filenames are then passed to xargs, which invokes sed once per filename, printing each command as it goes.
When sed runs, it modifies the file in-place, and substitutes long double with double.
When I tried this command out, it "just worked". Note that the -i '' form is BSD sed syntax; GNU sed wants -i with no separate empty argument. Either way, I would peruse the diff carefully to make sure it hit everything properly and didn't alter the behavior of the library.

Related

OpenCL 1.2 compiling kernel binary using LLVM

Say I have the OpenCL kernel,
/* Header to make Clang compatible with OpenCL */
/* Test kernel */
__kernel void test(long K, const global float *A, global float *b)
{
    for (long i = 0; i < K; i++)
        for (long j = 0; j < K; j++)
            b[i] = 1.5f * A[K * i + j];
}
I'm trying to figure out how to compile this to a binary which can be loaded into OpenCL using the clCreateProgramWithBinary command.
I'm on a Mac (Intel GPU), and thus I'm limited to OpenCL 1.2. I've tried a number of different variations on the command,
clang -cc1 -triple spir test.cl -O3 -emit-llvm-bc -o test.bc -cl-std=cl1.2
but the binary always fails when I try to build the program. I'm at my wits' end with this; it's all so confusing and poorly documented.
The performance of the above test function can, in regular C, be significantly improved by applying the standard LLVM compiler optimization flag -O3. My understanding is that this optimization flag somehow takes advantage of the contiguous memory access pattern of the inner loop to improve performance. I'd be more than happy to listen to anyone who wants to fill in the details on this.
I'm also wondering how I can first convert to SPIR code, and then convert that to a buildable binary. Eventually I would like to find a way to apply the -O3 compiler optimizations to my kernel, even if I have to manually modify the SPIR (as difficult as that will be).
I've also gotten the SPIRV-LLVM-Translator tool working (as far as I can tell), and ran,
./llvm-spirv test.bc -o test.spv
but this binary fails to load at the clCreateProgramWithBinary step; I can't even get to the build step.
Possibly SPIR-V doesn't work with OpenCL 1.2 and I would have to use clCreateProgramWithIL, which unfortunately doesn't exist for OpenCL 1.2. It's difficult to say for sure why it doesn't work.
Please see my previous question here for some more context on this problem.
I don't believe there's any standardised bitcode file format that's available across implementations, at least at the OpenCL 1.x level.
As you're talking specifically about macOS, have you investigated Apple's openclc compiler? This is also what Xcode invokes when you compile a .cl file as part of a target. The compiler is located in /System/Library/Frameworks/OpenCL.framework/Libraries/openclc; it does have comprehensive --help output but that's not a great source for examples on how to use it.
Instead, I recommend you try the OpenCL-in-Xcode tutorial, and inspect the build commands it ends up running:
https://developer.apple.com/library/archive/documentation/Performance/Conceptual/OpenCL_MacProgGuide/XCodeHelloWorld/XCodeHelloWorld.html
You'll find it produces bitcode files (.bc) for 4 "architectures": i386, x86_64, "gpu_64", and "gpu_32". It also auto-generates some C code which loads this code by calling gclBuildProgramBinaryAPPLE().
I don't know if you can untangle it further than that but you certainly can ship bitcode which is GPU-independent using this compiler.
I should point out that OpenCL is deprecated on macOS, so if that's the only platform you're targeting, you really should go for Metal Compute instead. It has much better tooling and will be actively supported for longer. For cross-platform projects it might still make sense to use OpenCL even on macOS, although for shipping kernel binaries instead of source, it's likely you'll have to use platform-specific code for loading those anyway.
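If you do stick with OpenCL and clCreateProgramWithBinary, the loading side looks roughly like this (a sketch only; the context, device, and binary bytes are assumed to come from elsewhere, and error handling is abbreviated):

#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

/* Hypothetical helper: ctx, dev, binary and binary_len must be set up by the caller. */
cl_program load_binary_program(cl_context ctx, cl_device_id dev,
                               const unsigned char *binary, size_t binary_len)
{
    cl_int status = CL_SUCCESS, err = CL_SUCCESS;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &binary_len,
                                                &binary, &status, &err);
    if (err != CL_SUCCESS || status != CL_SUCCESS)
        return NULL;

    /* Even a binary program must be "built" (finalized) for the device. */
    err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    if (err != CL_SUCCESS) {
        /* The build log is usually the only clue about why a binary was rejected. */
        char log[4096] = "";
        clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                              sizeof log, log, NULL);
        fprintf(stderr, "clBuildProgram failed: %s\n", log);
        clReleaseProgram(prog);
        return NULL;
    }
    return prog;
}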

Is it possible to generate ANSI C functions with type information for a moving GC implementation?

I am wondering what methods there are to add typing information to generated C functions. I'm transpiling a higher-level programming language to C and I'd like to add a moving garbage collector. To do that, however, I need the function variables to carry typing information, otherwise I could end up modifying a primitive value that merely looks like a pointer.
An obvious approach would be to encapsulate all (primitive and non-primitive) variables in a struct that carries an extra (enum) field with typing information, but this would add memory and performance overhead, and the transpiled code is meant for embedded platforms. If I were willing to accept the memory overhead, the obvious option would be to use a heap handle for all objects, which would let me move heap blocks freely. However, I'm wondering whether there's a more efficient approach.
I've come up with a potential solution: predeclare and group variables based on whether they're primitives or not (I can do that in the transpiler), and add an offset variable at the end of each function (I need to be able to find it accurately when scanning the stack area) that tells me where the non-primitive variables begin and end, so I only scan those. This means each function will use an additional 16/32 bits of memory (depending on the architecture), but it should still be more memory-efficient than the heap-handle approach.
Example:
void my_func() {
    int i = 5;
    int z = 3;
    bool b = false;
    void* person;
    void* person_info = ...;
    .... // logic
    volatile int offset = 0x034;
}
My aim is for something that works universally across GCC compilers, thus my concerns are:
Can the compiler reorder the variables from how they're declared in the source code?
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Can I find the offset accurately when scanning the stack?
I'd like to avoid assembly so this approach can work (by default) across multiple platforms, but I'm open to methods that involve assembly if they're reliable.
Typing information could be somehow encoded in the C function name; this is what C++ and other implementations do, and it is called name mangling.
Actually, since all your C code is generated, you could adopt a different convention: generate long C identifiers which are practically unique and sort-of random program-wide, such as tiziw_7oa7eIzzcxv03TmmZ, and keep the typing information elsewhere (e.g. in some database). On Linux, such an approach is friendly to both libbacktrace and dlsym(3) + dladdr(3) (and of course nm(1), readelf(1), and gdb(1)), and it is used in both the Bismon and RefPerSys projects.
Typing information is practically tied to calling conventions and ABIs. For example, the x86-64 ABI for Linux mandates different processor registers for passing floating-point values and pointers.
Read the Garbage Collection Handbook or at least P. Wilson's Uniprocessor Garbage Collection Techniques survey. You could decide to use tagged integers instead of boxing them, and you could decide to have a conservative GC (e.g. Boehm's GC) instead of a precise one. In my old GCC MELT project I generated C or C++ code for a generational copying GC. Similar techniques are used in both Bismon and RefPerSys.
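As a concrete illustration of the tagged-integer idea (this particular low-bit encoding is just one common choice, not something the question requires):

#include <stdint.h>
#include <assert.h>

/* Heap objects are assumed to be at least 2-byte aligned, so the low bit of a
   real pointer is always 0.  Small integers are stored shifted left by one
   with the low bit set, so the collector can distinguish them from pointers
   without any per-variable type field. */
typedef uintptr_t value_t;

static inline value_t  int_to_value(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }
static inline intptr_t value_to_int(value_t v)  { return (intptr_t)v >> 1; } /* arithmetic shift assumed (gcc/clang) */
static inline int      value_is_int(value_t v)  { return (v & 1u) != 0; }
static inline value_t  ptr_to_value(void *p)    { assert(((uintptr_t)p & 1u) == 0); return (uintptr_t)p; }
static inline void    *value_to_ptr(value_t v)  { return (void *)v; }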
Since you are transpiling to C, consider also alternatives, such as libgccjit or LLVM. Look into libjit and asmjit.
Study also the implementation of other transpilers (compilers targeting C), including Chicken Scheme and Bigloo.
Can the GCC compiler reorder the variables from how they're declared in the source code?
Of course yes, depending on the optimizations you ask for. Some variables won't even exist in the binary (e.g. those kept in registers).
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Better generate a single struct variable containing all your language variables, and leave optimizations to the compiler. You will be surprised (see this draft report).
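One way to realize that "single struct" suggestion for the GC-visible variables is a shadow-stack style frame that the generated code pushes and pops explicitly (all names below are illustrative, not from the question):

struct gc_frame {
    struct gc_frame *prev;   /* caller's frame */
    unsigned nroots;         /* number of pointer slots */
    void **roots;            /* the language-level pointer variables */
};

static struct gc_frame *gc_top;   /* per-thread (e.g. __thread) in a real runtime */

void my_func(void)
{
    void *slots[2] = { 0, 0 };                  /* person, person_info */
    struct gc_frame frame = { gc_top, 2, slots };
    gc_top = &frame;

    int i = 5, z = 3;                           /* primitives stay plain C locals */
    /* ... generated logic accesses slots[0] / slots[1] instead of raw locals ... */
    (void)i; (void)z;

    gc_top = frame.prev;                        /* pop on every exit path */
}

The collector then only walks the gc_top chain and sees exactly the declared pointer slots, so no raw stack scanning or offset trick is needed, at the cost of a few extra stores per call.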
Can I find the offset accurately when scanning the stack?
This is the most difficult part, and it depends a lot on compiler optimizations (e.g. whether you run gcc with -O1 or -O3 on the generated C code; in some cases a recent GCC (e.g. GCC 9 or GCC 10 on x86-64 Linux) is capable of tail-call optimization; check by compiling with gcc -O3 -S -fverbose-asm and then looking at the produced assembler code). If you accept some small target-processor and compiler-specific tricks, this is doable. Study the implementation of the OCaml compiler.
Send me an email (to basile@starynkevitch.net) for discussion. Please mention the URL of your question in it.
If you want to have an efficient generational copying GC with multi-threading, things become extremely tricky. The question is then how many years of development can you afford spending.
If you have exceptions in your language, also take great care. You could, with great caution, generate calls to longjmp.
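A minimal shape for that (helper names are mine, not from any particular runtime) is a per-thread stack of jmp_bufs that the generated code pushes and pops:

#include <setjmp.h>
#include <stdlib.h>

#define EXC_MAX 64                       /* bounds checks omitted for brevity */
static jmp_buf *exc_stack[EXC_MAX];
static int exc_top = -1;

static void exc_push(jmp_buf *b) { exc_stack[++exc_top] = b; }
static void exc_pop(void)        { --exc_top; }

static void exc_throw(int code)
{
    if (exc_top < 0)
        abort();                         /* no handler installed */
    longjmp(*exc_stack[exc_top--], code);
}

/* Generated code for a try/catch then looks like:
 *
 *     jmp_buf frame;
 *     exc_push(&frame);
 *     if (setjmp(frame) == 0) {
 *         ... body, which may call exc_throw(...) ...
 *         exc_pop();
 *     } else {
 *         ... handler; any GC frames pushed since the setjmp
 *             must be unwound here as well ...
 *     }
 */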
See of course this answer of mine.
With transpiling techniques, the evil is in the details.
On Linux (specifically!) see also my manydl.c program. It demonstrates that on a Linux x86-64 laptop you can, in practice, generate hundreds of thousands of dlopen(3)-ed plugins. Then read How To Write Shared Libraries.
Study also the implementation of SBCL and of GNU Prolog, at least for inspiration.
PS. The dream of a totally architecture-neutral and operating-system independent transpiler is an illusion.

Calculate an arbitrary type's size without executing a program

Given any type in a C program, I would like to know its size, as one would when executing the following line,
printf("%d\n",sizeof(myType));
without actually executing the program. The solution must work with arbitrary types whose size can be determined at compile time, such as user-defined structs.
The rationale is that if the size can be known at compile time, there should be a way to extract that information without having to run the program. Ideally something a bit more elegant than parsing the resulting assembler source or binary for literal constants, but if that's the only way, I'll take what I can get.
This question doesn't quite work for me, since the OP's solution relies on executing the code, and the most-voted answer relies on the preprocessor's info directive actually expanding macros (mine apparently doesn't).
For background, I'm developing for PIC18 MCUs and using the XC8 compiler.
What I ultimately want is to verify that some structures I've defined take up their expected size in memory.
This is a classic use case for static assertions. If your compiler supports _Static_assert, you can write
_Static_assert(sizeof(mystruct) == expected_size, "Invalid struct size.");
demo 1
If you use an older compiler that does not support C11 yet, use a common workaround that relies on declaring an array type with negative size:
#define CHECK_SIZE(x,e) if(sizeof(char[2*(sizeof(x)==e)-1])==1);
demo 2
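Applied to the question, a layout check for a user-defined struct could look like this (the struct and the expected size are made up for illustration; if your XC8 version does not accept _Static_assert, fall back to the macro form):

#include <stdint.h>

/* Hypothetical layout whose size we want to pin down at compile time. */
typedef struct {
    uint8_t  cmd;
    uint8_t  flags;
    uint16_t crc;
} packet_t;

/* C11 form, usable at file scope or block scope: */
_Static_assert(sizeof(packet_t) == 4, "packet_t is not 4 bytes");

/* Pre-C11 fallback using the macro above, inside any function: */
void check_layout(void)
{
    CHECK_SIZE(packet_t, 4)   /* fails to compile if the size is wrong */
}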

Compiling and linking to binary simple real mode OS in C with OpenWatcom

I have a simple (extremely simple) real-mode kernel written in assembly that I am trying to port to C as much as I can. In order to do that I am using the OpenWatcom compiler to produce 16-bit binary code.
In my file "os.c", if I write this line it works:
char msg[50];
but if I do this:
char msg[50] = "hello";
it just does not work. Every time I write a string it just gets broken. I searched everywhere, tried a lot of nonsense stuff, and nothing worked.
Does anyone have a clue what it might be?
I cannot post more than 2 links, so all 4 links needed are in the pastebin. Thanks in advance.
Example: http://pastebin.com/xz96N91A
I installed OpenWatcom so I could see what it produces. Apparently it's all a big mess; the data section seems to end up in the middle of the code, and so on.
I recommend you instead create a COM format file. That at least appears to work.
I used the command wcl -s -zls -0 -q -ms -bcl=com os.c to create os.com, which I used in place of your os.bin. I also updated the far jump in the loader to jmp far 0x0040:0x0100 to account for the COM entry point. You might need to set up some segment registers too, but for this test it worked without that.
I believe COM files have a 64k limit; I have no idea whether Watcom enforces that. If it does and you need more than 64k, you will have to investigate further.
Unless you are really forced to use real mode, you should probably not waste your time on it.

__udivdi3 undefined — how to find the code that uses it?

Compiling a kernel module on 32-Bit Linux kernel results in
"__udivdi3" [mymodule.ko] undefined!
"__umoddi3" [mymodule.ko] undefined!
Everything is fine on 64-bit systems. As far as I know, the reason for this is that 64-bit integer division and modulo are not supported inside a 32-bit Linux kernel.
How can I find the code issuing the 64-bit operations? They are hard to find manually because I cannot easily check whether a "/" is 32 bits or 64 bits wide. If "normal" functions are undefined, I can grep for them, but that is not possible here. Is there another good way to search for the references? Some kind of "machine code grep"?
The module consists of a few thousand lines of code. I really cannot check every line manually.
First, you can do 64-bit division by using the do_div macro. (Note that the pseudo-prototype is uint32_t do_div(uint64_t dividend, uint32_t divisor), and that "dividend" may be evaluated multiple times.)
{
    unsigned long long int x = 6;
    unsigned long int y = 4;
    unsigned long int rem;

    rem = do_div(x, y);
    /* x now contains the quotient of x/y; rem holds the remainder */
}
Additionally, you should be able to either find usage of long long int (or uint64_t) types in your code, or alternatively, build your module with the -g flag and use objdump -S to get a source-annotated disassembly.
Note: this applies to 2.6 kernels; I have not checked anything older.
Actually, 64-bit integer division and modulo are supported within a 32-bit Linux kernel; however, you must use the correct macros to do so (which ones depend on your kernel version, since newer and better ones were added fairly recently, IIRC). The macros will do the correct thing in the most efficient way for whichever architecture you are compiling for.
The easiest way to find where they are being used is (as mentioned in @shodanex's answer) to generate the assembly code; IIRC, the way to do so is something like make directory/module.s (together with whatever parameters you already pass to make). The next easiest way is to disassemble the .o file (with something like objdump --disassemble). Both ways will give you the functions where the calls are being generated (and, if you know how to read assembly, a general idea of where within the function the division is taking place).
After the compilation stage, you should be able to get some annotated assembly and see where those functions are called. Try messing with CFLAGS and adding the -S flag.
Compilation should stop at the assembly stage. You can then grep for the offending function call in the assembly file.
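Once you have located the call in the assembly, the kind of source line you are hunting for typically looks like this (a made-up example; u32/u64 are the kernel's types and do_div comes from asm/div64.h):

#include <linux/types.h>
#include <asm/div64.h>     /* do_div() */

static u32 avg_bytes(u64 total_bytes, u32 nr_samples)
{
    /* On a 32-bit kernel the next (commented-out) line is what emits a call
       to __udivdi3, because the divisor is promoted and the '/' becomes a
       64-by-64 division ('%' would emit __umoddi3 the same way): */
    /* return total_bytes / nr_samples; */

    /* Kernel-friendly version: do_div() divides in place and returns the remainder. */
    u32 rem = do_div(total_bytes, nr_samples);
    (void)rem;
    return (u32)total_bytes;   /* total_bytes now holds the quotient */
}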
