Basic heap usage statistics in GCC on 64-bit platform - c

I need to answer a basic question from inside my C program compiled by GCC for Linux: how much of process heap is currently in use (allocated by malloc) and how much resides if free heap blocks. GNU implementation of standard library has mallinfo function which reports exactly what I need, but it only usable with 32-bit configurations and, AFAIK, there's no 64-bit equivalent of that functionality (BTW, anyone knows why?).
I use GCC on Linux, so I need this for Linux. But I assume that the heap is opaque to the system, so the only way to answer this question is to use the means provided by the implementation of the standard library.
In MSVC implementation on Windows platform there's no equivalent of mallinfo function but there's so called heap-walk functionality, which allows one to calculate the necessary information by iterating through all blocks in the heap. AFAIK, there's no heap-walk interface in GNU C library. (Is there?).
So, again, what do I do in GCC? It doesn't have to be efficient, meaning that the aforementioned heap-walk based approach would work perfectly fine for me. How do I find out how much heap is in use and how much is free in GCC? I can probably try installing malloc-hooks and "manually" track the sizes, although I'm not sure how to determine the current heap arena size (see mallinfo.arena) without using mallinfo.

This thread from 2004 involving key glibc developers indicates that since the interface already "...does not fit the implementation at all.", there was seen as little point in making a 64-bit clean version of it. (The mallinfo() interface wasn't designed for glibc - it was being considered for inclusion in the SUS).
Depending on what you're trying to do with the information, you might be able to use malloc_stats(), which just produces output on standard error - since it's just textual output it's able to change to match the internal implementation of malloc(), and will therefore at least have the advantage of producing sensible results.

Related

Force clang to inline or use static call to `memmove()`

The following question sure sounds like an XY problem, but trust me, it's not. It is related to the lengthy investigation performed in this question.
I have very good reason to believe that a solution to that question's issue (not for the minimal example there, but for my actual code) involves ensuring memmove() is not called through a shared library, but rather as a static call, straight from my code to memmove() without going through a PLT, or even better, if memmove()'s code is inlined inside my code.
However, I have failed to find a command-line switch or anything else to prevent memmove() being called dynamically (note the dynamic call is confirmed by disassembling the output, after a build using -O3.) Does such a switch exist? While I could grab some optimized memmove() code for my platform (such as this), I would rather avoid introducing some code which might be obsoleted for a future architecture, and I do not consider this good programming practice anyway.
I'm using clang with the version string Ubuntu clang version 12.0.0-3ubuntu1~21.04.1 on a Raspberry Pi 4. Output of uname -a is Linux rpi4 5.11.0-1017-raspi #18-Ubuntu SMP PREEMPT Mon Aug 23 07:34:31 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux.
While I could grab some optimized memmove() code for my platform (such as this), I would rather avoid introducing some code which might be obsoleted for a future architecture
Your best bet is to write a straight-forward inline C or C++ implementation of memmove, and depend on the compiler to inline and optimize it.
Surprisingly doing this often beats hand-coded assembly implementations, because the compiler can often tell that the regions are non-overlapping, or that a whole multiple of 64-bit words is being used, and thus various parts of the generic memmove can be optimized out.
And doing this in C or C++ guarantees that it will continue to work on future architectures (so long as you can compile for them).

Is it possible to generate ansi C functions with type information for a moving GC implementation?

I am wondering what methods there are to add typing information to generated C methods. I'm transpiling a higher-level programming language to C and I'd like to add a moving garbage collector. However to do that I need the method variables to have typing information, otherwise I could modify a primitive value that looks like a pointer.
An obvious approach would be to encapsulate all (primitive and non-primitive) variables in a struct that has an extra (enum) variable for typing information, however this would cause memory and performance overhead, the transpiled code is namely meant for embedded platforms. If I were to accept the memory overhead the obvious option would be to use a heap handle for all objects and then I'd be able to freely move heap blocks. However I'm wondering if there's a more efficient better approach.
I've come up with a potential solution, namely to predeclare and group variables based whether they're primitives or not (I can do that in the transpiler), and add an offset variable to each method at the end (I need to be able to find it accurately when scanning the stack area), that tells me where the non-primitive variables begin and where they end, so I can only scan those. This means that each method will use an additional 16/32-bit (depending on arch) of memory, however this should still be more memory efficient than the heap handle approach.
Example:
void my_func() {
int i = 5;
int z = 3;
bool b = false;
void* person;
void* person_info = ...;
.... // logic
volatile int offset = 0x034;
}
My aim is for something that works universally across GCC compilers, thus my concerns are:
Can the compiler reorder the variables from how they're declared in
the source code?
Can I force the compiler to put some data in the
method's stack frame (using volatile)?
Can I find the offset accurately when scanning the stack?
I'd like to avoid assembly so this approach can work (by default) across multiple platforms, however I'm open for methods even if they involve assembly (if they're reliable).
Typing information could be somehow encoded in the C function name; this is done by C++ and other implementations and called name mangling.
Actually, you could decide, since all your C code is generated, to adopt a different convention: generate long C identifiers which are practically unique and sort-of random program-wide, such as tiziw_7oa7eIzzcxv03TmmZ and keep their typing information elsewhere (e.g. some database). On Linux, such an approach is friendly to both libbacktrace and dlsym(3) + dladdr(3) (and of course nm(1) or readelf(1) or gdb(1)), so used in both bismon and RefPerSys projects.
Typing information is practically tied to calling conventions and ABIs. For example, the x86-64 ABI for Linux mandates different processor registers for passing floating points or pointers.
Read the Garbage Collection handbook or at least P.Wilson Uniprocessor Garbage Collection Techniques survey. You could decide to use tagged integers instead of boxing them, and you could decide to have a conservative GC (e.g. Boehm's GC) instead of a precise one. In my old GCC MELT project I generated C or C++ code for a generational copying GC. Similar techniques are used both in Bismon and in RefPerSys.
Since you are transpiling to C, consider also alternatives, such as libgccjit or LLVM. Look into libjit and asmjit.
Study also the implementation of other transpilers (compilers to C), including Chicken/Scheme and Bigloo.
Can the GCC compiler reorder the variables from how they're declared in the source code?
Of course yes, depending upon the optimizations you are asking. Some variables won't even exist in the binary (e.g. those staying in registers).
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Better generate a single struct variable containing all your language variables, and leave optimizations to the compiler. You will be surprised (see this draft report).
Can I find the offset accurately when scanning the stack?
This is the most difficult, and depends a lot of compiler optimizations (e.g. if you run gcc with -O1 or -O3 on the generated C code; in some cases a recent GCC -e.g GCC 9 or GCC 10 on x86-64 for Linux- is capable of tail-call optimizations; check by compiling using gcc -O3 -S -fverbose-asm then looking into the produced assembler code). If you accept some small target processor and compiler specific tricks, this is doable. Study the implementation of the Ocaml compiler.
Send me (to basile#starynkevitch.net) an email for discussion. Please mention the URL of your question in it.
If you want to have an efficient generational copying GC with multi-threading, things become extremely tricky. The question is then how many years of development can you afford spending.
If you have exceptions in your language, take also a great care. You could with great caution generate calls to longjmp.
See of course this answer of mine.
With transpiling techniques, the evil is in the details
On Linux (specifically!) see also my manydl.c program. It demonstrates that on a Linux x86-64 laptop you could generate, in practice, hundred of thousands of dlopen(3)-ed plugins. Read then How to write shared libraries
Study also the implementation of SBCL and of GNU Prolog, at least for inspiration.
PS. The dream of a totally architecture-neutral and operating-system independent transpiler is an illusion.

What remains in C if I exclude libraries and compiler extensions?

Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't use even the compiler extensions (such as gcc extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In such a way, probably a list of every callable function supported by any big C compiler (not only ANSI C) out-of-box would be satisfying as as answer as it'd at least approximately show the use-case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? In-line assembly seem like an extension too, so that pretty much strips even the option to use assembly with C if I'm right.
Probably in the matter of code it'd be easier to understand. Imagine a code compiled with only e.g. gcc main.c (output flag permitted) that has no #include, nor extern.
int main() {
// replace_me
return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing and only allow repeating a piece of code aren't the thing I'm looking for (if it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)
If you remove all libraries essentially you have something similar to a freestanding implementation of C (which still has to provide some libraries - say, string.h, but that's nothing you couldn't easily implement yourself in portable C), and that's what normally you start with when programming microcontrollers and other computers that don't have a ready-made operating system - and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory mapped IO (you set a volatile pointer to some hardware dependent location and read/write from it; that affects hardware stuff).
That's really all you need to build anything - and after all, it all boils down to that stuff anyway, the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system1 (typically the syscalls are invoked through some kind of interrupt).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid to continuously reinvent the wheel, and to have a set of portable functions that spare you to have to rewrite everything knowing the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix or C and assembly as well.
C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with a minimal run-time and start-up requirement. Because it can directly access memory and memory mapped I/O there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering what you can do without is written in C. When running in an OS environment however (using C as an application-level rather then system-level language), you cannot practically use C in that manner - the OS has control over such things as I/O and memory-management and in modern systems will normally prevent unmediated access to such resources. Of course that OS itself is likely to largely written in C (and/or C++).
In a standalone of bare-metal environment with no OS, C is often used very early in the bootstrap process initialising hardware and establishing an application execution environment. In fact on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack-pointer and start address from the vector table on start-up; this being enough to run C code that does not rely on library or static data initialisation - such initialisation can however be written in C before calling main().
Note that sizeof is not a function, it is an operator.
I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported. And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously Unix compilers need to be told to link the math library, it wasn't linked by default because some very early programs didn't use floating point.
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions, what you can't do is fancy IO.

Can I run GCC as a daemon (or use it as a library)?

I would like to use GCC kind of as a JIT compiler, where I just compile short snippets of code every now and then. While I could of course fork a GCC process for each function I want to compile, I find that GCC's startup overhead is too large for that (it seems to be about 50 ms on my computer, which would make it take 50 seconds to compile 1000 functions). Therefore, I'm wondering if it's possible to run GCC as a daemon or use it as a library or something similar, so that I can just submit a function for compilation without the startup overhead.
In case you're wondering, the reason I'm not considering using an actual JIT library is because I haven't found one that supports all the features I want, which include at least good knowledge of the ABI so that it can handle struct arguments (lacking in GNU Lightning), nested functions with closure (lacking in libjit) and having a C-only interface (lacking in LLVM; I also think LLVM lacks nested functions).
And no, I don't think I can batch functions together for compilation; half the point is that I'd like to compile them only once they're actually called for the first time.
I've noticed libgccjit, but from what I can tell, it seems very experimental.
My answer is "No (you can't run GCC as a daemon process, or use it as a library)", assuming you are trying to use the standard GCC compiler code. I see at least two problems:
The C compiler deals in complete translation units, and once it has finished reading the source, compiles it and exits. You'd have to rejig the code (the compiler driver program) to stick around after reading each file. Since it runs multiple sub-processes, I'm not sure that you'll save all that much time with it, anyway.
You won't be able to call the functions you create as if they were normal statically compiled and linked functions. At the least you will have to load them (using dlopen() and its kin, or writing code to do the mapping yourself) and then call them via the function pointer.
The first objection deals with the direct question; the second addresses a question raised in the comments.
I'm late to the party, but others may find this useful.
There exists a REPL (read–eval–print loop) for c++ called Cling, which is based on the Clang compiler. A big part of what it does is JIT for c & c++. As such you may be able to use Cling to get what you want done.
The even better news is that Cling is undergoing an attempt to upstream a lot of the Cling infrastructure into Clang and LLVM.
#acorn pointed out that you'd ruled out LLVM and co. for lack of a c API, but Clang itself does have one which is the only one they guarantee stability for: https://clang.llvm.org/doxygen/group__CINDEX.html

dlmalloc crash on Win7

For some time now I've been happily using dlmalloc for a cross-platform project (Windows, Mac OS X, Ubuntu). Recently, however, it seems that using dlmalloc leads to a crash-on-exit on Windows 7.
To make sure that it wasn't something goofy in my project, I created a super-minimal test program-- it doesn't do anything but return from main. One version ("malloctest") links to dlmalloc and the other ("regulartest") doesn't. On WinXP, both run fine. On Windows 7, malloctest crashes. You can see screencasts of the tests here.
My question is: why is this happening? Is it a bug in dlmalloc? Or has the loader in Windows 7 changed? Is there a workaround?
fyi, here is the test code (test.cpp):
#include <stdio.h>
int main() {
return 0;
}
and here is the nmake makefile:
all: regulartest.exe malloctest.exe
malloctest.exe: malloc.obj test.obj
link /out:$# $**
regulartest.exe: test.obj
link /out:$# $**
clean:
del *.exe *.obj
For brevity, I won't include the dlmalloc source in this post, but you can get it (v2.8.4) here.
Edit: See these other relavent SO posts:
Is there a way to redefine malloc at link time on Windows?
Globally override malloc in visual c++
Looks like a bug in the C runtime. Using Visual Studio 2008 on Windows 7, I reproduced the same problem. After some quick debugging by putting breakpoints in dlmalloc and dlfree, I saw that dlfree was getting called with an address that it never returned earlier from dlmalloc, and then it was hitting an access violation shortly thereafter.
Thankfully, the C runtime's source code is distributed along with VS, so I could see that this call to free was coming from the __endstdio function in _file.c. The corresponding allocation was in __initstdio, and it was calling _calloc_crt to allocate its memory. _calloc_crt calls _calloc_impl, which calls HeapAlloc to get memory. _malloc_crt (used elsewhere in the C runtime, such as to allocate memory for the environment and for argv), on the other hand, calls straight to malloc, and _free_crt calls straight to free.
So, for the memory that gets allocated with _malloc_crt and freed with _free_crt, everything is fine and dandy. But for the memory that gets allocated with _calloc_crt and freed with _free_crt, bad things happen.
I don't know if replacing malloc like this is supported -- if it is, then this is a bug with the CRT. If not, I'd suggest looking into a different C runtime (e.g. MinGW or Cygwin GCC).
Using dlmalloc in cross-platform code is an oxymoron. Replacing any standard C functions (especially malloc and family) results in undefined behavior. The closest thing to a portable way to replace malloc is using search-and-replace (not #define; that's also UB) on the source files to call (for example) my_malloc instead of malloc. Note that internal C library functions will still use their standard malloc, so if the two conflict, things will still blow up. Basically, trying to replace malloc is just really misguided. If your system really has a broken malloc implementation (too slow, too much fragmentation, etc.) then you need to do your replacement in an implementation-specific way, and disable the replacement on all systems except ones where you've carefully checked that your implementation-specific replacement works correctly.

Resources