xmlCleanupParser() memory loss?

As xmlCleanupParser() from the very good libxml2 is not thread-safe, my question is (and I have no way to check this myself): roughly how much memory is lost to xmlParseFile() and, more importantly, does this memory loss accumulate over many calls to xPF()?

Apart from the fact that malloc() and free(), or whatever memory-handling implementations are in use, are not necessarily thread-safe in C before C11, there's always the problem of shared/global memory. File handles to the same file in different threads aren't that bad as long as they're read-only.
However, starting with libxml2 2.4.7, you might be able to enable thread safety at the API level, with a single thread per document: http://www.xmlsoft.org/threads.html
Looking at the sources of libxml2 2.9.1, I'm positive that thread safety is fully implemented: besides global mutexes, there's also an atomic allocation function.

Following the advice given by meaning-matters, and using the only tool I found under OS/2 (this ancient IBM operating system) to check memory, there seems to be no difference in memory loss between calling xCP() and choosing not to (for me).


How to make sure C multithreading program read the latest value from main memory?

Let's say thread 1 (executing on Core 0) updates a global variable, and the updated value is cached in Core 0's L1 cache (not flushed to main memory). Then thread 2 starts executing on Core 3 and tries to read the global variable, reading it from main memory (since it doesn't have the cached value), so thread 2 reads an outdated value.
I know that in C you can use volatile to force the compiler not to read the value from CPU registers, which means that a volatile variable will get its value from cache or main memory. In my scenario above, even if I declare the global variable volatile, the latest value will still be cached in L1 cache, and main memory will still hold the old value, which thread 2 will read. So how can we fix this issue?
Or maybe my understanding is wrong, and using volatile makes the variable update main memory directly, so that every time you read/write a volatile variable, you read/write it from/to main memory directly?
To some extent, people noting that the premise of your question is flawed is a reasonable answer. In general, this happens rarely if at all, and is usually indistinguishable from a race condition.
However yes it can happen. See for example memory barriers which are a great example of how such a condition (albeit due to OOO execution etc.) can occur.
That being said, what you're looking for to make sure the specific occurrence you've noted cannot happen is called a "cache flush". This can be genuinely important on ARM/ARM64 processors where separate data and instruction caches exist, but it's also a good habit to get into for data that is passed between threads this way. You can also check out the __builtin___clear_cache C compiler builtin, which performs a similar task. Hopefully one of these will help you get to the bottom of your problem.
However, most likely you're not running into a caching issue, and a race condition is far more likely to be arising. If memory barriers/cache flushes don't fix your issue, audit your code very carefully for raciness.
You probably want to use some thread library like POSIX threads. Read some Pthread tutorial, see pthreads(7) and use pthread_create(3), pthread_mutex_init, pthread_mutex_lock, pthread condition variables, etc etc
Read also the documentation of GNU libc and of your C compiler (e.g. GCC, to be used as gcc -Wall -Wextra -g) and of your debugger (e.g. GDB).
Be prepared to fight against Heisenbugs.
In general you cannot prove statically that your C program doesn't have race conditions. See Rice's theorem. You could use tools like Frama-C or the Clang static analyzer, or write your own GCC plugin, or improve or extend Bismon described in this draft report.
You could be interested in CompCert.
You cannot be sure that your program reads the "latest" value from memory
(unless you add some assembly code).
Read about cache coherence protocols.
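For what it's worth, here is a minimal sketch of one portable way to get the visibility the question asks about, assuming a C11 compiler with <stdatomic.h> and POSIX threads (neither is stated in the question). The names shared_value, ready, writer, reader and run_demo are made up for illustration. The idea is that the writer publishes the data with a release store and the reader picks it up with an acquire load; volatile is not involved, and the hardware's cache coherence does the rest.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state: a plain payload plus an atomic "ready" flag. */
static int shared_value;   /* ordinary, non-atomic data */
static atomic_int ready;   /* 0 = not published yet, 1 = published */

/* Writer: store the payload, then publish it with release ordering so the
 * payload write cannot be reordered after the flag write. */
static void *writer(void *arg)
{
    shared_value = *(int *)arg;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is observed with acquire ordering; the payload
 * written before the release store is then guaranteed to be visible. */
static void *reader(void *arg)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* busy-wait: fine for a demo, prefer a condition variable in real code */
    *(int *)arg = shared_value;
    return NULL;
}

/* Runs one reader and one writer thread; returns the value the reader saw,
 * or -1 if a thread could not be created. */
int run_demo(int value)
{
    pthread_t w, r;
    int seen = -1;

    shared_value = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&r, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, &value) != 0) {
        atomic_store(&ready, 1);   /* unblock the reader before bailing out */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Calling run_demo(42) should hand 42 from the writer to the reader. A pthread mutex around both the write and the read would give the same guarantee with less to think about.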

memcpy [or not?] and multithreading [std::thread from c++11]

I am writing software in C/C++ that makes heavy use of BIAS/Profil, an interval algebra library. In my algorithm, a master divides a domain and feeds parts of it to slave process(es). Those return an int status for each of those domain parts. There is common data for reading, and that's it.
I need to parallelize my code. However, as soon as two slave threads (or more, I guess) are running and both call functions of this library, it segfaults. What is peculiar about those segfaults is that gdb rarely indicates the same error line across two builds: it depends on the speed of the threads, whether one started earlier, etc. I've tried having the threads yield until a go-ahead from the master, which 'stabilizes' the error. I'm fairly sure that it comes from the library's calls to memcpy: following the gdb backtrace, I always end up in a BIAS/Profil function calling memcpy. (To be fair, almost all functions call memcpy on a temporary object before returning the result...) From what I read on the web, it would appear that memcpy() might not be thread-safe, depending on the implementation (especially here). (That seems weird for a function supposed to only read the shared data... or maybe, when writing the per-thread data, both threads go for the same memory space?)
To try to address this, I'd like to 'replace' (at least for testing whether the behavior changes) the call to memcpy with a mutex-framed call (something like mtx.lock(); memcpy(...); mtx.unlock();).
1st question: I'm not a dev/code engineer at all, and lack a lot of basic knowledge. I think that, as I use a pre-built BIAS/Profil library, the memcpy called is the one from the system the library was built on, correct? If so, would it change anything if I tried building the library from source on my system? (I'm not sure I can build this library, hence the question.)
2nd question:
in my string.h, memcpy is declared by:
extern void * memcpy(void *,const void *,__kernel_size_t);
#endif
and in some other string headers (string_64.h, string_32.h) a definition of the form:
#define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len))
or some more explicit definition, or just a declaration like the one quoted.
It's starting to get ugly but, ideally, I'd like to define a preprocessor macro, #define __HAVE_ARCH_MEMCPY 1, and a void * memcpy(void *,const void *,__kernel_size_t) which would do the mutex-framed memcpy using the memcpy it displaces.
The idea here is to avoid messing with the library and make it work with 3 lines of code ;)
Any better idea? (it would make my day...)
IMHO you shouldn't concentrate on the memcpy() calls, but on the higher-level functionality.
memcpy() is thread-safe as long as the memory ranges handled by the concurrently running threads don't overlap. In practice, memcpy() is just a for(;;) loop (with a lot of optimizations) [at least in glibc], which is why it is declared the way it is.
If you want to know what your concurrently memcpy()-ing threads will do, imagine those for(;;) loops copying memory through long int pointers.
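To illustrate the non-overlap point, here is a small sketch, assuming POSIX threads: two threads each memcpy() their own half of a buffer. The struct copy_job and the functions copy_worker and parallel_copy are hypothetical names, not anything from BIAS/Profil. Since the destination ranges are disjoint and the source is only read, no lock should be needed.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical description of one copy: a destination, a source, a length. */
struct copy_job {
    void *dst;
    const void *src;
    size_t len;
};

/* Thread body: a plain memcpy() on the range described by the job. */
static void *copy_worker(void *arg)
{
    struct copy_job *j = arg;
    memcpy(j->dst, j->src, j->len);
    return NULL;
}

/* Copies len bytes from src to dst using the calling thread for the first
 * half and one extra thread for the second half. The two destination ranges
 * never overlap. Returns 0 on success, -1 if the thread cannot be created. */
int parallel_copy(char *dst, const char *src, size_t len)
{
    size_t half = len / 2;
    struct copy_job jobs[2] = {
        { dst,        src,        half       },
        { dst + half, src + half, len - half },
    };
    pthread_t tid;

    if (pthread_create(&tid, NULL, copy_worker, &jobs[1]) != 0)
        return -1;
    copy_worker(&jobs[0]);      /* first half on the calling thread */
    pthread_join(tid, NULL);    /* second half is complete after the join */
    return 0;
}
```

If a program built this way still crashes inside memcpy(), the likelier culprit is that two threads were handed the same (or overlapping) destination, for example a shared temporary inside the library.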
Given your observations, that the Profil lib is from the last millennium, and that the documentation (homepage and Profil2.ps) does not even contain the word "thread", I would assume that the lib is not thread-safe.
1st: No, usually memcpy is part of libc, which is dynamically linked (at least nowadays). On Linux, check with ldd NAMEOFBINARY, which should give a line with something like libc.so.6 => /lib/i386-linux-gnu/libc.so.6 or similar. If not: rebuild. If yes: rebuilding could help anyway, as there are many other factors.
Besides this, I think memcpy is thread-safe as long as you never write back data (even writing back unmodified data will hurt: https://blogs.oracle.com/dave/entry/memcpy_concurrency_curiosities).
2nd: If it turns out that you have to use a modified memcpy, also think about LD_PRELOAD.
In general, you must use a critical section, mutex, or some other protection technique to keep multiple threads from calling non-thread-safe (non-reentrant) functions simultaneously. Some ANSI C implementations of memcpy() are not thread-safe, some are.
Writing functions that are thread-safe, and/or writing threaded programs that can safely accommodate non-thread-safe functions, is a substantial topic. Very doable, but it requires reading up on the topic. There is much written. This will at least help you to start asking the right questions.
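As a rough sketch of the mutex approach described above, assuming POSIX threads: legacy_add below is a made-up stand-in for a library routine with hidden static state (it is not a BIAS/Profil function), and locked_add is a hypothetical wrapper that funnels every call through one mutex.

```c
#include <pthread.h>

/* Hypothetical stand-in for a non-thread-safe library routine: it updates
 * hidden static state with an unsynchronized read-modify-write. */
static long legacy_total;

static void legacy_add(long n)
{
    long tmp = legacy_total;   /* racy if two threads get here together */
    legacy_total = tmp + n;
}

/* One lock guarding every call into the "library". */
static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

static void locked_add(long n)
{
    pthread_mutex_lock(&legacy_lock);
    legacy_add(n);
    pthread_mutex_unlock(&legacy_lock);
}

/* Thread body: call the wrapper *arg times. */
static void *worker(void *arg)
{
    long calls = *(long *)arg;
    for (long i = 0; i < calls; i++)
        locked_add(1);
    return NULL;
}

/* Starts nthreads workers (at most 16) and returns the final total,
 * or -1 if the arguments are out of range or a thread could not be created. */
long run_workers(int nthreads, long calls_each)
{
    pthread_t tid[16];
    int started = 0;

    if (nthreads < 0 || nthreads > 16)
        return -1;
    legacy_total = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, &calls_each) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == nthreads ? legacy_total : -1;
}
```

The price is that the library calls no longer run in parallel at all, so this is mostly useful as a test of whether the crashes go away, or when the library calls are a small part of each thread's work.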

Which Unix don't have a thread-safe malloc?

I want my C program to be portable even to very old Unix OSes, but the problem is that I'm using pthreads and dynamic allocation (malloc). All the Unixes I know of have a thread-safe malloc (Linux, *BSD, Irix, Solaris); however, this is not guaranteed by the C standard, and I'm sure there are very old versions where this is not true.
So, is there some list of platforms on which I'd need to wrap malloc() calls with a mutex lock? I plan to write a ./configure test that checks whether the current platform is in that list.
The other alternative would be to test malloc() for thread-safety, but I know of no deterministic way to do this. Any ideas on this one too?
The only C standard that has threads (and thus is relevant to your question) is C11, which states:
For purposes of determining the existence of a data race, memory
allocation functions behave as though they accessed only memory
locations accessible through their arguments and not other static
duration storage.
Or in other words, as long as two threads don't pass the same address to realloc or free, all calls to the memory functions are thread-safe.
For POSIX, that is, all the Unixes that you can find nowadays, you have:
Each function defined in the System Interfaces volume of IEEE Std 1003.1-2001 is thread-safe unless explicitly stated otherwise.
I don't know where you take your assertion from that malloc wouldn't be thread-safe on older Unixes; a system with threads that doesn't implement it in a thread-safe way is pretty much useless. What might be a problem on such an older system is performance, but it should always be functional.

Memory pools implementation in C

I am looking for a good memory pool implementation in C.
it should include the following:
Anti fragmentation.
Be super fast :)
Ability to "bundle" several allocations from different sizes under some identifier and delete all the allocations with the given identifier.
Thread safe
I think the excellent talloc, developed as part of samba might be what you're looking for. The part I find most interesting is that any pointer returned from talloc is a valid memory context. Their example is:
struct foo *X = talloc(mem_ctx, struct foo);
X->name = talloc_strdup(X, "foo");
// ...
talloc_free(X); // frees memory for both X and X->name
In response to your particular points:
(1) Not sure what anti-fragmentation is in this case. In C you're not going to get compacting garbage collection anyway, so I think your choices are somewhat limited.
(2) It advertises being only 4% slower than plain malloc(3), which is quite fast.
(3) See example above.
(4) It is thread safe as long as different threads use different contexts & the underlying malloc is thread safe.
Have you looked into
nedmalloc http://www.nedprod.com/programs/portable/nedmalloc/
ptmalloc http://www.malloc.de/en/
Both leverage a memory pool but keep it mostly transparent to the user.
In general, you will find best performance in your own custom memory pool (you can optimize for your pattern). I ended up writing a few for different access patterns.
For memory pools that have been thoroughly tried and tested, you may want to just use the APR (Apache Portable Runtime) ones.
Mind you, single pools are not thread safe, you'll have to handle that yourself.
bget is another choice. It's well tested and production ready.

Patterns for freeing memory in C?

I'm currently working on a C-based application and am a bit stuck on freeing memory in a non-antipattern fashion. I am a memory-management amateur.
My main problem is I declare memory structures in various different scopes, and these structures get passed around by reference to other functions. Some of those functions may throw errors and exit().
How do I go about freeing my structures if I exit() in one scope, but not all my data structures are in that scope?
I get the feeling I need to wrap it all up in a pseudo exception handler and have the handler deal with freeing, but that still seems ugly because it would have to know about everything I may or may not need to free...
Consider wrappers to malloc and using them in a disciplined way. Track the memory that you do allocate (in a linked list maybe) and use a wrapper to exit to enumerate your memory to free it. You could also name the memory with an additional parameter and member of your linked list structure. In applications where allocated memory is highly scope dependent you will find yourself leaking memory and this can be a good method to dump the memory and analyze it.
Threading in your application will make this very complex. See other answers regarding threading issues.
You don't need to worry about freeing memory when exit() is called. When the process exits, the operating system will free all of the associated memory.
I think to answer this question appropriately, we would need to know about the architecture of your entire program (or system, or whatever the case may be).
The answer is: it depends. There are a number of strategies you can use.
As others have pointed out, on a modern desktop or server operating system, you can exit() and not worry about the memory your program has allocated.
This strategy changes, for example, if you are developing on an embedded operating system where exit() might not clean everything up. Typically what I see is when individual functions return due to an error, they make sure to clean up anything they themselves have allocated. You wouldn't see any exit() calls after calling, say, 10 functions. Each function would in turn indicate an error when it returns, and each function would clean up after itself. The original main() function (if you will - it might not be called main()) would detect the error, clean up any memory it had allocated, and take the appropriate actions.
When you just have scopes-within-scopes, it's not rocket science. Where it gets difficult is if you have multiple threads of execution, and shared data structures. Then you might need a garbage collector or a way to count references and free the memory when the last user of the structure is done with it. For example, if you look at the source to the BSD networking stack, you'll see that it uses a refcnt (reference count) value in some structures that need to be kept "alive" for an extended period of time and shared among different users. (This is basically what garbage collectors do, as well.)
You can create a simple memory manager for malloc'd memory that is shared between scopes/functions.
Register it when you malloc it, de-register it when you free it. Have a function that frees all registered memory before you call exit.
It adds a bit of overhead, but it helps keep track of memory. It can also help you hunt down pesky memory leaks.
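A minimal sketch of such a register/de-register manager, assuming C11 (for max_align_t) and a single thread. All names (tracked, tracked_malloc, tracked_free, tracked_free_all, tracked_live) are made up for illustration. Each block carries a small header that links it into a doubly linked list, so it can be removed on free or swept up in one go.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every tracked block. The union with
 * max_align_t keeps the payload behind it suitably aligned. */
typedef union tracked {
    struct { union tracked *prev, *next; } link;
    max_align_t align;
} tracked;

static tracked *head;        /* most recently registered block */
static size_t live_blocks;   /* number of blocks currently registered */

/* malloc() plus registration. Returns NULL on failure, like malloc(). */
void *tracked_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(tracked))
        return NULL;
    tracked *t = malloc(sizeof *t + size);
    if (t == NULL)
        return NULL;
    t->link.prev = NULL;
    t->link.next = head;
    if (head != NULL)
        head->link.prev = t;
    head = t;
    live_blocks++;
    return t + 1;            /* payload starts right after the header */
}

/* De-register and free one block obtained from tracked_malloc(). */
void tracked_free(void *p)
{
    if (p == NULL)
        return;
    tracked *t = (tracked *)p - 1;
    if (t->link.prev != NULL)
        t->link.prev->link.next = t->link.next;
    else
        head = t->link.next;
    if (t->link.next != NULL)
        t->link.next->link.prev = t->link.prev;
    live_blocks--;
    free(t);
}

/* Free everything still registered, e.g. right before exit(). */
void tracked_free_all(void)
{
    while (head != NULL)
        tracked_free(head + 1);
}

/* Number of blocks not yet freed; handy when hunting leaks. */
size_t tracked_live(void)
{
    return live_blocks;
}
```

Registering the sweep once with atexit(tracked_free_all) would make it run on every exit() path. With threads, the list would need a lock around it.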
Michael's advice is sound - if you are exiting, you don't need to worry about freeing the memory since the system will reclaim it anyway.
One exception to that is shared memory segments - at least under System V Shared Memory. Those segments can persist longer than the program that creates them.
One option not mentioned so far is to use an arena-based memory allocation scheme, built on top of standard malloc(). If the entire application uses a single arena, your cleanup code can release that arena, and all is freed at once. (APR - Apache Portable Runtime - provides a pools feature which I believe is similar; David Hanson's "C Interfaces and Implementations" provides an arena-based memory allocation system; I've written one that you could use if you wanted to.) You can think of this as "poor man's garbage collection".
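To make the arena idea concrete, here is a rough sketch, assuming C11. It is not APR's pool API or Hanson's interface; the names arena, arena_chunk, arena_alloc and arena_release, and the 4096-byte minimum chunk size, are invented for this example. Allocations are carved out of larger malloc'd chunks, individual blocks are never freed, and one call releases the lot.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena: a chain of malloc'd chunks, bump-allocated. */
typedef struct arena_chunk {
    struct arena_chunk *next;
    size_t used, cap;        /* bytes handed out / bytes available in data */
    max_align_t data[];      /* flexible array member, maximally aligned */
} arena_chunk;

typedef struct {
    arena_chunk *chunks;     /* newest chunk first */
    size_t chunk_count;
} arena;

#define ARENA_CHUNK_MIN 4096 /* assumed default chunk size */

/* Returns size bytes from the arena, or NULL if malloc() fails. */
void *arena_alloc(arena *a, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    if (size > SIZE_MAX - align)
        return NULL;
    size = (size + align - 1) / align * align;   /* round up to alignment */

    arena_chunk *c = a->chunks;
    if (c == NULL || c->cap - c->used < size) {
        /* Current chunk is missing or too full: start a new one. */
        size_t cap = size > ARENA_CHUNK_MIN ? size : ARENA_CHUNK_MIN;
        if (cap > SIZE_MAX - sizeof *c)
            return NULL;
        c = malloc(sizeof *c + cap);
        if (c == NULL)
            return NULL;
        c->next = a->chunks;
        c->used = 0;
        c->cap = cap;
        a->chunks = c;
        a->chunk_count++;
    }
    void *p = (char *)c->data + c->used;
    c->used += size;
    return p;
}

/* Frees every chunk, and with it every allocation made from the arena. */
void arena_release(arena *a)
{
    while (a->chunks != NULL) {
        arena_chunk *next = a->chunks->next;
        free(a->chunks);
        a->chunks = next;
    }
    a->chunk_count = 0;
}
```

An arena is started with arena a = {0}; and the cleanup path before exit() is then a single arena_release(&a). One arena per request or per phase gives the "free everything under this identifier" behaviour without tracking individual pointers.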
As a general memory discipline, every time you allocate memory dynamically, you should understand which code is going to release it and when it can be released. There are a few standard patterns. The simplest is "allocated in this function; released before this function returns". This keeps the memory largely under control (if you don't run too many iterations on the loop that contains the memory allocation), and scopes it so that it can be made available to the current function and the functions it calls. Obviously, you have to be reasonably sure that the functions you call are not going to squirrel away (cache) pointers to the data and try to reuse them later after you've released and reused the memory.
The next standard pattern is exemplified by fopen() and fclose(); there's a function that allocates a pointer to some memory, which can be used by the calling code, and then released when the program has finished with it. However, this often becomes very similar to the first case - it is usually a good idea to call fclose() in the function that called fopen() too.
Most of the remaining 'patterns' are somewhat ad hoc.
People have already pointed out that you probably don't need to worry about freeing memory if you're just exiting (or aborting) your code in case of error. But just in case, here's a pattern I developed and use a lot for creating and tearing down resources in case of error. NOTE: I'm showing a pattern here to make a point, not writing real code!
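int foo_create(foo_t *foo_out) {
    int res;
    foo_t foo;
    bar_t bar;
    baz_t baz;
    res = bar_create(&bar);
    if (res != 0)
        goto fail_bar;
    res = baz_create(&baz);
    if (res != 0)
        goto fail_baz;
    foo = malloc(sizeof(foo_s));
    if (foo == NULL) {
        res = -1; /* assumed error code; res is still 0 at this point */
        goto fail_alloc;
    }
    foo->bar = bar;
    foo->baz = baz;
    /* etc. etc. you get the idea */
    *foo_out = foo;
    return 0; /* meaning OK */
    /* tear down stuff, in reverse order of creation */
    /* (baz_destroy and bar_destroy are assumed counterparts of the create calls) */
fail_alloc:
    baz_destroy(baz);
fail_baz:
    bar_destroy(bar);
fail_bar:
    return res; /* propagate error code */
}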
I can bet I'm going to get some comments saying "this is bad because you use goto". But this is a disciplined and structured use of goto that makes code clearer, simpler, and easier to maintain if applied consistently. You can't achieve a simple, documented tear-down path through the code without it.
If you want to see this in real in-use commercial code, take a look at, say, arena.c from the MPS (which is coincidentally a memory management system).
It's a kind of poor-man's try...finish handler, and gives you something a bit like destructors.
I'm going to sound like a greybeard now, but in my many years of working on other people's C code, lack of clear error paths is often a very serious problem, especially in network code and other unreliable situations. Introducing them has occasionally made me quite a bit of consultancy income.
There are plenty of other things to say about your question -- I'm just going to leave it with this pattern in case that's useful.
Very simply, why not have a reference-counted implementation? When you create an object and pass it around, you increment and decrement its reference count (remember to do this atomically if you have more than one thread).
That way, when an object is no longer used (zero references) you can safely delete it, or automatically delete it in the reference count decrement call.
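A minimal sketch of that scheme, assuming C11 atomics are available. The type shared_obj and the functions obj_create, obj_retain and obj_release are hypothetical names. The object starts with one reference, and whichever release drops the count to zero frees it.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct {
    atomic_int refs;   /* number of current owners */
    int payload;       /* stand-in for the real data */
} shared_obj;

/* Creates an object owned by the caller (count = 1), or NULL on failure. */
shared_obj *obj_create(int payload)
{
    shared_obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    atomic_init(&o->refs, 1);
    o->payload = payload;
    return o;
}

/* Takes an additional reference; the caller must already hold one. */
shared_obj *obj_retain(shared_obj *o)
{
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
    return o;
}

/* Drops one reference. Returns 1 if this call freed the object, else 0. */
int obj_release(shared_obj *o)
{
    /* fetch_sub returns the previous value: 1 means we were the last owner.
     * acq_rel makes other owners' writes visible before the free. */
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

Each function or thread that stores the pointer calls obj_retain() and later obj_release(); nobody calls free() directly.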
This sounds like a task for a Boehm garbage collector.
Depends on the system of course whether you can or should afford to use it.
