get_user_pages_fast() from kernel thread - c

I need to call get_user_pages_fast() from a kernel thread. But get_user_pages_fast() uses current->mm internally, which is NULL for a kernel thread. Is there any way to get around this? The kernel thread in question is working on behalf of another process, say x; would it be fine to just set current->mm to x->mm and invoke get_user_pages_fast()?
[EDIT 1]: I verified this and it seems to work. I am still concerned that it could break in some cases. Any insight is welcome. Thanks.

Your "hack" will indeed work, but let's take a step back and understand what the idea of it is:
When you are in a kernel thread (and I am talking about a pure kernel thread, a child of kthreadd, not a user thread executing in kernel mode as when servicing a syscall), there is no user memory to speak of. This is why current->mm is NULL: there is no "current" user-space memory.
When you assign x->mm to current->mm you are "cheating" by annexing the process memory space of the innocent x as your own. As a consequence, any allocation you perform will be charged to x, and will be visible to x (it is, after all, part of its memory space). Also, there might be internal kernel checks on current->mm which could be tricked, leading to your kernel-mode thread being treated by the kernel as if it were a user-mode thread (though arguably other checks rely on KERNEL_DS/USER_DS, which you are not modifying). Still, a concern. And this will break if x ever dies (hey - nobody's immortal), likely causing an oops, if not a panic altogether.
You haven't said WHY you need to get user pages. If you know x is alive and you are doing this as part of, say, IPC/shmem, I can see a reason for it; in that case, you might want to provide some API for the process in question to "register" with the kernel thread. Otherwise, your solution works, but is... well, not as neat as it could be.

I'm not convinced this is totally safe. The _fast part of get_user_pages_fast() means that acquiring mm->mmap_sem is not required, and part of the reason that works is the assumption that we are running within the process itself (so, e.g., current->mm can't go away completely). Since you're running in another thread, you're susceptible to races if the real process ever does something that changes its mapping.
I guess the question is why can't you just use get_user_pages instead?
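For what it's worth, here is a rough sketch of that route from a kernel thread: pin pages of a known task x without ever touching current->mm. This is only a sketch; the names follow roughly the 3.x-era API (mmap_sem, the eight-argument get_user_pages()), the exact signatures have changed several times since, and newer kernels provide get_user_pages_remote() for precisely this situation.
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/errno.h>

/* Hypothetical helper: pin nr_pages of task x's user memory starting at uaddr.
 * Runs in the kernel thread's own context; x's mm is taken by reference.
 * Each successfully pinned page must later be released with put_page(). */
static long pin_pages_of_task(struct task_struct *x, unsigned long uaddr,
                              unsigned long nr_pages, struct page **pages)
{
        struct mm_struct *mm;
        long pinned;

        mm = get_task_mm(x);            /* NULL if x is exiting or is a kernel thread */
        if (!mm)
                return -ESRCH;

        down_read(&mm->mmap_sem);       /* required: we are NOT running inside x */
        pinned = get_user_pages(x, mm, uaddr, nr_pages,
                                1 /* write */, 0 /* force */,
                                pages, NULL);
        up_read(&mm->mmap_sem);

        mmput(mm);                      /* drop the reference from get_task_mm() */
        return pinned;                  /* number of pages pinned, or -errno */
}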

Related

Does a page fault cause a thread context switch on Linux?

If a thread suffers a major fault while trying to read from an address, and the data must be swapped in from "disk", does Linux take advantage of that to run another waiting thread, if there is one?
From what I've read online, the answer is yes. But I haven't seen anything conclusive.
That depends on the scheduler you use. In general, the answer is yes, unless the disk operation completes quickly enough or the kernel has some other reason not to switch to a different process.

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. And the particular case where two or more threads can enter a shared section at the same time, with the outcome depending on the timing, is called a race condition (a term that also extends to/from electronics and other areas).
The hard thing about threading is that computing is otherwise such a deterministic affair, but when threading gets involved, it adds a degree of uncertainty, which varies per platform/OS.
A single-threaded program is guaranteed to do all its tasks in the same order, always; but with multiple threads, the order depends on how fast each one completes its task, on other applications competing for the CPU, and on the underlying hardware, all of which affect the results.
There's no sure-fire way to do threading; rather, there are techniques, tools and libraries to deal with individual cases.
Locking in
The most well-known technique is using semaphores (or locks), and the most well-known kind is the mutex, which only allows one thread at a time into a shared section, by having a sort of "flag" that is raised once a thread has entered.
if (locked == NO)
{
    locked = YES;
    // Do ya' thing
    locked = NO;
}
The code above, although it looks like it could work, does not guard against the case where both threads pass the if () before either one sets the variable (which threads can easily do). That is why there is hardware support for this kind of operation, guaranteeing that only one thread can perform it: the test-and-set operation, which checks and then, if available, sets the variable in a single atomic step (x86 provides this through atomic instructions such as XCHG and LOCK CMPXCHG).
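To make that concrete, here is a small sketch of what test-and-set looks like from C itself, using C11's atomic_flag (the one type the standard guarantees to be lock-free). This is just an illustration of the primitive, not a recommendation to hand-roll your own spinlocks.
#include <stdatomic.h>

static atomic_flag locked = ATOMIC_FLAG_INIT;

void enter(void)
{
    while (atomic_flag_test_and_set(&locked))
        ;                           /* spin: the flag was already set by another thread */
}

void leave(void)
{
    atomic_flag_clear(&locked);     /* lower the flag so another thread can enter */
}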
In the same vein of locks and semaphores, there's also the read-write lock, which allows multiple readers but only one writer, especially useful for data with low volatility. And there are many other variations, some that limit access to N threads, and whatnot.
But overall, locks are lame, since they basically force serialisation of multi-threaded code, where threads end up stuck trying to get a lock (or just testing it and leaving). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution, in terms of threading, is to minimise the amount of shared state that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low, have "try and leave" kinds of threads that check whether the lock is free, do their work if it is, and go away otherwise, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads, waiting for tasks in a "pool", and having queues of things to do, ideally, tasks that shouldn't overlap each other.
Now, the thread-pool pattern doesn't solve the problems discussed before, but it changes the paradigm to make them easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you switch the focus to "tasks that need to be executed", and the matter of which thread is doing it becomes irrelevant.
Again, pools won't solve all your problems, but they will make them easier to understand. And easier to understand may lead to better solutions.
All the theoretical things mentioned above are already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions), so try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
A calling convention defines how the stack and registers are used to implement function calls. Because each thread has its own stack and registers, synchronising threads and calling conventions are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.
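A minimal sketch of the mutex approach described above, using POSIX threads (the function and variable names are just examples):
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void only_one_at_a_time(void)
{
    pthread_mutex_lock(&lock);      /* a second caller blocks here ... */
    /* ... critical section: the statements that must not run concurrently ... */
    pthread_mutex_unlock(&lock);    /* ... until the first caller releases the lock */
}
Whichever thread calls only_one_at_a_time() second simply waits at the lock until the first one has left the critical section, which is exactly the "one instance at a time" behaviour asked about.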

Making process survive failure in its thread

I'm writing an app that has many independent threads. Since I'm doing quite low-level, dangerous stuff there, threads may fail (SIGSEGV, SIGBUS, SIGFPE), but they should not kill the whole process. Is there a proper way to do this?
Currently I intercept the aforementioned signals and in their signal handler I call pthread_exit(NULL). It seems to work, but since pthread_exit() is not an async-signal-safe function, I'm a bit concerned about this solution.
I know that splitting this app into multiple processes would solve the problem, but in this case that's not a feasible option.
EDIT: I'm aware of all the Bad Things™ that can happen (I'm experienced in low-level system and kernel programming) due to ignoring SIGSEGV/SIGBUS/SIGFPE, so please try to answer my particular question instead of giving me lessons about reliability.
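For concreteness, this is roughly what my current setup looks like (simplified sketch; as said above, calling pthread_exit() from a signal handler is not async-signal-safe, which is exactly my concern):
#include <signal.h>
#include <pthread.h>
#include <string.h>

static void fatal_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    pthread_exit(NULL);            /* terminate only the faulting thread */
}

static void install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO;      /* SA_ONSTACK + sigaltstack() would also be
                                      needed to survive stack overflows */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS,  &sa, NULL);
    sigaction(SIGFPE,  &sa, NULL);
}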
The PROPER way to do this is to let the whole process die, and start another one. You don't explain WHY this isn't appropriate, but in essence, that's the only way that is completely safe against various nasty corner cases (which may or may not apply in your situation).
I'm not aware of any method that is 100% safe that doesn't involve letting the whole process die. (Note also that sometimes the mere act of continuing after these sorts of errors is "undefined behaviour" - it doesn't mean that you are definitely going to fall over, just that it MAY be a problem).
It's of course possible that someone knows of some clever trick that works, but I'm pretty certain that the only 100% guaranteed method is to kill the entire process.
Low-latency code design involves a careful "be aware of the system you run on" type of coding and deployment. That means, for example, that standard IPC mechanisms (say, using SysV msgsnd/msgget to pass messages between processes, or pthread_cond_wait/pthread_cond_signal on the PThreads side) as well as ordinary locking primitives (adaptive mutexes) are to be considered rather slow ... because they involve something that takes thousands of CPU cycles ... namely, context switches.
Instead, use "hot-hot" handoff mechanisms such as the disruptor pattern - both producers and consumers spin in tight loops, permanently polling a single (or at worst a small number of) atomically-updated memory location(s) that says where the next item-to-be-processed is found and/or marks a processed item complete. Bind all producers/consumers to separate CPU cores so that they never context switch.
In this type of use case, whether you use separate threads (and get the memory sharing implicitly by virtue of all threads sharing the same address space) or separate processes (and get the memory sharing explicitly by using shared memory for the data-to-be-processed as well as the queue mgmt "metadata") makes very little difference, because TLBs and data caches are "always hot" (you never context switch).
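As an illustration only (not the disruptor itself; the type and function names here are invented), a single-producer/single-consumer handoff of this kind can be as small as one atomic sequence word that both sides spin on:
#include <stdatomic.h>
#include <stdint.h>

struct handoff {
    _Atomic uint64_t seq;   /* even = slot empty, odd = item ready */
    uint64_t payload;
};

static void produce(struct handoff *h, uint64_t item)
{
    while (atomic_load_explicit(&h->seq, memory_order_acquire) & 1)
        ;                                   /* spin until the slot is free */
    h->payload = item;
    atomic_fetch_add_explicit(&h->seq, 1, memory_order_release);  /* publish */
}

static uint64_t consume(struct handoff *h)
{
    uint64_t item;
    while (!(atomic_load_explicit(&h->seq, memory_order_acquire) & 1))
        ;                                   /* spin until an item is published */
    item = h->payload;
    atomic_fetch_add_explicit(&h->seq, 1, memory_order_release);  /* mark consumed */
    return item;
}
No system call ever happens on this path; the price is that both sides burn a core while waiting, which is why you pin them to dedicated CPUs.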
If your "processors" are unstable and/or have no guaranteed completion time, you need to add a "reaper" mechanism anyway to deal with failed/timed-out messages, but such garbage-collection mechanisms necessarily introduce jitter (latency spikes). That's because you need a system call to determine whether a specific thread or process has exited, and system-call latency is a few microseconds even in the best case.
From my point of view, you're trying to mix oil and water here: you're required to use library code not specifically written for low-latency deployments / library code not under your control, combined with the requirement to do message dispatch with nanosecond latencies. There is no way to make e.g. pthread_cond_signal() give you nanosecond latency, because it must do a system call to wake the target up, and that takes longer.
If your "handler code" relies on the "rich" environment, and a huge amount of "state" is shared between these and the main program ... it sounds a bit like saying "I need to make a steam-driven airplane break the sound barrier"...

Shared semaphore between user and kernel spaces

Short version
Is it possible to share a semaphore (or any other synchronization lock) between user space and kernel space? Named POSIX semaphores have kernel persistence, which is why I was wondering whether it is also possible to create and/or access them from kernel context.
Searching the internet didn't help much due to the sea of information on normal usage of POSIX semaphores.
Long version
I am developing a unified interface to real-time systems in which I have some added bookkeeping to take care of, protected by a semaphore. This bookkeeping is done on resource allocation and deallocation, which happens in non-real-time context.
With RTAI, however, a thread waiting on or posting a semaphore needs to be in real-time context. This means that using RTAI's named semaphores implies switching between real-time and non-real-time context on every wait/post in user space, and worse, creating a short real-time thread for every wait/post in kernel space.
What I am looking for is a way to share a normal Linux or POSIX semaphore between kernel and user spaces so that I can safely wait/post it in non-real-time context.
Any information on this subject would be greatly appreciated. If this is not possible, do you have any other ideas how this task could be accomplished?[1]
[1] One way would be to add a system call, keep the semaphore in kernel space, and have user-space processes invoke that system call so the semaphore is managed entirely in kernel space. I would be happier if I didn't have to patch the kernel just for this, though.
Well, you were in the right direction, but not quite -
Linux named POSIX semaphores are based on futexes ("Fast Userspace muTEX"). As the name implies, while their implementation is assisted by the kernel, a big chunk of it is done by user code. Sharing such a semaphore between kernel and user space would require re-implementing this infrastructure in the kernel. Possible, but certainly not easy.
SysV semaphores, on the other hand, are implemented completely in the kernel and are only accessible to user space via standard system calls (e.g. semtimedop() and friends).
This means that every SysV-related operation (semaphore creation, taking or releasing) is actually implemented in the kernel, and you can simply call the underlying kernel function from your code to take the same semaphore from the kernel when needed.
Thus, your user code will simply call semtimedop(). That's the easy part.
The kernel part is just a little bit more tricky: you have to find the code that implements semtimedop() and the related calls in the kernel (they are all in the file ipc/sem.c) and create a replica of each function that does what the original does, minus the calls to copy_from_user(), copy_to_user() and friends.
The reason for this is that those kernel functions expect to be called from a system call with a pointer to a user buffer, while you want to call them with parameters in kernel buffers.
Take for example semtimedop() - the relevant kernel function is sys_semtimedop() in ipc/sem.c (see here: http://lxr.free-electrons.com/source/ipc/sem.c#L1537). If you copy this function into your kernel code and just remove the parts that do copy_from_user() and copy_to_user(), and simply use the passed pointers (since you'll be calling it from kernel space), you'll get a kernel-side equivalent that can take a SysV semaphore from kernel space alongside user space - as long as you call it from process context in the kernel (if you don't know what that last sentence means, I highly recommend reading Linux Device Drivers, 3rd edition).
Best of luck.
One solution I can think of is to have a /proc (or /sys or whatever) file on a main kernel module where writing 0/1 to it (or read from/write to it) would cause it to issue an up/down on a semaphore. Exporting that semaphore allows other kernel modules to directly access it while user applications would go through the /proc file system.
I'd still wait to see if the original question has an answer.
I'm not really experienced with this by any means, but here's my take. If you look at glibc's implementation of sem_open() and sem_wait(), it's really just creating a file in /dev/shm, mmap'ing a struct from it, and using atomic operations on it. If you wanted to access such a named semaphore from kernel space, you would probably have to patch the tmpfs subsystem. However, I think this would be difficult, as it wouldn't be straightforward to determine whether a file is meant to be a named semaphore.
An easier way would probably be to just reuse the kernel's semaphore implementation and have the kernel manage the semaphore for user-space processes. To do this, you would write a kernel module and associate it with a device file, then define two ioctls for the device file: one for wait and one for post. Here is a good tutorial on writing kernel modules, including setting up a device file and adding I/O operations for it: http://www.freesoftwaremagazine.com/articles/drivers_linux. I don't know exactly how to implement an ioctl operation, but I think you can just assign a function to the ioctl member of the file_operations struct. Not sure what the function signature should be, but you could probably figure it out by digging around in the kernel source.
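To sketch what that could look like (everything here is illustrative and untested: the device name, the ioctl numbers and the choice of a misc device are my own invention), the module boils down to a kernel struct semaphore exposed through unlocked_ioctl:
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/semaphore.h>

#define KSEM_IOC_WAIT  _IO('k', 0)
#define KSEM_IOC_POST  _IO('k', 1)

static struct semaphore ksem;           /* also directly usable by other kernel code */

static long ksem_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
        switch (cmd) {
        case KSEM_IOC_WAIT:
                return down_interruptible(&ksem);       /* 0, or -EINTR on signal */
        case KSEM_IOC_POST:
                up(&ksem);
                return 0;
        default:
                return -ENOTTY;
        }
}

static const struct file_operations ksem_fops = {
        .owner          = THIS_MODULE,
        .unlocked_ioctl = ksem_ioctl,
};

static struct miscdevice ksem_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "ksem",                /* shows up as /dev/ksem */
        .fops  = &ksem_fops,
};

static int __init ksem_init(void)
{
        sema_init(&ksem, 1);            /* binary semaphore, initially free */
        return misc_register(&ksem_dev);
}

static void __exit ksem_exit(void)
{
        misc_deregister(&ksem_dev);
}

module_init(ksem_init);
module_exit(ksem_exit);
MODULE_LICENSE("GPL");
User space would then open /dev/ksem and call ioctl(fd, KSEM_IOC_WAIT, 0) or ioctl(fd, KSEM_IOC_POST, 0), while other kernel code can call down()/up() on ksem directly.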
As I'm sure you know, even the best working solution to this would likely be very ugly. If I were in your place, I would simply concede the battle and use rendezvous points to sync the processes.
I have read your project's README and I have the following observations. Apologies in advance:
Firstly, there already is a universal interface to real-time systems. It is called POSIX; certainly VxWorks, Integrity and QNX are POSIX-compliant, and in my experience there are very few portability problems if you develop within the POSIX API. Whether POSIX is sane or not is another matter, but it's the one we all use.
[The reason most RTOSes are POSIX-compliant is that one of their big markets is defence equipment, and the US DoD won't let you use an OS for their non-IT equipment (e.g. radars) unless it is POSIX-compliant... This has pretty much made it commercially impossible to do an RTOS without giving it POSIX.]
Secondly, Linux itself can be made into a pretty good real-time OS by applying the PREEMPT_RT patch set. Of all the RTOSes out there this is probably the best one at the moment from the point of view of making efficient use of all these multi-core CPUs. However, it's not quite as hard-real-time as the others, so it's a quid pro quo.
RTAI takes a different approach of, in effect, placing its own RTOS underneath Linux and making Linux nothing more than one task running in that OS. This approach is OK up to a point, but the big penalty of RTAI is that the real-time part is now (as far as I can tell) not POSIX-compliant (though the API looks like they've just stuck rt_ on the front of some POSIX function names), and interaction with everything else is, as you're discovering, quite complicated.
PREEMPT_RT is a much more intrusive patch set than RTAI, but the payback is that everything else (like POSIX and valgrind) stays completely normal. Plus nice things like ftrace are available. Bookkeeping is then a case of merely using existing tools, not having to write new ones. Also, it looks like PREEMPT_RT is gradually worming its way into the mainline Linux kernel anyway, which would render other patch sets like RTAI pretty much pointless.
So Linux + PREEMPT_RT gives us real-time POSIX plus a bunch of tools, just like all the other RTOSes out there: commonality across the board. Which kinda sounds like the goal of your project.
I apologise for not helping with the "how" of your project, and it is highly ungentlemanly of me to query the "why?" of it too. But I feel it is important to know that there are established things out there that seem to heavily overlap with what you're trying to do. Unseating King POSIX is going to be difficult.
I would like to answer this differently: you don't want to do this. There are good reasons why there is no interface to do this kind of thing and there are good reasons why all other kernel subsystems are designed and implemented to never need a lock shared between user and kernel space. The complexity of lock ordering and implicit locking in unexpected places will quickly get out of hand if you start playing around with userland that can prevent the kernel from doing certain things.
Let me recall a very long debugging session I did around 15 years ago, to shed at least some light on the kind of complex problems you can run into. I was involved in developing a file system where a large portion of the code was in userland, something like FUSE.
The kernel would do a filesystem operation, package it into a message and send it to the userland daemon and wait for a reply. The userland daemon reads the message, does stuff and writes a reply to the kernel which wakes up and continues with the operation. Simple concept.
One thing you need to understand about filesystems is locking. When you're looking up a name of a file, for example "foo/bar", the kernel somehow gets the node for the directory "foo", then locks it and asks it if it has the file "bar". The filesystem code somehow finds "bar", locks it and then unlocks "foo". The locking protocol is quite straightforward (unless you're doing a rename): the parent always gets locked before the child, and the child is locked before the parent lock is released. The lookup message for the file is what would get sent to our userland daemon while the directory was still locked; when the daemon replied, the kernel would proceed to first lock "bar" and then unlock "foo".
I don't even remember the symptoms we were debugging, but I remember the issue was not trivially reproducible; it required hours and hours of filesystem torture programs until it manifested itself. But after a few weeks we figured out what was going on. Let's say the full path to our file was "/a/b/c/foo/bar". We're in the process of doing a lookup on "bar", which means that we're holding the lock on "foo". The daemon is a normal userland process, so some operations it does can block, and it can be preempted too. It's actually talking over the network, so it can block for a long time. While we're waiting for the userland daemon, some other process wants to look up "foo" for some reason. To do this, it has the node for "c", locked of course, and asks it to look up "foo". It manages to find it and attempts to lock it (it has to be locked before we can release the lock on "c") and waits for the lock on "foo" to be released. Another process comes in and wants to look up "c"; it of course ends up waiting for that lock while holding the lock on "b". Another process waits for "b" and holds "a". Yet another process wants "a" and holds the lock on "/".
This is not a problem, not yet. This sometimes happens in normal filesystems too; locks can cascade all the way up to the root, you wait for a while for a slow disk, the disk responds, the congestion eases up and everyone gets their locks and everything keeps running fine. In our case though, the reason for holding the lock for a long time was that the remote server for our distributed filesystem didn't respond. X seconds later the userland daemon times out and, just before responding to the kernel that the lookup operation on "bar" has failed, it logs a message to syslog with a timestamp. One of the things that the timestamp needs is the timezone information, so it needs to open "/etc/localtime"; of course, to do that, it needs to start looking up "/etc", and for that it needs to lock "/". "/" is already locked by someone else, so the userland daemon waits for that someone else to unlock "/", while that someone else waits, through a chain of 5 processes and locks, for the daemon to respond. The system ends up in a total deadlock.
Now, maybe your code will not have problems like this. You're talking about a real-time system so there might be a level of control you have that normal kernels don't. But I'm not sure if adding an unexpected layer of locking complexity would even let you keep real time properties of the system, or really make sure that nothing you do in userland will ever create a deadlock cascade. If you don't page, if you never touch any file descriptor, if you never do memory operations and a bunch of other things I can't really think of right now you could get away with a lock shared between userland and kernel, but it will be hard and you'll probably find unexpected problems.
Multiple solutions exist in Linux/glibc, but none of them lets you explicitly share a semaphore between user and kernel space.
The kernel provides ways to suspend threads/processes, and the most efficient one is the futex. Here are some details about the state of the art of the current implementations used to synchronize user-space applications.
SYSV services
The Linux System V (SysV) semaphores are a legacy of the eponymous Unix OS. They are based on system calls to lock/unlock semaphores. The corresponding services, illustrated in the short sketch after this list, are:
semget() to get an identifier
semop() to make operations on the semaphores (e.g. incrementation/decrementation)
semctl() to make some control operations on the semaphores (e.g. destruction)
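A minimal user-space usage sketch of these three calls (error handling omitted; on Linux the caller must define union semun itself):
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);   /* one semaphore */

    union semun arg = { .val = 1 };
    semctl(id, 0, SETVAL, arg);                          /* initial value 1 */

    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    semop(id, &op, 1);                                   /* "lock" (P) - a system call */

    op.sem_op = +1;
    semop(id, &op, 1);                                   /* "unlock" (V) - a system call */

    semctl(id, 0, IPC_RMID);                             /* destroy the semaphore set */
    return 0;
}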
The GLIBC (e.g. version 2.31) does not provide any added value on top of these services; the library routine directly invokes the eponymous system call. For example, semtimedop() (in sysdeps/unix/sysv/linux/semtimedop.c) is a direct wrapper around the corresponding system call:
int
__semtimedop (int semid, struct sembuf *sops, size_t nsops,
              const struct timespec *timeout)
{
  /* semtimedop wire-up syscall is not exported for 32-bit ABIs (they have
     semtimedop_time64 instead with uses a 64-bit time_t). */
#if defined __ASSUME_DIRECT_SYSVIPC_SYSCALLS && defined __NR_semtimedop
  return INLINE_SYSCALL_CALL (semtimedop, semid, sops, nsops, timeout);
#else
  return INLINE_SYSCALL_CALL (ipc, IPCOP_semtimedop, semid,
                              SEMTIMEDOP_IPC_ARGS (nsops, sops, timeout));
#endif
}
weak_alias (__semtimedop, semtimedop)
Nowadays, SysV semaphores (as well as the other SysV IPC mechanisms, shared memory and message queues) are considered deprecated because, as they need a system call for each operation, they slow down the calling processes with systematic context switches. New applications should use the POSIX-compliant services available through the GLIBC.
POSIX services
POSIX semaphores are based on fast user-space mutexes (futexes). The principle is to increment/decrement the semaphore counter in user space with atomic operations as long as there is no contention. But when there is contention (multiple threads/processes want to "lock" the semaphore at the same time), a futex() system call is made, either to wake up the threads/processes waiting for the semaphore when it is "unlocked", or to suspend those waiting for it to be released. From a performance point of view, this makes a big difference compared to the above SysV services, which systematically require a system call for every operation. The POSIX services are implemented in GLIBC for the user-space part of the operations (atomic operations), with a switch into kernel space only when there is contention.
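Before looking at the glibc code, here is a deliberately stripped-down illustration of the principle, built directly on the futex() system call. This is a sketch only: a real implementation tracks waiters so the uncontended unlock path can skip the wake-up system call (see Drepper's "Futexes Are Tricky").
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static _Atomic uint32_t lock_word;      /* 0 = unlocked, 1 = locked */

static void toy_lock(void)
{
    uint32_t expected = 0;
    while (!atomic_compare_exchange_strong(&lock_word, &expected, 1)) {
        /* Contended: sleep in the kernel as long as the word is still 1. */
        syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;
    }
}

static void toy_unlock(void)
{
    atomic_store(&lock_word, 0);
    /* Wake at most one sleeper (a real lock skips this when nobody waits). */
    syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}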
For example, in GLIBC 2.31, the service to lock a semaphore is located in nptl/sem_waitcommon.c. It checks the value of the semaphore and decrements it with an atomic operation (in __new_sem_wait_fast()), and invokes the futex() system call (in __new_sem_wait_slow()) to suspend the calling thread only if the semaphore was equal to 0 before the attempted decrement.
static int
__new_sem_wait_fast (struct new_sem *sem, int definitive_result)
{
[...]
  uint64_t d = atomic_load_relaxed (&sem->data);
  do
    {
      if ((d & SEM_VALUE_MASK) == 0)
        break;
      if (atomic_compare_exchange_weak_acquire (&sem->data, &d, d - 1))
        return 0;
    }
  while (definitive_result);
  return -1;
[...]
}
[...]
static int
__attribute__ ((noinline))
__new_sem_wait_slow (struct new_sem *sem, clockid_t clockid,
                     const struct timespec *abstime)
{
  int err = 0;
[...]
  uint64_t d = atomic_fetch_add_relaxed (&sem->data,
                                         (uint64_t) 1 << SEM_NWAITERS_SHIFT);

  pthread_cleanup_push (__sem_wait_cleanup, sem);

  /* Wait for a token to be available.  Retry until we can grab one.  */
  for (;;)
    {
      /* If there is no token available, sleep until there is.  */
      if ((d & SEM_VALUE_MASK) == 0)
        {
          err = do_futex_wait (sem, clockid, abstime);
[...]
The POSIX services based on the futex include, for example (a small process-shared usage sketch follows this list):
sem_init() to create a semaphore
sem_wait() to lock a semaphore
sem_post() to unlock a semaphore
sem_destroy() to destroy a semaphore
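A small sketch showing these services shared between two processes, with no kernel involvement on the uncontended path: an unnamed semaphore placed in shared memory with pshared = 1 (error handling omitted for brevity).
#define _GNU_SOURCE
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Anonymous shared mapping visible to both parent and child. */
    sem_t *sem = mmap(NULL, sizeof(*sem), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(sem, /* pshared = */ 1, /* value = */ 0);

    if (fork() == 0) {          /* child: blocks in futex() only if it gets here first */
        sem_wait(sem);
        _exit(0);
    }

    sem_post(sem);              /* parent: post, waking the child if it is sleeping */
    wait(NULL);
    sem_destroy(sem);
    return 0;
}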
To manage mutexes (i.e. binary semaphores), it is possible to use the pthread services. They are also based on the futex. For example:
pthread_mutex_init() to create/initialize a mutex
pthread_mutex_lock/unlock() to lock/unlock a mutex
pthread_mutex_destroy() to destroy a mutex
I was thinking about ways that the kernel and userland can share things directly, i.e. without the syscall/copy-in/copy-out cost. One thing I remembered was the RDMA model, in which the kernel reads/writes user space directly, with synchronization of course. You may want to explore that model and see if it works for your purpose.

GLib handle out of memory

I have a question concerning GLib.
I would like to use GLib in a server context, but I'm not sure how its memory is managed:
https://developer.gnome.org/glib/stable/glib-Memory-Allocation.html
If any call to allocate memory fails, the application is terminated. This also means that there is no need to check if the call succeeded.
If I look at the source code, when g_malloc() fails it calls g_error():
g_error()
#define g_error(...)
A convenience function/macro to log an error message.
Error messages are always fatal, resulting in a call to abort() to terminate the application. [...]
But in my case, as I'm developing a server application, I don't want the application to exit; I would prefer that, like the traditional malloc(), the GLib functions return NULL or something else to indicate that an error happened.
So, my question is: is there a way to handle out-of-memory conditions?
Is GLib not recommended for server applications?
If I look at the man page for abort(), I see that I can handle the signal, but that would make the management of out-of-memory errors a little bit painful...
The abort() function causes abnormal program termination to occur, unless the signal SIGABRT is being caught and the signal handler does not return.
Thanks for your help!
It's very difficult to recover from lack of memory, because it can be considered a terminal state, in the sense that the lack of memory will persist for some time before it goes away. Even reacting to the lack of memory (like informing the user) might require more memory, for example to build and send a message. A related problem is that some operating systems (Linux at least) are over-optimistic about allocating memory: when the kernel realizes that memory is actually missing, it may kill an application, even if your code is handling the failures.
So either you have a much stricter grasp of your whole system than average, or you won't be able to successfully handle out of memory errors, and, in this case, it doesn't matter what the helper library is doing.
If you really want to control memory allocation while still using GLib, you have partial ways to do that. Don't use the GLib allocation functions themselves; use allocators from some other library. GLib provides functions that accept a "free function" where necessary. For example:
https://developer.gnome.org/glib/2.31/glib-Hash-Tables.html#g-hash-table-new-full
The hash table constructor accepts functions for destroying both keys and values. In your case, the data will be allocated using your custom allocation functions, while the hash data structures themselves will still be allocated with GLib functions.
Alternatively, you could use the g_try_* functions to allocate memory, so you still use the GLib allocator, but it won't abort on error. Again, this only partially solves the problem: internally, GLib implicitly calls functions that may abort, and it assumes they never return on error.
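For instance, a sketch of the g_try_* route for your own allocations (GLib's internal allocations still abort, as noted; handle_request() is a made-up name):
#include <glib.h>

/* Hypothetical request handler: fail the request instead of the process. */
static gboolean handle_request (gsize payload_size)
{
    gpointer buf = g_try_malloc (payload_size);   /* returns NULL on failure */
    if (buf == NULL) {
        g_warning ("out of memory, rejecting request of %" G_GSIZE_FORMAT " bytes",
                   payload_size);
        return FALSE;
    }

    /* ... process the request using buf ... */

    g_free (buf);
    return TRUE;
}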
About the general question: does it make sense for a server to crash when it's out of memory? The obvious answer is no, but I can't estimate how theoretical this answer is. I can only expect the server system to be properly sized for its operation and to reject as invalid any input that could potentially exceed its capacity; for that, it doesn't matter which libraries it might use.
I'm probably editorializing a bit here, but the modern tendency to use virtual/logical memory (both names have been used, although "logical" is more distinct) does dramatically complicate knowing when memory is exhausted, although I think one can restore the old, real-(RAM + swap) model (I'll call this the physical model) in Linux with the following in /etc/sysctl.d/10-no-overcommit.conf:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
This restores the ability to have the philosophy that if a program's malloc just failed, that program has a good chance of having been the actual cause of memory exhaustion, and can then back away from the construction of the current object, freeing memory along the way, possibly grumbling at the user for having asked for something crazy that needed too much RAM, and awaiting the next request. In this model, most OOM conditions resolve almost instantly - the program either copes and presumably returns RAM, or gets killed immediately on the following SEGV when it tries to use the 0 returned by malloc.
With the virtual/logical memory model that Linux tends to default to in 2013, this doesn't work, since a program won't find out that memory isn't available at malloc() time, but only upon attempting to access the memory later, at which point the kernel finally realizes there's nowhere in RAM to put it. This amounts to disaster, since any program on the system can die, rather than the one that ran the host out of RAM. One can understand why some GLib folks don't even care about trying to fix this problem, because with the logical memory model it can't be fixed.
The original point of logical memory was to allow huge programs using more than half the memory of the host to still be able to fork and exec supporting programs. It was typically enabled only on hosts with that particular usage pattern. Now in 2013 when a home workstation can have 24+ GiB of RAM, there's really no excuse to have logical memory enabled at all 99% of the time. It should probably be disabled by default on hosts with >4 GiB of RAM at boot.
Anyway. So if you want to take the old-school physical model approach, make sure your computer has it enabled, or there's no point to testing your malloc and realloc calls.
If you are in that model, remember that GLib wasn't really guided by the same philosophy (see http://code.google.com/p/chromium/issues/detail?id=51286#c27 for just how madly astray some of them are). Any library based on GLib might be infected with the same attitude as well. However, there may be some interesting things one can do with GLib in the physical memory model by installing your own memory handlers with g_mem_set_vtable(), since you might be able to poke around in program globals and reduce usage in a cache or the like to free up space, then retry the underlying malloc. However, that is limited in its own way by not knowing which object was under construction at the point your special handler is invoked.
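A sketch of that idea follows. The release_some_cache() hook is invented here for illustration, and note that g_mem_set_vtable() has since been deprecated and is ignored by newer GLib releases, so this only applies to GLib versions of that era.
#include <glib.h>
#include <stdlib.h>

/* Hypothetical application hook: evict entries from an in-memory cache and
   return TRUE if anything was actually freed. */
static gboolean release_some_cache (void)
{
    return FALSE;   /* application-specific; FALSE means "nothing left to free" */
}

static gpointer retrying_malloc (gsize n_bytes)
{
    gpointer p = malloc (n_bytes);
    while (p == NULL && release_some_cache ())
        p = malloc (n_bytes);          /* retry after shedding cache */
    return p;                          /* if still NULL, g_malloc() will abort */
}

static gpointer pass_realloc (gpointer mem, gsize n_bytes) { return realloc (mem, n_bytes); }
static void     pass_free    (gpointer mem)                { free (mem); }

static GMemVTable vtable = {
    .malloc  = retrying_malloc,
    .realloc = pass_realloc,
    .free    = pass_free,
};

int main (int argc, char **argv)
{
    g_mem_set_vtable (&vtable);        /* must run before any other GLib call */
    /* ... rest of the server ... */
    return 0;
}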
