Porting user space code to kernel space

Porting user space code to kernel space - c

I have a big system written mostly in C that was running in user space up till now. Now I need to compile the code as a kernel module. For that, afaik, I should at least rewrite the code and replace functions as malloc, calloc, free, printf with their kernel equivalents, because those are solely user-space functions. The problem is, however, that I don't have the source code to some custom-made libraries used in the system, and those libraries call malloc etc. inside their functions. So, basically, I might need to reimplement the whole library.
Now the question: will it be a really dirty hack, if I'd write my own implementation of malloc as a wrapper around kmalloc, something like this:
void *malloc(size_t size) {
return kmalloc(size, GFP_USER);
}
Then link this implementation to the system code, which will eliminate all the Unknown symbol in module errors.
Actually I thought that this would be a common problem and someone would have already written such a kmalloc wrapper, but I've been googling for a couple of days now and found nothing useful.
EDIT: The reason for doing this is that the system I'm talking about was a realtime application running on VxWorks realtime OS and now we want to port it to be used on Linux RTAI, where the apps mostly run in kernel space. But I guess there is a possibility to have real-time in user space as well, so, I should probably do as Mike suggested and separate the code into kernel and user-space parts and communicate between them with shared memory.

I've never seen this done before. I did have to do something similar at a previous job (in our phones, for power savings reasons, we had to port a portion of code from user space from the kernel) but that's how I did it.. I took a portion of the code and moved it, and a small portion at that.
When I did it I changed the user space calls to kernel calls because of a number of reasons two primary ones:
It was less confusing that way (others looking at the code didn't have to wonder why I was calling "malloc" from the kernel)
malloc and kmalloc don't work exactly the same. What I mean by that is
2a. kmalloc takes a flags parameter, in your example above you hardcoded it. What if you decide later that you want to change it in some places and not others? (assuming you have a number of different places where you get dynamic memory).
2b. kmalloc doesn't give you memory in the same way as malloc. malloc() will give you the number of bytes you pass in as size_t size. kmalloc() on the other hand, is in the kernel and thus is dealing with the physical memory of the system, which is available only in page-sized chunks; thus when you call kmalloc() you are going to get only certain predefined, fixed-size byte arrays. if you're not aware of this, you might ask for just over a particular chunk and thus get much more memory than you need... a direct port of your code won't protect you from that.
2c. The header files have to change too. Obviously you can't include <stdlib.h> in the kernel, so just because you "wrapped" the malloc call, you still have to go around replacing header files.
quick example of my point in 2b above:
void * stuff;
stuff = kmalloc(1,GFP_KERNEL);
printk("I got: %zu bytes of memory\n", ksize(stuff));
kfree(stuff);
To show the actual amount of memory allocated:
[90144.702588] I got: 32 bytes of memory
anyway... technically, how you describe it, should work fine. Both take a size_t and return a void * so it should work; but be aware that the more code you move into the kernel the less deterministic things become, and that malloc()<=>kmalloc() isn't as 1:1 as it seems.

Trying to make my RTAI code compilable in both user and kernel spaces (as well as working with POSIX), I have developed URT which essentially does what you are asking. It's a lightweight abstraction level over real-time systems (and even over the inconsistent user-space vs kernel-space RTAI functions).

Related

Custom handling of memory reads and writes in C

I am working on writing my own malloc and using the LD_PRELOAD trick to use it. I need to be able to perform custom functionality for every memory access to the heap, both reads and writes (performance is not a concern, functionality is the goal).
For example, for some code like
int x = A[5];
I would like to be able to trap the read from (A + 5) and instead of reading from that memory location, return my own custom value to store in x.
The ideas I have as of now are:
mprotect away, handling the resulting SIGSEGVs and doing what I need to in the handler. As far as I know, I can access the faulty address in void *si_addr, but I'm not sure how to distinguish between a read and a write - and even if I did manage to do so, I'm not sure how to handle writes since I wouldn't know the value to be written within the handler.
Tweak gcc to handle memory accesses specially. From what I have read, understanding gcc code takes a while, and unless its IR/abstract assembly conveniently isolates memory loads/stores, I'm not sure how practical this is.
Any suggestions are appreciated.

The simplest is via malloc ( you might want to own mmap, munmap, mprotect, sig(action, nal, etc) ... for full coverage ). Yours return addresses which do not correspond to valid mappings, capture SIGBUS + SIGSEGV, interpret the siginfo structure to fixup your process, ... But this is somewhat limited to operating on the heap, and a program can readily escape from it, and if you are trying to catch a misbehaving program, the program might corrupt your lookup tables.
For fuller coverage, you might want to take a look at gvisor, which is billed as a container runtime sandbox. Its technology is closer to a debugger, as it takes full control over the target, capturing its faults, system calls, etc.. and manages its address space. It would likely be minor surgery to adapt it to your needs.
In either situation, when you take a fault, you have to either install the memory and restart the program or emulate the instruction. If you are dealing with a clean architecture like riscv or ARM, emulation isn’t too bad, but for an over-indulgent one like x86, you pretty much need to integrate qemu. If you take the gvisor-like approach, you can install the page and set the single-step flag, then remove the page on the single-step trap, which is a bit less cumbersome. There was a precursor to dtrace, called atrace, that used this approach to analyze cache and tlb access patterns.
Sounds like a fun project; I hope it goes well.

Memory allocation that resizes a buffer ONLY if it can grow in place?

After reading the man-page for realloc(), I came to the realization that it works a little differently than I thought it did. I originally thought that realloc() would attempt to resize a buffer, previously allocated with one of the malloc-family functions, and if it could NOT extend the buffer in place, then it would fail. However, the man-page states:
The realloc() function returns a pointer to the newly allocated memory, which is suitably aligned for any built-in type and may be different from ptr, or NULL if the request fails.
The "may be different from ptr" part is what I'm talking about.
Basically, what I want is a function, similar to realloc(), but which fails if it cannot extend the buffer in place. It seems that there is no function in the standard C library that does this; however, I'm assuming there may be some OS-specific functions that accomplish the same thing.
Could someone tell me what functions are out there that do what I described above, and which OS's they are specific to? Preferably, I'd like to know at least the functions specific to Linux and Windows (and Mac OS would be a nice bonus too :) ).
This may be a duplicate of this post, but I don't think it is for the following reasons:
The question in the post I linked to simply asks, is there a function that extends a buffer in place, whereas, I'm asking, which functions extend a buffer in place.
The accepted answer for that post does not contain the information I need.
EDIT
Some people were wondering what is the use case I need this for, so I'll explain, below:
I'm writing a C preprocessor (yes, I know... don't reinvent the wheel... well, I'm doing it anyways, so there). And one component of the C preprocessor is a cache for storing pp-tokens which come from various source files, where each source file's set of pp-tokens may be fragmented within the cache. The cache itself, is a linked-list of large chunks of memory. Ideally, I'd like to keep this linked-list short, hence why I'd like to first try resizing the buffer (in place); however, if resizing in place is not possible, then I want to just add another node (i.e. chunk of memory) to the linked list.
Within each cache buffer, there are additional linked-list nodes, which provide a means for iterating through all the pp-tokens of each individual source file, which may be fragmented across the various cache buffers that make up the cache.
The reasons I need the kind of memory reallocation I discussed earlier are the following:
If resizing a cache buffer could not be done in place, and a new buffer had to be allocated and the old memory contents copied, then I'd have a lot of dangling pointers. Jonathan Leffler suggested that I instead store offsets within the buffer, rather than pointers, which I had not even thought about, and is a great idea! However, reason #2...
I want the implementation of the cache to be as fast as possible, and, please correct me if I'm wrong, but it seems to me that (for my use case) it would be faster on average to just add a new cache buffer to the linked list if a given cache buffer could not be resized in place, rather than allocating a new buffer and copying all previous contents and freeing the old buffer. As a sidenote, I am planning on doubling the size of the allocated cache buffer each time cache resizing is needed.

Memory management (in the form of malloc and friends) is generally implemented as a library; it is not part of the Operating System. (An implementation of the library will probably need to use some OS facilities to acquire raw memory -- although that's not a given -- but there is no need to involve the OS for allocating and freeing individual allocations.) So you're not going to find an "OS-specific" solution.
There are a number of different memory allocation libraries available. If you decide to use an alternative to the one preinstalled with your particular distribution, you will probably want to arrange for it to be used by the standard library as well. Details for how to do that vary.
Most allocation libraries do include some additional interfaces, but I don't know of any library which offers the function you're looking for. More common is an API for finding out how much memory is actually in an allocation (which is often more than the amount requested by the malloc). For many libraries, realloc will only expand the allocation in place if it was already big enough, but there may be libraries which are willing to merge a following free block in order to make non-copying realloc possible.
There's a list of some commonly-used libraries in the Wikipedia page on dynamic memory allocation, which also has a good overview of implementation techniques.
And, of course, you could always write your own memory manager (or modify an open source library) to implement that feature. However, while that would be an interesting and satisfying project, I'd strongly suggest you think about (and research) the reasons why this seemingly simple idea has not been implemented in common memory management libraries. There are good reasons.

When is it more appropriate to use valloc() as opposed to malloc()?

C (and C++) include a family of dynamic memory allocation functions, most of which are intuitively named and easy to explain to a programmer with a basic understanding of memory. malloc() simply allocates memory, while calloc() allocates some memory and clears it eagerly. There are also realloc() and free(), which are pretty self-explanatory.
The manpage for malloc() also mentions valloc(), which allocates (size) bytes aligned to the page border.
Unfortunately, my background isn't thorough enough in low-level intricacies; what are the implications of allocating and using page border-aligned memory, and when is this appropriate as opposed to regular malloc() or calloc()?

The manpage for valloc contains an important note:
The function valloc() appeared in 3.0BSD. It is documented as being obsolete in 4.3BSD, and as legacy in SUSv2. It does not appear in POSIX.1-2001.
valloc is obsolete and nonstandard - to answer your question, it would never be appropriate to use in new code.
While there are some reasons to want to allocate aligned memory - this question lists a few good ones - it is usually better to let the memory allocator figure out which bit of memory to give you. If you are certain that you need your freshly-allocated memory aligned to something, use aligned_alloc (C11) or posix_memalign (POSIX) instead.

Allocations with page alignment usually are not done for speed - they're because you want to take advantage of some feature of your processor's MMU, which typically works with page granularity.
One example is if you want to use mprotect(2) to change the access rights on that memory. Suppose, for instance, that you want to store some data in a chunk of memory, and then make it read only, so that any buggy part of your program that tries to write there will trigger a segfault. Since mprotect(2) can only change permissions page by page (since this is what the underlying CPU hardware can enforce), the block where you store your data had better be page aligned, and its size had better be a multiple of the page size. Otherwise the area you set read-only might include other, unrelated data that still needs to be written.
Or, perhaps you are going to generate some executable code in memory and then want to execute it later. Memory you allocate by default probably isn't set to allow code execution, so you'll have to use mprotect to give it execute permission. Again, this has to be done with page granularity.
Another example is if you want to allocate memory now, but might want to mmap something on top of it later.
So in general, a need for page-aligned memory would relate to some fairly low-level application, often involving something system-specific. If you needed it, you'd know. (And as mentioned, you should allocate it not with valloc, but using posix_memalign, or perhaps an anonymous mmap.)

First of all valloc is obsolete, and memalignshould be used instead.
Second thing it's not part of the C (C++) standard at all.
It's a special allocation which is aligned to _SC_PAGESIZE boundary.
When is it useful to use it? I guess never, unless you have some specific low level requirement. If you would need it, you would know to need it, since it's rarely useful (maybe just when trying some micro-optimizations or creating shared memory between processes).

The self-evident answer is that it is appropriate to use valloc when malloc is unsuitable (less efficient) for the application (virtual) memory usage pattern and valloc is better suited (more efficient). This will depend on the OS and libraries and architecture and application...
malloc traditionally allocated real memory from freed memory if available and by increasing the brk point if not, in which case it is cleared by the OS for security reasons.
calloc in a dumb implementation does a malloc and then (re)clears the memory, while a smart implementation would avoid reclearing newly allocated memory that is automatically cleared by the operating system.
valloc relates to virtual memory. In a virtual memory system using the file system, you can allocate a large amount of memory or filespace/swapspace, even more than physical memory, and it will be swapped in by pages so alignment is a factor. In Unix creation of file of a specified file and adding/deleting pages is done using inodes to define the file but doesn't deal with actual disk blocks till needed, in which case it creates them cleared. So I would expect a valloc system to increase the size of the data segment swap without actually allocating physical or swap pages, or running a for loop to clear it all - as the file and paging system does that as needed. Thus valloc should be a heck of a lot faster than malloc. But as with calloc, how particular idiotsyncratic *x/C flavours do it is up to them, and the valloc man page is totally unhelpful about these expectations.
Traditionally this was implemented with brk/sbrk. Of course in a virtual memory system, whether a paged or a segmented system, there is no real need for any of this brk/sbrk stuff and it is enough to simply write the last location in a file or address space to extend up to that point.
Re the allocation to page boundaries, that is not usually something the user wants or needs, but rather is usually something the system wants or needs.
A (probably more expensive) way to simulate valloc is to determine the page boundary and then call aligned_alloc or posix_memalign with this alignment spec.
The fact that valloc is deprecated or has been removed or is not required in some OS' doesn't mean that it isn't still useful and required for best efficiency in others. If it has been deprecated or removed, one would hope that there are replacements that are as efficient (but I wouldn't bet on it, and might, indeed have, written my own malloc replacement).
Over the last 40 years the tradeoffs of real and (once invented) virtual memory have changed periodically, and mainstream OS has tended to go for frills rather than efficiency, with programmers who don't have (time or space) efficiency as a major imperative. In the embedded systems, efficiency is more critical, but even there efficiency is often not well supported by the standard OS and/or tools. But when in doubt, you can roll your own malloc replacement for your application that does what you need, rather than depend on what someone else woke up and decided to do/implement, or to undo/deprecate.
So the real answer is you don't necessarily want to use valloc or malloc or calloc or any of the replacements your current subversion of an OS provides.

What happens when different object files use different malloc implementations

I have a couple of questions.
Suppose a program is compiled using 2 object files. Each uses malloc and free in most of their functions. But these object files were generated at different times and happen to be using different malloc implementations. Let's say the implementations share variable names and function names. Will the program work fine or not? Why?
If a program has object file 1 and 2, code from object file 1 call malloc and allocates some memory then frees it. Now code from object file 2 calls malloc. Can it use the memory that was freed? How does it work underneath?

Trying to provide a useful answer, even though it's far from complete.
Part 1.
First, it's hard enough to link the program with two implementations of malloc sharing function names: duplicate definitions usually cause linker errors. I can see how we manage to do it using GNU binutils, and there probably are some equivalent tricks for other toolchains. For the rest of the answer, let's assume we managed to link two implementations successfully. (It's usually a good thing that you get linker errors instead of mixing two implementations, possibly even introducing malloc/free asymmetry which has almost no chance to work).
Let's also assume that memory allocated with one particular implementation is always freed using free from the same implementation. Otherwise, it's virtually guaranteed to fail.
Two implementations may work together, or they may interfere, depending on how they request more memory from the OS when their local heaps run out of space. MS Windows has a system interface for managing heaps, and two different mallocs are likely to be built on top of it; then nothing prevents them from working together. Implementations requesting memory with sbrk-like call will work together if they're both ready that someone else will request sbrk increase independently of malloc. I'd expect that malloc from glibc won't fail here, but I'm not really sure.
Part 2.
If the implementation used by object 1 is able to return memory to OS, memory can be reused by the implementation called by object 2. That is, memory reuse may happen but it's less likely than when a single implementation is used.
The possibility of returning memory to OS depends on malloc/free implementation, and may also depend on allocated chunk size and various system settings. For example, glibc uses anonymous mmap for large chunks of memory, and these chunks are unmapped when freed.

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?

Functions are not first class objects in C. Which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.

It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to represent the user/kernel barrier like a inter-process barrier. Pass data, not code, back and forth between a well defined protocol through a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you did want to pass memcpy() to the kernel. You have to remind that it is a very special function. It is defined in the C standard, quite simple, and of a quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that is a built-in does not impede you ability to take a pointer to it.

Even if there was a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler has local or long jumps to specific memory locations. You can't just move a function in memory and expect it to run. The OS can do that but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be
re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.

You want to copy a function? I do not think that this is possible in C generally.
Assume, you have a Harvard-Architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also I know several compilers and linkers, which do optimization on file (not only function level). This results in opcode, where parts of C functions are mixed into each other.
The only way which I consider as possible may be:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into an C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: Did you consider a redesign of your software, so that you do not need to copy a functions content?

I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and don't have your function do anything fancy, no other function calls, not accessing data from outside function, you might even get away with doing it once. I'd say the safest possibility is to limit the maximum size of supported function to, say, 1 kilobyte and just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. Epilogue alone might be enough, but all of this can bomb if searched sequence pops up too early, e.g. in the data embedded by the function. Searching for the next prologue might also get you into trouble, i think.
Now please ignore everything that i wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture please, WHY are you trying to do that, and see whether we can figure out an entirely different approach.

A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!

If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though since they don't move).

It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asychronous case) is handled.
If you need to provide additional information about what the event that occured, that can be made available to the userspace process when it calls read on the readable file descriptor.

Function isn't just object you can copy. What about cross-references / symbols and so on? Of course you can take something like standard linux "binutils" package and torture your binaries but is it what you want?
By the way if you simply are trying to replace memcpy() implementation, look around LD_PRELOAD mechanics.

I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.

A cleaner method than disabling optimizations and relying on the compiler to maintain order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler and linker dependant, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this, its a common requirement in embedded systems that need to update the running code.

My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.

As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what the purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...

I have done this on a Nintendo GBA where I've copied some low level render functions from flash (16 bit access slowish memory) to the high speed workspace ram (32 bit access, at least twice as fast). This was done by taking the address of the function immdiately after the function I wanted to copy, size = (int) (NextFuncPtr - SourceFuncPtr). This did work well but obviously cant be garunteed on all platforms (does not work on Windows for sure).

I think one solution can be as below.
For ex: if you want to know func() size in program a.c, and have indicators before and after the function.
Try writing a perl script which will compile this file into object format(cc -o) make sure that pre-processor statements are not removed. You need them later on to calculate the size from object file.
Now search for your two indicators and find out the code size in between.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight