Allocating executable memory using kmalloc - c

The answers to the question How to allocate an executable page in a Linux kernel module? describe how executable memory can be allocated using __vmalloc(). Is this also possible using kmalloc()? My goal is to have a physically contiguous executable memory area.

Memory returned by kmalloc() does not have exec permissions. I tried it, and dmesg shows "kernel tried to execute NX-protected page - exploit attempt? (uid: 0)"
Then no, I'd assume you can't kmalloc executable memory. Unless I'm wrong about how it works (returning pointers into an existing mapping that uses 1GB hugepages to cover all of physical RAM), it's just plain incompatible with the purpose and design of kmalloc.
There might be something other than vmalloc that you could use if you really need more than one physically contiguous 4k page of executable memory, but I don't know what it is. (I'm not a kernel dev; I just know a little bit about the big picture, and lots about CPU architecture / x86.) Perhaps something like vmalloc and then changing the page tables?
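If you want to see what "changing the page tables" could look like, here is a hedged sketch (not a supported recipe): allocate physically contiguous pages with __get_free_pages() and then flip their kernel page-table permissions with set_memory_x(). Those helpers are architecture-specific, and recent kernels no longer export them to modules, so check availability against your kernel version before relying on this.

/*
 * Hedged sketch only: physically contiguous pages from the page allocator,
 * then a page-table permission change to make them executable. Availability
 * of set_memory_x()/set_memory_nx() for modules depends on kernel version
 * and architecture.
 */
#include <linux/gfp.h>
#include <asm/set_memory.h>        /* <asm/cacheflush.h> on older kernels */

static void *alloc_exec_contig(unsigned int order)
{
        unsigned long addr = __get_free_pages(GFP_KERNEL, order);

        if (!addr)
                return NULL;

        /* Mark the 2^order contiguous pages executable in the kernel page tables. */
        if (set_memory_x(addr, 1 << order)) {
                free_pages(addr, order);
                return NULL;
        }
        return (void *)addr;
}

static void free_exec_contig(void *p, unsigned int order)
{
        /* Restore NX before handing the pages back to the page allocator. */
        set_memory_nx((unsigned long)p, 1 << order);
        free_pages((unsigned long)p, order);
}

Anything returned by __get_free_pages() is physically contiguous by construction, which is the property the question asks for; the executable bit is what the permission change adds.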
Other answers welcome.

Related

What is the main origin of heap and stack memory division?

I have read a lot of explanations of heap and stack memory, and all of them stay vague about where the division actually comes from. First of all, I understand how these memories work with software, but I don't understand the origin of this division. I assume they are both the same unspecialized physical memory, but...
For example, say we have a PC without any OS, and we want to create some bootable program in x86 assembly. I assume we can do this (personally I don't know assembly, but people write OSes anyway). So the main question is: can we already operate with a heap and a stack, or must we first create some memory-management machinery for this? If so, how is that possible in terms of bare-metal programming?
Adding something to the other answer, which is fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
First: the heap is totally software, while the stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instructions) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separate stack area, totally different from the other RAM areas where a program could create a "heap".
Second: when speaking about "programs", one can/should think of the operating system (the OS) as a program, specialized in managing resources (memory included) and extendable with "applications" (which are programs). In that scenario, the stack and the heap are managed in cooperation between the OS and the applications.
So, to reply to your main question, the 90%-correct answer is: on bare metal you already have a stack - perhaps you have to issue a short instruction sequence to set it up, but it is straightforward. But you don't have a heap; you must implement it in your program. First you set aside some memory to be used as the stack; then you can set aside some more memory to be used as a heap, not forgetting that you must reserve some memory for normal/static data. The part of the program that manages the heap has to do its work without erratically overwriting the stack or the static data.
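To make the "you must implement the heap yourself" part concrete, here is a minimal bump-allocator sketch in C; every name and the 4 KB size are invented for the example, and there is no free():

/* Hypothetical bare-metal bump allocator: the "heap" is just a static
 * array handed out in pieces. No free() here; that would need more
 * bookkeeping. Names and sizes are invented for illustration. */
#define HEAP_SIZE 4096

static unsigned char heap[HEAP_SIZE];   /* memory set aside as the heap */
static unsigned int  heap_used = 0;     /* bytes handed out so far      */

void *my_alloc(unsigned int size)
{
    /* Align requests to 8 bytes so returned pointers are usable. */
    size = (size + 7u) & ~7u;

    if (heap_used + size > HEAP_SIZE)
        return 0;                       /* out of heap memory */

    void *p = &heap[heap_used];
    heap_used += size;
    return p;
}

Real allocators add freeing, alignment choices and bookkeeping, but the principle is the same: the heap is just ordinary memory that your own code decides how to hand out.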

Does a compiler have to consider the kernel memory space when laying out memory?

I'm trying to reconcile a few concepts.
I know that virtual memory is shared (mapped) between the kernel and all user processes, which I read here. I also know that when the compiler generates addresses for code and data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to track each new release of an OS, to know not to place code or data at the high memory addresses reserved for the kernel? As in, does someone writing that piece of the compiler have to know the details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused when going through this tutorial, especially at the very bottom where it has OS code in low memory addresses, because I thought Linux uses high memory for the kernel.
The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things, and the OS's virtual memory allocation logic knows how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.
Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?
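A small Linux-only experiment illustrates this: print the virtual addresses of code, static data, heap and stack objects. The exact values change between runs because of ASLR, but they all land in the user portion of the address space, and the compiler never needed to know where the kernel lives:

/* Prints the virtual addresses of code, static data, heap and stack
 * objects. The values vary per run (ASLR), but all of them fall in the
 * user half of the address space on Linux. */
#include <stdio.h>
#include <stdlib.h>

int global_var = 42;                     /* static data */

int main(void)
{
    int   local_var = 0;                 /* stack */
    void *heap_var  = malloc(16);        /* heap  */

    printf("code : %p\n", (void *)main);
    printf("data : %p\n", (void *)&global_var);
    printf("heap : %p\n", heap_var);
    printf("stack: %p\n", (void *)&local_var);

    free(heap_var);
    return 0;
}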

System malloc vs DLMalloc on large malloc

I haven't coded in a while, so excuse me upfront. I have this odd problem. I am trying to malloc 8GB in one go and I plan to manage that heap with TLSF later on. That is, I want to avoid mallocing throughout my application at all; just get one big glob at the beginning and free it at the end.

Here is the peculiarity, though. I was always using dlmalloc until now in my programs, linking it in, and everything went well. However, now when I try to malloc 8GB at once with dlmalloc linked in, I get segmentation fault 11 on OSX when I run it; without dlmalloc everything goes well. It doesn't matter whether I use gcc or clang. The system doesn't have 8GB of RAM though, it has 4GB. Interestingly enough, the same thing happens on a Windows machine which has 32GB of RAM and an Ubuntu one that has 16GB of RAM.

With the system malloc it all works: the allocation goes through and a simple iteration through the allocated memory works as expected on all three systems. But when I link in dlmalloc it fails. I tried it both with malloc and dlmalloc function calls.
Allocation itself is nothing extraordinary, plain c99.
[...]
size_t bytes = 1024LL*1024LL*1024LL*8LL;
unsigned long *m = (unsigned long*)malloc(bytes);
[...]
I'm confused by several things here. How come the system malloc gives me the 8GB allocation even though the system has only 4GB of RAM - are those virtual pages? Why doesn't dlmalloc do the same? I am aware there might not be a contiguous block of 8GB of RAM to allocate, but why a segmentation fault then, why not a null pointer?
Is there a viable, robust (hopefully platform-neutral) solution to get that amount of RAM in one go from malloc even if I'm not sure the system will have that much RAM?
edit: the program is 64-bit, as are the OSes I'm running on.
edit2:
So I played with it some more. It turns out that if I break the allocation down into 1GB chunks, that is, 8 separate mallocs, then it works with dlmalloc. So it seems to be an issue with contiguous range allocation, where dlmalloc probably allocates only if there is a contiguous block. This makes my question even harder to formulate. Is there a somewhat sure way to get a memory chunk of that size, with or without dlmalloc, across platforms, and not have it fail if there is no physical memory left (it can be in swap, as long as it doesn't fail)? Also, would it be possible, in a cross-platform manner, to tell whether the malloc'd memory is in RAM or swap?
I will give you just a bit of perspective, if not an outright answer. When I see you attempting to allocate 8GB of contiguous RAM, I cringe. Yes, with 64-bit computing and all, that is probably "legal", but on a normal machine you are probably going to run into a lot of edge cases: 32-bit legacy code choking on a 64-bit size, and just plain usability issues getting a chunk of memory big enough to make this work. If you want to try this sort of thing, perhaps attempt to malloc the single chunk, and if that fails, fall back to smaller chunks. That somewhat defeats the purpose of a one-chunk system, though. Perhaps there is some sort of "page size" in the OS that you could align your malloc size to, in order to help performance and just plain ability to get memory in the amount you wish.
On game consoles, this approach to memory management is somewhat common - allocate 1 buffer from the OS at bootup as big as possible, then place your own memory manager on there to avoid OS overhead and possible inferior allocation code. It also allows one to better control memory fragmentation on such systems where virtual memory doesn't exist. But on these systems, you also know up front exactly how much RAM you have.
Is there a way to see if memory is physical or virtual in a platform independent way? I don't think so, but perhaps someone else can give a good answer to that and I'll edit this part away.
So not a 100% answer, but some random thoughts to help out, and me internally wondering what you are doing that wants 8GB of RAM in one chunk when it sounds like multiple chunks will work fine. :)
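One concrete way to reserve a big region up front on POSIX systems is an anonymous mmap, which reserves address space and only commits physical pages as they are touched (subject to the OS overcommit policy). It is not truly platform-neutral - Windows would need VirtualAlloc - so take this as a sketch:

/* POSIX-only sketch: reserve 8 GiB of address space with an anonymous
 * mapping. Physical pages are committed only as they are touched, so on
 * Linux this usually succeeds even with less RAM, subject to the
 * overcommit policy. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t bytes = 1024ULL * 1024ULL * 1024ULL * 8ULL;   /* 8 GiB */

    /* MAP_NORESERVE is Linux-specific; drop it on other systems. */
    void *m = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (m == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Hand the region to your own allocator (e.g. TLSF) here. */

    munmap(m, bytes);
    return 0;
}

Whether an untouched page later ends up backed by RAM or swap is decided by the OS at fault time, which is also why there is no portable way to ask where a given allocation currently lives.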

Read struct from physical memory address in C

This is probably more of a problem with my lack of C knowledge, but I'm hoping someone might be able to offer a possible solution. In a nutshell, I'm trying to read a struct that is stored in memory, and I have its physical memory address. Also, this is being done on a 64-bit Linux system (Debian (Wheezy), kernel 3.6.6), and I'd like to use C as the language.
For example the current address of the struct in question is at physical address: 0x3f5e16000
Now, I did initially try to access this address by using a pointer into /dev/mem. However, I've since learned that access to any address > 1024MB is not allowed, and I get a nice error message in /var/log/messages telling me all about it. At present access is being attempted from a userspace app, but I'm more than happy to look into writing a kernel module, if that is what is required.
Interestingly, I've also discovered something known as 'kprobes', which supposedly allows the > 1024MB /dev/mem restriction to be bypassed. However, I don't really want to introduce any potential security issues into my system, and I'm sure there must be an easier way to accomplish this. The info on kprobes can be found here: http://www.libcrack.so/2012/09/02/bypassing-devmem_is_allowed-with-kprobes/
I've done some reading and I've found references to using mmap to map the physical address into userspace so that it can be read, but I must confess that I don't understand the implementation of this in C.
If anyone could provide some information on accessing physical memory, or either mapping data from a physical address to a userspace virtual address, I would be extremely grateful.
You'll have to forgive me if I'm a little bit vague as to exactly what I'm doing, but it's part of a project and I don't want to give too much information away, so please bear with me :) I'm not being obtuse or anything.
The structure in memory is a block of four ints and ten longs that is loaded into memory by a running kernel module.
The address that I'm using is definitely a physical address and it's set to non-paged, the kernel module performs the translations to physical and I'm not using the address-of operator.
I'm wondering if I should just rephrase the question as how to read an int from a physical location, as that is the first element of the struct. I hope that helps to clarify things!
EDIT - After doing some more reading, it appears that one possible solution to this problem is to construct a kernel module, and then use the mmap function to map the physical address to a virtual address the kernel module can then access. Can anyone offer any advice on achieving this using mmap?
I'm only going to answer this question:
I'm wondering if I should just rephrase the question as how to read an int from a physical location, as that is the first element of the struct.
No. The problem is not int vs. struct, the problem is that C in and of itself has no notion of physical memory. The OS in conjunction with the MMU makes sure that every process, including every running C program, runs in a virtual memory sandbox. The OS might offer an escape hatch into physical memory.
If you're writing a kernel module that manages some object at physical address 0x3f5e16000, then you should offer some API to get to that memory, preferably one that uses a file descriptor or some other abstraction to hide the nitty-gritty of kernel memory management from the user program it communicates with.
If you're trying to communicate with a poorly designed kernel module that expects you to access a fixed physical memory address, then ugly hacks involving /dev/mem are what you're left with.
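For completeness, the /dev/mem route (when the kernel's STRICT_DEVMEM policy does not block that range) looks roughly like the sketch below. The struct layout is only the asker's description (four ints, ten longs), the physical address is the one from the question, and the mmap offset has to be page-aligned:

/* Hedged userspace sketch: map the page containing physical address
 * 0x3f5e16000 through /dev/mem and read the struct from there. Only works
 * if /dev/mem access to that range is permitted; the layout below is an
 * assumption taken from the question. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct payload {                /* assumed layout: four ints, ten longs */
    int  a[4];
    long b[10];
};

int main(void)
{
    off_t phys = 0x3f5e16000ULL;
    long  page = sysconf(_SC_PAGESIZE);
    off_t base = phys & ~((off_t)page - 1);   /* page-align the mmap offset */
    off_t off  = phys - base;

    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    void *map = mmap(NULL, off + sizeof(struct payload), PROT_READ,
                     MAP_SHARED, fd, base);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    struct payload *p = (struct payload *)((char *)map + off);
    printf("first int: %d\n", p->a[0]);

    munmap(map, off + sizeof(struct payload));
    close(fd);
    return 0;
}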

Determine Stack bottom, start and end of data segment of C program

I am trying to understand how memory space is allocated for a C program. For that, I want to determine the stack and data segment boundaries. Is there any library call or system call which does this job? I found that the stack bottom can be determined by reading /proc/self/stat; however, I could not figure out how to do it. Please help. :)
Processes don't have a single "data segment" anymore. They have a bunch of mappings of memory into their address space. Common cases are:
Shared library or executable code or rodata, mapped shared, without write access.
Glibc heap segments, anonymous segments mapped with rw permissions.
Thread stack areas. They look a lot like heap segments, but are usually separated from each other with some unmapped guard pages.
As Nikolai points out, you can look at the list of these with the pmap tool.
Look into /proc/<pid>/maps and /proc/<pid>/smaps (assuming Linux). Also pmap <pid>.
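If you want the same information from inside the program rather than from pmap, a minimal sketch is to read /proc/self/maps and look for the [heap] and [stack] lines:

/* Minimal Linux sketch: dump the current process's mappings, the same
 * data pmap shows. The [stack] and [heap] lines give the boundaries
 * the question asks about. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);

    fclose(f);
    return 0;
}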
There is no general method for doing this. In fact, some of the secure computing environments randomize the exact address space allocations and order so that code injection attacks are more challenging to engineer.
However, every C runtime library has to arrange the contributions of data and stack segments so the program works correctly. Reading the runtime startup code is the most direct way of finding the answer.
Which C compiler are you interested in?
