accessing physical memory from linux kernel - c

Can we access arbitrary physical memory from kernel code? I wrote a device driver that only has init_module and exit_module; the code is as follows:
int init_module(void)
{
        unsigned char *p = (unsigned char *)(0x10);
        printk(KERN_INFO "I got %u\n", *p);
        return 0;
}
and a dummy exit_module. The problem is that the computer hangs when I do lsmod.
What is happening? Do I need some kind of permission to access that memory location?
Kindly explain; I'm a beginner!

To access real physical RAM you should use the phys_to_virt function. If it is I/O memory (e.g. PCI memory), you should have a closer look at ioremap.
This whole topic is quite complex; if you are a beginner, I would suggest some kernel/driver development books and documentation.
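As a rough illustration of the difference (a minimal sketch only; the addresses come in as parameters, and I/O memory is accessed through the MMIO helpers rather than dereferenced directly):

#include <linux/types.h>
#include <linux/io.h>           /* ioremap, iounmap, ioread32, phys_to_virt */
#include <linux/printk.h>

/* Ordinary RAM already covered by the kernel's linear mapping:
 * just translate the physical address to its kernel virtual address. */
static void read_ram_example(phys_addr_t ram_phys)
{
        unsigned char *va = phys_to_virt(ram_phys);

        pr_info("RAM byte: %u\n", *va);
}

/* Device (I/O) memory, e.g. a PCI BAR: map it first and go through
 * the MMIO accessors instead of dereferencing the pointer. */
static void read_mmio_example(phys_addr_t bar_phys, size_t size)
{
        void __iomem *regs = ioremap(bar_phys, size);

        if (!regs)
                return;
        pr_info("first MMIO register: 0x%x\n", ioread32(regs));
        iounmap(regs);
}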

I suggest reading the chapter about memory in this book:
http://lwn.net/Kernel/LDD3/
It's available online for free. Good stuff!

Inside the kernel, memory is still mapped virtually, just not the same way as in userspace.
Chances are that 0x10 falls in an unmapped region near address zero (kept unmapped precisely to catch NULL-pointer dereferences), so touching it generates an unhandled page fault in the kernel.
Normally this causes an oops, not a hang (though it can be configured to cause a panic). An oops is an unexpected kernel condition which can be recovered from in some cases and does not necessarily bring down the whole system; normally it just kills the current task (in this case, insmod).
Did you do this on a desktop Linux system with a GUI loaded? I recommend that you set up a Linux VM (VMware, VirtualBox, etc.) with a simple (i.e. quick to reboot) text-based distribution if you want to hack around with the kernel. You're going to crash it a bit, and you want it to reboot as quickly as possible. A text-based distribution also makes it easier to see kernel crash messages (oops or panic).

Related

Linux PCI Driver Setup and Teardown

After looking at the kernel docs here: https://www.kernel.org/doc/Documentation/PCI/pci.txt I am lost as to the ordering of function calls to set up and tear down a PCI driver.
I have two questions:
For setup, does pci_enable_device() always come before pci_request_regions()? The documentation seems to point to this, but it does state:
OS BUG: we don't check resource allocations before enabling those resources. The sequence would make more sense if we called pci_request_resources() before calling pci_enable_device(). Currently, the device drivers can't detect the bug when two devices have been allocated the same range. This is not a common problem and unlikely to get fixed soon. This has been discussed before but not changed as of 2.6.19: http://lkml.org/lkml/2006/3/2/194
However, after a quick look through the source code of several drivers, the consensus seems to be that pci_enable_device() always comes first. Which of these calls is supposed to come first, and why?
For tearing down the driver, I am even more confused. Assuming pci_enable_device() comes first during setup, I would expect pci_release_regions() to be called before pci_disable_device() (i.e., following some symmetry). However, the kernel docs say that pci_release_regions() should come last. What complicates matters further is that I looked at many drivers and almost all of them call pci_release_regions() before pci_disable_device(), as I would expect; but then I stumbled across this driver: https://elixir.bootlin.com/linux/v4.12/source/drivers/infiniband/hw/hfi1/pcie.c (code reproduced below).
void hfi1_pcie_cleanup(struct pci_dev *pdev)
{
        pci_disable_device(pdev);
        /*
         * Release regions should be called after the disable. OK to
         * call if request regions has not been called or failed.
         */
        pci_release_regions(pdev);
}
Which function is supposed to come first when tearing down the driver? It seems that drivers in the kernel itself can't agree.
The statement that gives the final say is the documentation's description of what pci_enable_device() does:
o wake up the device if it was in suspended state,
o allocate I/O and memory regions of the device (if BIOS did not),
o allocate an IRQ (if BIOS did not).
So it makes no sense to ask the kernel to reserve (request) a resource before it exists; pci_enable_device() is what makes sure the resources are assigned in the first place. In the common case the BIOS has already assigned them, so either ordering may appear to work, but only swap the calls if you are absolutely sure of what you are doing.
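For reference, the pattern that most in-tree drivers follow looks roughly like this (a sketch only; the my_driver name is a placeholder):

#include <linux/pci.h>

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err;

        err = pci_enable_device(pdev);          /* wake the device, assign resources */
        if (err)
                return err;

        err = pci_request_regions(pdev, "my_driver");   /* claim the BARs */
        if (err)
                goto err_disable;

        /* ... ioremap BARs, request the IRQ, etc. ... */
        return 0;

err_disable:
        pci_disable_device(pdev);
        return err;
}

static void my_remove(struct pci_dev *pdev)
{
        /* ... free the IRQ, iounmap BARs, etc. ... */
        pci_release_regions(pdev);              /* mirror of pci_request_regions() */
        pci_disable_device(pdev);
}

Here the teardown mirrors the setup order (release before disable); as the hfi1 comment above suggests, the reverse also works in practice because pci_release_regions() only drops the resource reservations and does not touch the device itself.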

Are other parts of physical memory accessed during a segfault?

As part of a learning project, I've worked a bit on Spectre and Meltdown PoCs to get more comfortable with the concepts. I have managed to recover previously accessed data using the cache-timing side channel, but now I'm wondering how they actually read physical memory from that point.
Which leads to my question: in a lot of Spectre v1/v2 examples, you see this piece of toy code:
if (x < y) {
        z = array[x];
}
with x supposedly being equal to attacked_address - address_of_array, which effectively leads to z getting the value at attacked_address.
In the example it's quite easy to understand, but in reality how do they even know what attacked_address looks like?
Is it a virtual address with an offset, or a physical address, and how do they manage to find where the "important memory" is located in the first place?
In the example it's quite easy to understand, but in reality how do they even know what attacked_address looks like?
You are right: Spectre and Meltdown are just techniques, not ready-to-use attacks. If you know the address to attack from other sources, Spectre and Meltdown are a way to get at the data, even from within a browser.
Is it a virtual address with an offset, or a physical address, and how do they manage to find where the "important memory" is located in the first place?
It is a virtual address, since everything happens in a user-space program. But prior to the recent kernel patches, the full kernel address space was mapped into every user-space process. That was done to speed up system calls, i.e. to perform just a privilege switch rather than a full process context switch on each syscall.
So, due to that design, Meltdown makes it possible to read kernel space from an unprivileged user-space application (for example, a browser) on unpatched kernels.
In general, the easiest attack scenario is to target machines with old kernels that do not use kernel address space layout randomization (KASLR), i.e. kernel symbols are at the same addresses on any machine running that specific kernel version. Basically, you run the same kernel on a test machine, write down the "important memory addresses", and then run the attack on a victim's machine using those addresses.
Have a look at my Spectre-based Meltdown PoC (i.e. 2-in-1): https://github.com/berestovskyy/spectre-meltdown
It is much simpler and easier to understand than the original code from the Spectre paper, and it is just 99 lines of C (including comments).
It uses the technique described above: for Linux 3.13 it simply tries to read the predefined address 0xffffffff81800040, which is the linux_proc_banner symbol located in kernel space. It runs without any privileges on different machines running kernel 3.13 and successfully reads kernel space on each of them.
It is harmless, just a tiny working PoC.
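To illustrate the "write down the important memory addresses on a test machine" step, here is a minimal sketch that looks up a symbol such as linux_proc_banner in /proc/kallsyms (run it as root, or with kptr_restrict set to 0, on the test machine; otherwise the addresses read as 0):

#include <stdio.h>
#include <string.h>

/* Print the address of a kernel symbol as listed in /proc/kallsyms.
 * Intended for the attacker's *test* machine; the symbol name defaults
 * to linux_proc_banner but can be passed as the first argument. */
int main(int argc, char **argv)
{
        const char *wanted = argc > 1 ? argv[1] : "linux_proc_banner";
        char line[512], type, name[256];
        unsigned long addr;
        FILE *f = fopen("/proc/kallsyms", "r");

        if (!f)
                return 1;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "%lx %c %255s", &addr, &type, name) == 3 &&
                    strcmp(name, wanted) == 0) {
                        printf("%s is at 0x%lx\n", wanted, addr);
                        break;
                }
        }
        fclose(f);
        return 0;
}

On kernels with KASLR the address changes at every boot, which is exactly why the scenario described above targets kernels without it.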

Will direct accessing of user space address instead of copy_to_user work?

The following is an excerpt from my simple driver code.
int vprobe_ioctl(struct file *filep, unsigned int cmd, void *UserInp)
{
        switch (cmd) {
        case IOCTL_GET_MAX_PORTS:
                *(int *)UserInp = TotalPorts;   /* direct write to the user-space pointer */
#if ENABLED_DEBUG
                printk("Available ports: %u\n", TotalPorts);
#endif
                break;
        }
        return 0;
}
I was not aware of the copy_to_user function, which should be used when writing to user-space memory; the code accesses the user address directly. Still, I do not get any kernel crash on my development system (x86_64 architecture). It works as expected.
But sometimes I see a kernel crash when I insert the .ko file on some other x86_64 machines. When I replace the direct access with copy_to_user, it works everywhere.
Could anyone please explain,
i) How does direct access to a user-space address work?
ii) Why do I see a kernel crash on some systems while it works well on others? Is there some kernel configuration mismatch between the systems because of which the kernel can access the user process's virtual address directly?
Note: all the systems I used have the same OS and kernel (the same image generated through kickstart), so there should be no differences.
Thanks in advance.
It would be interesting to see the crash. What follows is an assumption based on my knowledge of how the memory management works.
User-space memory is virtual. A given process address X is backed by some physical memory, namely a page that is currently allocated to your process. copy_to_user first checks that the given memory really belongs to the process, and performs other security checks. Besides that, there are mapping issues.
Kernel memory has its own address space, with its own virtual-to-physical mappings handled with the help of the MMU (this differs per architecture). On x86, the user address space is still mapped while the kernel executes a system call, so dereferencing a user virtual address directly can appear to work (with plenty of caveats); on other systems this is not always true.
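For completeness, here is a minimal sketch of how the same value could be pushed to user space with copy_to_user (report_max_ports is a hypothetical helper name; the rest follows the question's excerpt):

#include <linux/uaccess.h>      /* copy_to_user() */
#include <linux/errno.h>

/* Hypothetical helper: copy the port count back to user space safely. */
static long report_max_ports(void __user *user_ptr, int total_ports)
{
        /* copy_to_user() validates the user pointer and returns the
         * number of bytes it could NOT copy (0 on success). */
        if (copy_to_user(user_ptr, &total_ports, sizeof(total_ports)))
                return -EFAULT;
        return 0;
}

Unlike a direct dereference, a bad or unmapped user pointer here results in -EFAULT rather than a possible crash.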

Why is my write to virtual memory not visible in Virtual Device Driver?

I have written a custom driver meant to map exact hardware RAM addresses into user land. I am trying to verify that memory mmap'd as shared by two processes to the same hardware address really is common, i.e. that memory operations performed by one side are visible to the other.
My code is approximately something like this:
// placement: in a mmap callback of a file_operations-backed
// character device
// phys_addr - a variable that I will ioremap for a virtual addr
virtaddr = ioremap(phys_addr, size);
if (!virtaddr) {
        printk(KERN_INFO "could not remap page!");
        goto out;
} else {
        printk(KERN_INFO "attempting write");
        *((int *)virtaddr) = 0xdeadbeef;
        //wmb(); <--- I haven't tried this yet
}
As it turns out, I thought maybe the issue was the lack of a write barrier to force the cache to flush to RAM. I have to boot the test on some special hardware due to OS specifics that are outside the scope of this question. I don't think write barriers apply to main memory/RAM quite the way they do to device registers or device memory (e.g. the cache on an SSD), so I haven't tested wmb() yet, but I wanted to get my question out there. I've also searched around, including the Linux Device Drivers 3 book, and I have executed my code: the fragment above does in fact run, and I know it because I can see the printk. The driver executes the code and then just appears to keep on going. Lastly, there is an analogous piece of code that does an ioremap of the same piece of hardware memory and then tries to read from it. That read does not contain the value that I wrote.
Why?
Can you please explain exactly what you mean by "hardware ram memory addresses into user land"?
What type of device are you simulating (PCIe, USB, etc.)?
This all depends on how your CPU routes the access: if no real hardware is connected, the translation will not necessarily cause a fault; instead the write may simply be sent out over the bus protocol, much like a fake packet generated by the bus controller towards the (absent) device.
To verify this you can inspect the bus transactions, and in the case of I/O port mapping you can check the signals coming from the specific port address/bits.
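As a side note, kernel code normally goes through the MMIO accessors (iowrite32()/ioread32()) rather than dereferencing the ioremap'd pointer directly; they include the required ordering on most architectures. A minimal sketch, reusing the question's phys_addr and size variables (the function name is mine):

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/kernel.h>

/* Same write as in the question, but through the MMIO accessors
 * (phys_addr and size are the question's variables). */
static int write_and_readback(resource_size_t phys_addr, unsigned long size)
{
        void __iomem *regs = ioremap(phys_addr, size);
        u32 readback;

        if (!regs)
                return -ENOMEM;

        iowrite32(0xdeadbeef, regs);    /* ordered MMIO write */
        readback = ioread32(regs);      /* read back through the same mapping */
        printk(KERN_INFO "read back 0x%x\n", readback);

        iounmap(regs);
        return 0;
}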

How to emulate memory-mapped I/O

I have some hardware that I want to emulate; I wonder if I can do it at a low level like this. The hardware has many registers, which I arrange in a struct:
#include <stdint.h>

struct MyControlStruct
{
        uint32_t data_reg_1;
        uint32_t data_reg_2;
        uint32_t dummy[2]; // to make the following registers have certain addresses
        uint32_t control_reg_1;
        uint32_t control_reg_2;
};

volatile struct MyControlStruct *MyDevice = (struct MyControlStruct *)0xDeadF00;
So, I want to support the following syntax for hardware access on Windows and Linux:
MyDevice->data_reg_1 = 42;
MyDevice->data_reg_2 = 100;
MyDevice->control_reg_1 = 1;
When the last line of code is executed, I want the hardware emulator to "wake up" and do some stuff. Can I implement this on Windows and/or Linux? I thought about somehow catching the "segmentation fault" signal, but I am not sure whether this can be done on Windows, or at all.
I looked at the manual page for mmap; it seems like it could help, but I couldn't figure out how to use it.
Of course, I could abstract the access to the hardware by defining functions like WriteToMyDevice, and everything would be easy (maybe), but I want to understand whether I can arrange access to my hardware in this exact way.
In principle, you could code (unportably) a handler for SIGSEGV which would trap and handle access to unwanted pages, and which could check that a specified address is accessed.
To do that under Linux, you'll need to use the sigaction system call with SA_SIGINFO and use the ucontext_t* third argument of your signal handler.
This is extremely unportable: you'll have to code differently for different Unixes (perhaps even the version number of your Linux kernel could matter) and when changing processors.
And I've heard that Linux kernels are not very quick on such handling.
Other, better kernels (Hurd, Plan 9) offer user-level paging, which should help.
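A minimal sketch of the SIGSEGV approach on Linux (the mapping, the register index, and the recovery strategy are simplifications of my own; a real emulator would decode the faulting instruction or single-step rather than simply opening up the page):

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile uint32_t *emulated_regs;   /* PROT_NONE page standing in for the device */
static size_t page_size;

static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
        uintptr_t fault = (uintptr_t)info->si_addr;
        uintptr_t base  = (uintptr_t)emulated_regs;

        (void)sig;
        (void)ucontext;
        if (fault < base || fault >= base + page_size)
                _exit(1);               /* unrelated crash: bail out */

        /* The emulator "wakes up" here: (fault - base) tells us which
         * register was touched.  For this sketch we just open up the
         * page so the faulting access can complete when we return. */
        mprotect((void *)base, page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
        struct sigaction sa;

        page_size = sysconf(_SC_PAGESIZE);
        emulated_regs = mmap(NULL, page_size, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (emulated_regs == MAP_FAILED)
                return 1;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        emulated_regs[4] = 1;           /* faults once; the handler runs, the write retries */
        printf("register 4 now holds %u\n", emulated_regs[4]);
        return 0;
}

Note the limitation: after the first fault the page stays writable, so later accesses no longer trap; catching every access means re-protecting the page or single-stepping, which is where the unportability and the performance cost come in.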
I initially misunderstood your question. You have a piece of memory-mapped hardware and you want your emulation to be binary compatible. On Windows you could allocate the memory for the structure using VirtualAlloc, mark it as a guard page, and catch any access to it using SEH.
In actual fact, your emulator is (rather crudely) possible on Linux with pure user-space code.
To build the emulator, simply have a second thread or process (using shared memory, or perhaps an mmap'd file and inotify) watching the memory that is emulating the memory-mapped device; a crude sketch follows below.
For the real hardware driver, you will need a tiny bit of kernel code, but that could simply be something that maps the actual hardware addresses into user space with appropriate permissions. In effect this regresses a modern multiuser operating environment down to acting like an old DOS box or a simple microcontroller - not great practice, but workable at least where security is not a concern.
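A crude illustration of that user-space emulation idea (shared anonymous memory plus a polling thread standing in for the emulator; the struct is the one from the question, while the polling loop and names are my own simplifications):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct MyControlStruct {
        uint32_t data_reg_1;
        uint32_t data_reg_2;
        uint32_t dummy[2];
        uint32_t control_reg_1;
        uint32_t control_reg_2;
};

static volatile struct MyControlStruct *MyDevice;

/* Emulator side: "wake up" whenever control_reg_1 changes. */
static void *emulator(void *arg)
{
        uint32_t last = 0;

        (void)arg;
        for (;;) {
                uint32_t cur = MyDevice->control_reg_1;

                if (cur != last) {
                        printf("emulator: control_reg_1=%u data=%u/%u\n",
                               cur, MyDevice->data_reg_1, MyDevice->data_reg_2);
                        last = cur;
                }
                usleep(1000);           /* crude polling */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid;

        /* Shared anonymous mapping standing in for the device registers;
         * with fork() the same mapping could be watched from another process. */
        MyDevice = mmap(NULL, sizeof(*MyDevice), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (MyDevice == MAP_FAILED)
                return 1;

        pthread_create(&tid, NULL, emulator, NULL);

        /* "Driver" side: exactly the syntax from the question. */
        MyDevice->data_reg_1 = 42;
        MyDevice->data_reg_2 = 100;
        MyDevice->control_reg_1 = 1;

        sleep(1);                       /* give the emulator time to notice */
        return 0;
}

Polling is obviously wasteful; the inotify/mmap'd-file variant mentioned above, or a futex, would avoid the busy loop.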
Another thing you could consider would be running the code in a virtual machine.
If the code you will be exercising is your own, it's probably better to write it in a portable manner to begin with, abstracting the hardware access into functions that you can rewrite for each platform (i.e. OS, hardware version, or physical/emulated). These techniques are more useful when it's someone else's existing code you need to create an environment for. Another thing you can consider (if the original isn't too tightly integrated) is dynamic-library-level interception of specific functions, for example with LD_PRELOAD on Linux or a wrapper DLL on Windows. Or, for that matter, patching the binary.
