I have some hardware that I want to emulate, and I wonder whether I can do it at a low level like this. The hardware has many registers, which I arrange in a struct:
#include <stdint.h>
struct MyControlStruct
{
    uint32_t data_reg_1;
    uint32_t data_reg_2;
    uint32_t dummy[2]; // to make the following registers have certain addresses
    uint32_t control_reg_1;
    uint32_t control_reg_2;
};
volatile struct MyControlStruct* MyDevice = (struct MyControlStruct*)0xDeadF00;
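The layout assumption in the struct (that the two dummy words push control_reg_1 to a fixed offset) can be checked at compile time. This is only a sketch, assuming a C11 compiler; the offsets are derived from the struct as written above, not from any real device datasheet:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct MyControlStruct
{
    uint32_t data_reg_1;
    uint32_t data_reg_2;
    uint32_t dummy[2]; /* padding so the control registers land at 0x10 */
    uint32_t control_reg_1;
    uint32_t control_reg_2;
};

/* The build fails if the compiler lays the struct out differently. */
static_assert(offsetof(struct MyControlStruct, data_reg_1) == 0x00, "data_reg_1 offset");
static_assert(offsetof(struct MyControlStruct, data_reg_2) == 0x04, "data_reg_2 offset");
static_assert(offsetof(struct MyControlStruct, control_reg_1) == 0x10, "control_reg_1 offset");
static_assert(offsetof(struct MyControlStruct, control_reg_2) == 0x14, "control_reg_2 offset");
static_assert(sizeof(struct MyControlStruct) == 0x18, "unexpected padding");
```

With checks like these in place, an emulated register block and the real one can at least be expected to agree on where each register sits.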
So, I want to support the following syntax for hardware access on Windows and Linux:
MyDevice->data_reg_1 = 42;
MyDevice->data_reg_2 = 100;
MyDevice->control_reg_1 = 1;
When the last line of code is executed, I want the hardware emulator to "wake up" and do some stuff. Can I implement this on Windows and/or Linux? I thought about somehow catching the "segmentation fault" signal, but I'm not sure whether this can be done on Windows, or at all.
I looked at the manual page of mmap; it seems like it can help, but I couldn't understand how to use it.
Of course, I could abstract the access to hardware by defining functions like WriteToMyDevice, and everything would be easy (maybe), but I want to understand whether I can arrange access to my hardware in this exact way.
In principle, you could code (unportably) a handler for SIGSEGV which would trap and handle access to unwanted pages, and which could check that a specified address is accessed.
To do that under Linux, you'll need to use the sigaction system call with SA_SIGINFO and use the ucontext_t* third argument of your signal handler.
This is extremely unportable: you'll have to code differently for different Unixes (perhaps even the version number of your Linux kernel could matter) and when changing processors.
And I've heard that Linux kernels are not very quick on such handling.
Some other kernels (e.g. Hurd, Plan 9) offer user-level paging, which should help.
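To make the SIGSEGV idea concrete, here is a minimal Linux-only sketch. It assumes the emulated registers live in one anonymous page placed wherever mmap chooses (not at 0xDeadF00), and the names trap_install, trap_count and last_fault_addr are made up for the example. The handler only records the faulting address and unprotects the page so that the faulting instruction succeeds when it is restarted; it ignores the ucontext_t argument, which a real emulator would need in order to decode the access and re-arm the trap. Calling mprotect from a signal handler is not guaranteed by POSIX, although it works in practice on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *trap_page;                    /* start of the protected page */
static size_t trap_len;                    /* its length in bytes */
static volatile sig_atomic_t trap_count;   /* number of trapped accesses */
static void *volatile last_fault_addr;     /* address of the last trapped access */

static void on_segv(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;  /* ucontext_t *: register state, unused in this sketch */
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)trap_page;

    if (addr < base || addr >= base + trap_len) {
        signal(SIGSEGV, SIG_DFL);  /* not ours: let the retry crash normally */
        return;
    }
    trap_count++;
    last_fault_addr = info->si_addr;
    /* Unprotect so the faulting instruction succeeds when it is restarted. */
    mprotect(trap_page, trap_len, PROT_READ | PROT_WRITE);
}

/* Map one inaccessible page and install the handler; returns NULL on error. */
static volatile uint32_t *trap_install(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    trap_page = p;
    trap_len = (size_t)page;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;
    return (volatile uint32_t *)p;
}
```

Only the first access is trapped here, because the page stays writable afterwards. Catching every access would mean protecting the page again after each one (for example by single-stepping or by emulating the instruction), which is where the unportable part really begins.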
I initially misunderstood your question. You have a piece of memory-mapped hardware and you want your emulation to be binary compatible. On Windows you could allocate the memory for the structure using VirtualAlloc, mark it as a guard page (PAGE_GUARD), and catch any access to it using SEH. Note that guard status is cleared after the first access, so the handler has to re-arm it.
In fact your emulator is (rather crudely) possible on Linux with pure user-space code.
To build the emulator, simply have a second thread or process (using shared memory, or perhaps an mmap'd file and inotify) watching the memory that emulates the memory-mapped device.
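A rough sketch of the watcher-process variant, assuming Linux, a shared anonymous mapping, and a forked child that polls the control register. The behaviour of the "device" (adding the two data registers and putting the sum in control_reg_2) is invented purely for illustration, and emulator_start and emulator_stop are hypothetical names. The __atomic builtins are a GCC/Clang extension.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct MyControlStruct
{
    uint32_t data_reg_1;
    uint32_t data_reg_2;
    uint32_t dummy[2];
    uint32_t control_reg_1;
    uint32_t control_reg_2;
};

/* Emulator side: poll control_reg_1 and "run the device" when it becomes 1. */
static void emulator_loop(volatile struct MyControlStruct *dev, pid_t parent)
{
    while (getppid() == parent) {  /* stop if the client process goes away */
        if (__atomic_load_n(&dev->control_reg_1, __ATOMIC_ACQUIRE) == 1) {
            /* Invented behaviour: publish the sum of the data registers. */
            dev->control_reg_2 = dev->data_reg_1 + dev->data_reg_2;
            /* Clearing the control register signals completion. */
            __atomic_store_n(&dev->control_reg_1, 0, __ATOMIC_RELEASE);
        }
        usleep(100);
    }
}

/* Create the shared register block and fork the watcher; NULL on error. */
static volatile struct MyControlStruct *emulator_start(pid_t *child)
{
    assert(child != NULL);
    void *p = mmap(NULL, sizeof(struct MyControlStruct),
                   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    pid_t parent = getpid();
    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(struct MyControlStruct));
        return NULL;
    }
    if (pid == 0) {
        emulator_loop(p, parent);
        _exit(0);
    }
    *child = pid;
    return p;
}

static void emulator_stop(pid_t child)
{
    kill(child, SIGTERM);
    waitpid(child, NULL, 0);
}
```

The client can then use the exact MyDevice->control_reg_1 = 1 syntax, but the emulator only notices at its next poll, so the reaction is delayed rather than synchronous. On weakly ordered CPUs the client may also need a memory barrier before the final store, which real device memory would not require.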
For the real hardware driver, you will need a tiny bit of kernel code, but that could simply be something that maps the actual hardware addresses into user space with appropriate permissions. In effect this regresses a modern multiuser operating environment to acting like an old DOS box or a simple microcontroller. That is not great practice, but it is workable, at least where security is not a concern.
Another thing you could consider would be running the code in a virtual machine.
If the code you will be exercising is your own, it's probably better to write it in a portable manner to begin with, abstracting the hardware access into functions that you can rewrite for each platform (i.e. OS, hardware version, or physical/emulated). These techniques are more useful if it's someone else's existing code you need to create an environment for. Another thing you can consider (if the original isn't too tightly integrated) is using dynamic-library-level interception of specific functions, for example with LD_PRELOAD on Linux or a wrapper DLL on Windows. Or, for that matter, patching the binary.
I know that a process switches between user mode and kernel mode while running. I am confused that for every line of code, we should possibly need the kernel. Below is an example; could I get an explanation of the kernel's role in executing the following lines of code? Does the following actually require kernel mode?
if (a < 0)
    a++;
I am confused that for every line of code, we should possibly need the kernel.
Most code in user-space is executed without the kernel being involved. The kernel becomes involved (and the CPU switches from user-space to kernel) when:
a) The user-space code explicitly asks the kernel to do something (calls a system call).
b) There's an IRQ (from a device) that interrupts user-space code.
c) The kernel is providing some functionality that user-space code is unaware of. The most common reason is virtual memory management; but debugging and profiling are other reasons.
d) Asynchronous notifications (e.g. something causing a switch to kernel so that kernel can redirect the program to a suitable signal handler).
e) The user-space code does something illegal (crashes).
Does the following actually require kernel mode?
That code (if(a < 0) a++;) probably won't require the kernel's assistance, but it possibly might. For example, if the variable a is in memory that was previously sent to swap space, then any attempt to access a is a request for the kernel to fetch that data from swap space. In a similar way, if the executable file was memory mapped but not loaded yet (a common optimization to improve program startup time), then attempting to execute any instruction (regardless of what the instruction is) could ask the kernel to fetch the code from the executable file on disk.
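One way to watch this happen is to count page faults around plain memory stores. A small sketch, assuming Linux or another system where getrusage reports ru_minflt; faults_from_touching is a made-up helper name. The loop in the middle contains no system call, yet the kernel usually has to step in for each freshly mapped page:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Touch `pages` fresh anonymous pages with ordinary stores and return how
   many minor page faults were handled meanwhile, or -1 on error. */
static long faults_from_touching(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = pages * (size_t)page;

    volatile unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    for (size_t i = 0; i < pages; i++)
        mem[i * (size_t)page] = 1;   /* plain store, no system call */
    getrusage(RUSAGE_SELF, &after);

    munmap((void *)mem, len);
    return after.ru_minflt - before.ru_minflt;
}
```

The exact number depends on the kernel and on settings such as transparent huge pages, so treat the result as an indication rather than a precise count.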
Short answer:
It depends on what you are trying to do. Depending on the environment and how it's compiled, the code in the question shouldn't need the kernel. The CPU executes machine code directly, only trapping to the kernel on instructions like syscall, on faults such as a page fault, or on an interrupt.
The ISA is designed so that a kernel can set up the page tables in a way that stops user-space from taking over the machine, even though the CPU is fetching bytes of its machine code directly. This is how user-space code can run just as efficiently when it's just operating on its own data, doing pure computation not hardware access.
Long answer:
Comparing a value and incrementing it shouldn't require a kernel. On the x86 (64-bit) architecture, the code could be represented like this (in NASM syntax):
; a is in RAX, perhaps a return value from some earlier function
cmp rax, 0 ; if (a<0) implemented as
jnl no_increase ; a jump over the inc if a is Not Less-than 0
inc rax
no_increase:
Actual compilers often do it branchlessly, with various tricks, as you can see on the Godbolt compiler explorer.
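To illustrate what "branchless" means at the source level (this is just one of the possible tricks, not necessarily what any particular compiler emits): in C a comparison evaluates to 0 or 1, so the increment can be written as an addition. The function names are made up for the example.

```c
#include <assert.h>

/* The form from the question, with a conditional in the source. */
static int bump_if_negative(int a)
{
    if (a < 0)
        a++;
    return a;
}

/* (a < 0) is 0 or 1 in C, so adding it gives the same result without an if. */
static int bump_if_negative_branchless(int a)
{
    return a + (a < 0);
}

/* Check that both forms agree on every value in [lo, hi]. */
static void check_same(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        assert(bump_if_negative(a) == bump_if_negative_branchless(a));
}
```

Neither version contains anything that would involve the kernel; both are pure register arithmetic once compiled.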
Clearly there aren't any syscalls, so this piece of code can be run on any 64-bit x86 device, but it wouldn't do anything meaningful on its own.
What requires a kernel is a system call. In theory you don't need system calls to produce output: you could find a memory location that, say, corresponds to video memory and manipulate pixels to draw something on the screen. From userland, however, this isn't possible, because of virtual memory.
A user-space application needs a kernel in order to exist: if there were no kernel, there would be no user space :) Please also note that not every kernel provides a user space.
So only doing something like:
write(STDOUT_FILENO, "windows sucks linux rocks", 25); /* needs <unistd.h> */
would obviously require a kernel.
Writing to or reading from an arbitrary memory location, for example 0xB8000 to manipulate video memory, doesn't need a kernel on a system without memory protection.
TL;DR: the example code you provided needs a kernel in order to run in user space, but it can also be written for a system where user space and kernel don't exist at all (e.g. microcontrollers) and work perfectly fine.
In simpler words: it doesn't require a kernel to work, since it doesn't use any system calls. For meaningful operation on a modern operating system, though, it would at least require an exit syscall to exit with a code; otherwise you will see a segmentation fault, even though you did no dynamic allocation yourself.
Whatever code we write obviously runs in user mode. Kernel mode only comes into the picture when the code performs a system call,
and since the if() is not calling any system function, it's not going to run in kernel mode.
I am trying to create a mechanism to read performance counters for processes. I want this mechanism to be executed from within the kernel (version 4.19.2) itself.
I am able to do it from user space using the sys_perf_event_open() system call as follows.
syscall (__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
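For context, the user-space side is usually wrapped roughly as follows. This is a sketch modelled on the perf_event_open(2) man page, assuming Linux with the kernel headers installed; count_instructions and spin are names invented for the example. The call can fail (for instance because of the perf_event_paranoid setting or a missing PMU in a virtual machine), in which case the sketch returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper, so go through syscall() as in the question. */
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

/* Hypothetical workload to measure. */
static void spin(void)
{
    volatile unsigned x = 0;
    for (unsigned i = 0; i < 100000; i++)
        x += i;
}

/* Count user-space instructions retired while fn() runs; -1 if unavailable. */
static long long count_instructions(void (*fn)(void))
{
    assert(fn != NULL);

    struct perf_event_attr pe;
    memset(&pe, 0, sizeof pe);
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof pe;
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;          /* start stopped, enable explicitly below */
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;

    /* pid 0 and cpu -1: measure the calling thread on any CPU. */
    int fd = (int)perf_event_open(&pe, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    long long result = -1;
    if (read(fd, &count, sizeof count) == (ssize_t)sizeof count)
        result = (long long)count;
    close(fd);
    return result;
}
```

A call such as count_instructions(spin) then returns either a count or -1 when the counter could not be opened.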
I would like to invoke this call from kernel space. I got the basic idea from the question "How do I use a Linux System call from a Linux Kernel Module".
Here are the steps I took to achieve this:
To make sure that the virtual address of the kernel remains valid, I have used set_fs(), get_fs() and get_ds().
Since sys_perf_event_open() is declared in include/linux/syscalls.h, I have included that header in the code.
Eventually, the code for calling the systems call looks something like this:
mm_segment_t fs;
fs = get_fs();
set_fs(get_ds());
long ret = sys_perf_event_open(&pe, pid, cpu, group_fd, flags);
set_fs(fs);
Even after these measures, I get an error claiming "implicit declaration of function ‘sys_perf_event_open’". Why is this popping up when the header file declaring it is already included? Does it have something to do with the way one should call system calls from within kernel code?
In general (not specific to Linux) the work done for system calls can be split into 3 categories:
1) Switching from user context to kernel context (and back again on the return path). This includes things like changing the processor's privilege level, messing with gs, fiddling with stacks, and doing security mitigations (e.g. for Meltdown). These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
2) Using a "function number" parameter to find the right function to call, and calling it. This typically includes some sanity checks (does the function exist?) and a table lookup, plus code to mangle input and output parameters, which is needed because the calling convention used for system calls (in user space) is not the same as the calling convention that normal C functions use. These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
3) The final normal C function that ends up being called. This is the function that you might have (see note) been able to call directly without using any of the expensive, useless and/or dangerous system call junk.
Note: If you aren't able to call the final normal C function directly without using (any part of) the system call junk (e.g. if the final normal C function isn't exposed to other kernel code); then you must determine why. For example, maybe it's not exposed because it alters user-space state, and calling it from kernel will corrupt user-space state, so it's not exposed/exported to other kernel code so that nobody accidentally breaks everything. For another example, maybe there's no reason why it's not exposed to other kernel code and you can just modify its source code so that it is exposed/exported.
Calling system calls from inside the kernel using the sys_* interface is discouraged for the reasons that others have already mentioned. In the particular case of x86_64 (which I guess is your architecture), starting from kernel v4.17 it is a hard requirement not to use that interface (with a few exceptions). It was possible to invoke system calls directly prior to this version, but now the error you are seeing pops up (which is why there are plenty of tutorials on the web using sys_*). The alternative proposed in the Linux documentation is to define a wrapper between the syscall and the actual syscall's code that can be called within the kernel like any other function:
int perf_event_open_wrapper(...) {
// actual perf_event_open() code
}
SYSCALL_DEFINE5(perf_event_open, ...) {
return perf_event_open_wrapper(...);
}
source: https://www.kernel.org/doc/html/v4.19/process/adding-syscalls.html#do-not-call-system-calls-in-the-kernel
Which kernel version are we talking about?
Anyhow, you could either get the address of sys_call_table by looking at the System.map file or, if it is exported, look up the symbol (have a look at kallsyms.h). Once you have the address of the syscall table, you can treat it as an array of void pointers (void **) and find your desired function by index, i.e. sys_call_table[__NR_open] would be open's address, so you could store it in a void pointer and then call it.
Edit: What are you trying to do, and why can't you do it without calling syscalls? You must understand that syscalls are the kernel's API to the userland, and should not be really used from inside the kernel, thus such practice should be avoided.
calling system calls from kernel code
(I am mostly answering to that title; to summarize: it is forbidden to even think of that)
I don't understand your actual problem (I feel you need to explain it more in your question, which is unclear and lacks a lot of useful motivation and context). But a general piece of advice, following the Unix philosophy, is to minimize the size and vulnerability area of your kernel or kernel module code, and to move as much of that code as is convenient into user-land, in particular with the help of systemd, as soon as your kernel code requires some system calls. Your question is by itself a violation of most Unix and Linux cultural norms.
Have you considered using efficient kernel to user-land communication, in particular netlink(7) with socket(7)? Perhaps you also want some driver-specific kernel thread.
My intuition would be that (in some user-land daemon started from systemd early at boot time) AF_NETLINK with socket(2) is exactly the fit for your (unexplained) needs. And eventfd(2) might also be relevant.
But just thinking of using system calls from inside the kernel triggers a huge flashing red light in my brain and I tend to believe it is a symptom of a major misunderstanding of operating system kernels in general. Please take time to read Operating Systems: Three Easy Pieces to understand OS philosophy.
I'd like to have a library that allows 'self profiling' of critical sections of Linux executables. In the same way that one can time a section using gettimeofday() or RDTSC I'd like to be able to count events such as branch misses and cache hits.
There are a number of tools that do similar things (perf, PAPI, likwid) but I haven't found anything that matches what I'm looking for. Likwid comes closest, so I'm mostly looking at ways to modify its existing Marker API.
The per-core counter values are stored in MSRs (Model Specific Registers), but for current Intel processors (Sandy Bridge onward) the "uncore" measurements (memory accesses and other things that pertain to the CPU as a whole) are accessed via PCI.
The usual approach is that the MSRs are read using the msr kernel module, and that the PCI counters (if supported) are read from the sysfs-pci hierarchy. The problem is that both of these require the reader to be running as root and have 'setcap cap_sys_rawio'. This is difficult (or impossible) for many users.
It's also not particularly fast. Since the goal is to profile small pieces of code, the 'skew' from reading each counter with a syscall is significant. It turns out that the MSR registers can be read by a normal user using RDPMC. I don't yet have a great solution for reading the PCI registers.
One way would be to proxy everything through an 'access server' running as root. This would work, but would be even slower (and hence less accurate) than using /proc/bus/pci. I'm trying to figure out how best to make the PCI 'configuration' space of the counters visible to a non-privileged program.
The best I've come up with is to have a server running as root, to which the client can connect at startup via a Unix local domain socket. As root, the server will open the appropriate device files, and pass the open file handle to the client. The client should then be able to make multiple reads during execution on its own. Is there any reason this wouldn't work?
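The descriptor-passing step described here is normally done with an SCM_RIGHTS control message on a Unix domain socket. A minimal sketch of that mechanism follows; send_fd and recv_fd are names invented for the example, and the surrounding server and client code (opening the device file, connecting the socket) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open descriptor over a connected AF_UNIX socket; 0 or -1. */
static int send_fd(int sock, int fd)
{
    assert(sock >= 0 && fd >= 0);

    char byte = 0;   /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    assert(sock >= 0);

    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}
```

The receiver gets its own descriptor referring to the same open file, so reads on it work without further help from the server. Whether later operations on that descriptor are allowed still depends on checks the kernel performs at the time of each call.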
But even if I do that, I'll still be using a pread() system call (or something comparable) for every access, of which there might be billions. If trying to time small sub-1000 cycle sections, this might be too much overhead. Instead, I'd like to figure out how to access these counters as Memory Mapped I/O.
That is, I'd like to have read-only access to each counter represented by an address in memory, with the I/O mapping happening at the level of the processor and IOMMU rather than involving the OS. This is described in the Intel Architectures Software Developer Vol 1 in section 16.3.1 Memory Mapped I/O.
This seems almost possible. In proc_bus_pci_mmap() the device handler for /proc/bus/pci seems to allow the configuration area to be mapped, but only by root, and only if I have CAP_SYS_RAWIO.
static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
{
    struct pci_dev *dev = PDE_DATA(file_inode(file));
    struct pci_filp_private *fpriv = file->private_data;
    int i, ret;

    if (!capable(CAP_SYS_RAWIO))
        return -EPERM;

    /* Make sure the caller is mapping a real resource for this device */
    for (i = 0; i < PCI_ROM_RESOURCE; i++) {
        if (pci_mmap_fits(dev, i, vma, PCI_MMAP_PROCFS))
            break;
    }

    if (i >= PCI_ROM_RESOURCE)
        return -ENODEV;

    ret = pci_mmap_page_range(dev, vma,
                              fpriv->mmap_state,
                              fpriv->write_combine);
    if (ret < 0)
        return ret;

    return 0;
}
So while I could pass the file handle to the client, the client can't mmap() it, and I can't think of any way to share an mmap'd region with a non-descendant process.
(Finally, we get to the questions!)
So presuming I really want to have a pointer in a non-privileged process that can read from PCI configuration space without help from the kernel each time, what are my options?
1) Maybe I could have a root process open /dev/mem, and then pass that open file descriptor to the child, which can then mmap the part that it wants. But I can't think of any way to make that even remotely secure.
2) I could write my own kernel module, which looks a lot like linux/drivers/pci/proc.c but omits the check for the usual permissions. Since I can lock this down so that it is read-only and just for the PCI space that I want, it should be reasonably safe.
3) ??? (This is where you come in)
Maybe this answer is a little late, but the answer is to use likwid.
As you said, reading the MSRs or sysfs-pci has to be done by root. Building likwid's accessDaemon and giving it the right to access the MSRs bypasses this issue. Of course, because of the inter-process communication involved, reading performance values incurs some delay. This delay is not very high.
(For small code sections, the performance counters are somewhat imprecise in any case.)
Likwid can also handle uncore events.
Best
I'm currently in the process of writing a state machine in C for a microcontroller (a TI MSP430). Now, I don't have any problems with writing the code and implementing my design, but I am wondering how to prove the state machine logic without having to use the actual hardware (which, of course, isn't yet available).
Using debugging features, I can simulate interrupts (although I haven't yet tried to do this, I'm just assuming it will be okay - it's documented after all) and I have defined and reserved a specific area of memory for holding TEST data, which using debugging macros, I can access at runtime outside of the application in a Python script. In other words, I have some test foundations in place. However, the focus of my question is this:
"How best do I force a certain state machine flow for decisions that require hardware input, e.g., for when an input pin is high or low". For example, "if some pin is high, follow this path, otherwise follow this path".
Again, using debugging macros, I can write to registers outside of the application (for example, to light an LED), but I can't (understandably) write to the read-only registers used for input, and so forcing a state machine flow in the way described above is proving taxing.
I had thought of using #ifdefs, where if I wanted to test flow I could use an output pin and check this value instead of the input pin that would ultimately be used. However, this will no doubt pepper my codebase with test-only code, which feels like the wrong approach to take. Does anyone have any advice on a good way of achieving this level of testing? I'm aware that I could probably just use a simulator, but I want to use real hardware wherever possible (albeit an evaluation board at this stage).
Sounds like you need abstraction.
Instead of, in the "application" code (the state machine) hard-coding input reading using e.g. GPIO register reads, encapsulate those reads into functions that do the check and return the value. Inside the function, you can put #ifdef:ed code that reads from your TEST memory area instead, and thus simulates a response from the GPIO pin that isn't there.
This should really be possible even if you're aiming for high performance: it's not a lot of overhead and, if you work at it, you should be able to inline the functions.
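A minimal sketch of what such an accessor could look like. The build switch SIMULATED_INPUTS, the variable test_input_port, the register address 0x0020 and the two-state machine are all placeholders invented for the example; the real register name and address would come from the device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIMULATED_INPUTS 1   /* hypothetical switch; 0 for the target build */

#if SIMULATED_INPUTS
/* Stand-in for the reserved TEST memory area, written by a test or script. */
static volatile uint8_t test_input_port;
#define INPUT_PORT test_input_port
#else
/* Placeholder address; use the real input register from the device header. */
#define INPUT_PORT (*(volatile uint8_t *)0x0020)
#endif

/* The only place in the code base that knows where the input comes from. */
static inline bool input_pin_is_high(unsigned pin)
{
    assert(pin < 8);
    return ((INPUT_PORT >> pin) & 1u) != 0;
}

enum state { STATE_IDLE, STATE_ACTIVE };

/* State machine logic stays free of #ifdefs and register names. */
static enum state next_state(enum state s)
{
    if (s == STATE_IDLE && input_pin_is_high(3))
        return STATE_ACTIVE;
    return s;
}
```

The test-only code is confined to one accessor, and the state machine itself is identical in both builds.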
Even though you don't have all the hardware yet, you can simulate pretty much everything.
A possible way of doing it in C...
Interrupt handlers = threads waiting on events.
Input devices = threads firing the above events. They can be "connected" to the PC keyboard, so you initiate "interrupts" manually. Or they can have their own state machines to do whatever necessary in an automated manner (you can script those too, they don't have to be hardwired to a fixed behavior!).
Output devices = likewise threads. They can be "connected" to the PC display, so you can see the "LED" states. You can log outputs to files as well.
I/O pins/ports can be just dedicated global variables. If you need to wake up I/O device threads upon reading/writing from/to them, you can do so too. Either wrap accesses to them into appropriate synchronization-and-communication code or even map the underlying memory in such a way that any access to these port variables would trigger a signal/page fault whose handler would do all the necessary synchronization and communication for you.
And the main part is in, well, main(). :)
This will create an environment very close to the real thing. You can even get race conditions!
If you want to be even more hardcore about it, and if you have time, you can simulate the entire MSP430 as well. The instruction set is very compact and simple. Some simulators exist today, so you have some reference code to leverage.
If you want to test your code well, you will need to make it flexible enough for the purpose. This may include adding #ifdefs, macros, explicit parameters in functions instead of accessing global variables, pointers to data and functions, which you can override while testing, all kinds of test hooks.
You should also think of splitting the code into hardware-specific parts, very hardware-specific parts and plain business logic parts, which you can compile into separate libraries. If you do so, you'll be able to substitute the real hardware libs with test libs simulating the hardware.
Anyhow, you should abstract away the hardware devices and use test state machines to test production code and its state machines.
Build a test bench. First off, I recommend that when you read the input registers, for example, you use some sort of function call (rather than dereferencing a volatile pointer to a fixed address directly). Basically, everything has at least one layer of abstraction. Now your main application can easily be lifted and placed anywhere, with test functions for each of the abstractions. You can completely test that code without any of the real hardware. Also, once on the real hardware, you can use the abstraction (wrapper function, whatever you want to call it) as a way to change or fake the input.
switch(state)
{
    case X:
        r=read_gpio_port();
        if(r&0x10) next_state = Y;
        break;
}
In a test bench (or even on hardware):
unsigned int test_count;
unsigned read_gpio_port ( void )
{
    test_count++;
    return(test_count);
}
Eventually, implement read_gpio_port in asm or C to access the GPIO port, and link that in with the main application instead of the test code.
Yes, you suffer a function call unless you inline, but in return your debugging and testing abilities are significantly greater.
How can I emulate a memory I/O device for unit testing on Linux?
I'm writing a unit test for some source code for embedded deployment.
The code is accessing a specific address space to communicate with a chip.
I would like to unit test (UT) this code on Linux.
The unit test must be able to run without human intervention.
I need to run the UT as a normal user.
The code being tested must be exactly the source code being run on the target system.
Any ideas of where I could go for inspiration on how to solve this?
Can an ordinary user somehow tell the MMU that a particular memory allocation must be done at a specific address?
Or that a data block must be in a particular memory area?
As I understand it:
SIGSEGV can't be used, since after the return from the handler the same memory access will be executed again and fail again (or, by accident, the memory area might actually have valid data in it, just not what I would like).
Thanks
Henry
First, make the address to be read an injected dependency of the code, instead of a hard-coded dependency. Now you don't have to worry about the location under test conditions, it can be anything you like.
Then, you may also need to inject a function to read/write from/to the magic address as a dependency, depending what you're testing. Now you don't have to worry about how it's going to trick the code being tested into thinking it's performing I/O. You can stub/mock/whatever the hardware I/O behavior.
It's quite difficult to test low-level code under the conditions you describe, whilst also keeping it super-efficient in non-test mode, because you don't want to introduce too many levels of indirection.
"Exactly the source code" can hide a multitude of sins, though, depending how you interpret it. For example, your "dependency injection" could be via a macro, so that the unit source is "the same", but you've completely changed what it does with a sneaky -D compiler option.
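As a sketch of the injected-address idea, assuming a made-up two-register chip (struct chip_regs, chip_start and the bit meanings are all invented for illustration): the code under test takes the register block as a parameter, so a unit test can hand it an ordinary struct in normal memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block of the chip. */
struct chip_regs
{
    volatile uint32_t command;
    volatile uint32_t status;
};

/* Code under test: where the chip lives is a parameter, not a constant.
   Returns 0 if the (invented) ready bit is set after the command, else -1. */
static int chip_start(struct chip_regs *chip)
{
    assert(chip != NULL);
    chip->command = 1;                     /* invented "start" command */
    return (chip->status & 1u) ? 0 : -1;
}
```

On the target, the caller would pass the real address cast to struct chip_regs *. In a unit test run as a normal user, a local struct chip_regs variable pre-loaded with the desired status value plays the role of the hardware.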
AFAIK you need to create a block device (I am not sure whether a character device will work). Create a kernel module that maps that memory range to itself.
Create read/write functions, so that whenever that memory range is touched, those read/write functions are called.
Register those read/write functions with the kernel, so that whenever there is a read or write to those addresses, the kernel is invoked and performs the access on behalf of the user.