I want to modify the virtual-to-physical address mapping (i.e. page table) of a particular process for every 60 seconds. I do know how to modify the page table, how to flush the cache and TLB accordingly, and how to leverage workqueue to invoke my function every 60 seconds. However, I am not sure how to "attach" to the target process from kernel or kernel module, assuming that the target process does not initiate any system call. By "attach", I mean the target process will behave similar as when it encountered an interrupt (e.g. a page fault) and the execution goes to the kernel immediately with the context being saved. So is it possible to do this?
Pointers to any similar usage in the kernel code or other hints are also appreciated.
Your best bet is to have the process you are interested in applying dynamic virtual address, call in to the module via IOCTL or any other call that ends up in the driver, sleep in process context using wait_* functions and then wake it up every 60 seconds to modify the page table in that thread. You should have assured context then.
Related
enter image description here
I learned that when a system call function is called, the process changes. But what is process B if I call the read function without the fork() function? isn't there is only one process?
On x86-64, there is one specific instruction to do system calls: syscall (https://www.felixcloutier.com/x86/syscall.html). When you call read() in C, it is compiled to placing the proper syscall number in a register along with the arguments you provide and to one syscall instruction. When syscall is executed, it jumps to the address stored in the IA32_LSTAR register. After that, it is in kernel mode executing the kernel's syscall handler.
At that point, it is still in the context of process A. Within its handler, the kernel realizes that you want to read from disk. It will thus start a DMA operation by writing some registers of the hard-disk controller. From there, process A is waiting for IO. There is no point in leaving the core idle so the kernel calls the scheduler and it will probably decide to switch the context of the core to another process B.
When the DMA IO operation is done, the hard-disk controller triggers an interrupt. The kernel thus puts process A back into the ready queue and calls the scheduler which will probably have the effect of switching the context of the core back to process A.
The image you provide isn't very clear so I can understand the confusion. Overall, on most architectures it will work similarly to what is stated above.
The image is somewhat misleading. What actually happens is, the read system call needs to wait for IO. There is nothing else that can be done in the context of process (or thread) A.
So kernel needs to find something else for the CPU to do. Usually there is some other process or processes which do have something to do (not waiting for a system call to return). It could also be another thread of process A that is given time to execute (from kernel point of view, thread and process aren't really much different, actually). There may be several processes which get to execute while process A waits for system call to complete, too.
And if there is nothing else for any other process and thread to do, then kernel will just be idle, let the CPU sleep for a bit, basically save power (especially important on a laptop).
So the image in the question shows just one possible situation.
I have a little paging problem on my realtime system, and wanted to know how exactly linux should behave in my particular case.
Among various other things, my application spawns 2 threads using pthread_create(), which operate on a set of shared buffers.
The first thread, let's call it A, reads data from a device, performs some calculations on it, and writes the results into one of the buffers.
Once that buffer is full, thread B will read all the results and send them to a PC via ethernet, while thread A writes into the next buffer.
I have noticed that each time thread A starts writing into a previously unused buffer, i miss some interrupts and lose data (there is an id in the header of each packet, and if that increments by more than one, i have missed interrupts).
So if i use n buffers, i get exactly n bursts of missed interrupts at the start of my data acquisition (therefore the problem is definitely caused by paging).
To fix this, i used mlock() and memset() on all of the buffers to make sure they are actually paged in.
This fixed my problem, but i was wondering where in my code would be the correct place do this. In my main application, or in one/both of the threads? (currently i do it in both threads)
According to the libc documentation (section 3.4.2 "Locked Memory Details"), memory locks are not inherited by child processes created using fork().
So what about pthreads? Do they behave the same way, or would they inherit those locks from my main process?
Some background information about my system, even though i don't think it matters in this particular case:
It is an embedded system powered by a SoC with a dual-core Cortex-A9 running Linux 4.1.22 with PREEMPT_RT.
The interrupt frequency is 4kHz
The thread priorities (as shown in htop) are -99 for the interrupt, -98 for thread A (both of which are higher than the standard priority of -51 for all other interrupts) and -2 for thread B
EDIT:
I have done some additional tests, calling my page locking function from different threads (and in main).
If i lock the pages in main(), and then try to lock them again in one of the threads, i would expect to see a large amount of page faults for main() but no page faults for the thread itself (because the pages should already be locked). However, htop tells a different story: i see a large amount of page faults (MINFLT column) for each and every thread that locks those pages.
To me, that would suggest that pthreads actually do have the same limitation as child processes spawned using fork(). And if this is the case, locking them in both threads (but not in main) would be the correct procedure.
Threads share the same memory management context. If a page is resident for one thread, it's resident for all threads in the same process.
The implication of this is that memory locking is per-process, not per-thread.
You are probably still seeing minor faults on the first write because a fault is used to mark the page dirty. You can avoid this by also writing to each page after locking.
Currently I am reading "Understanding the Linux kernel, 3rd edition" and on p.22 I can read:
In the simplest case, the CPU executes a kernel control path sequentially from the
first instruction to the last. When one of the following events occurs, however, the
CPU interleaves the kernel control paths:
A process executing in User Mode invokes a system call, and the corresponding
kernel control path verifies that the request cannot be satisfied immediately; it
then invokes the scheduler to select a new process to run. As a result, a process
switch occurs. The first kernel control path is left unfinished, and the CPU
resumes the execution of some other kernel control path. In this case, the two
control paths are executed on behalf of two different processes.
The kernel control path can be interrupted from a user space process doing a system call?
I thought the priority was pretty much:
interrupts
kernel threads
user space processes
I have checked the errata and could not find anything about this.
You are right about the priority list, but what (I think) the book is trying to say is:
When a (user) process makes a system call, the kernel starts executing on its behalf.
If the system call can be completed (the kernel control path does not run into a roadblock), then it will usually return direct to the calling process - think getpid() function call.
On the other hand, if the system call cannot be completed (for example, because the disk system must read a block into the kernel buffer pool before its data can be returned to the calling process), then the scheduler is used to select a new process to run - preempting the (kernel thread of control that was running on behalf of the) user process.
In due course, the original system call will be able to continue, and the original (kernel thread of control that was running on behalf of the) user process will be able to continue and eventually complete, returning control to the user space process running in user space and not in the kernel.
So "No": it is not the case that the 'kernel path can be interrupted from a user space process doing a system call'.
The kernel path can be interrupted while it is executing a system call on behalf of a user space process because: an interrupt occurs, or the kernel path must wait for a resource to become available, or ...
According to my question here I would like to use SCHED_RR with pthread_setschedparam for my threads in a Linux application. However, this has effects even on kernel modules which I currently cannot solve.
I have found http://www.icir.org/gregor/tools/pthread-scheduling.html which says that I could create my threads with PTHREAD_SCOPE_PROCESS attribute, but I haven't found further information on this.
Will this work with (Angstrom) Linux, kernel version2.6.32? (How) will this affect the way my process competes with other processes? Would it be the way to have my processes compete with real time scheduling but other processes would not be affected?
(As I am using boost threads I cannot simply try this...)
Threads created with PTHREAD_SCOPE_PROCESS will share the same kernel thread (
http://lists.freebsd.org/pipermail/freebsd-threads/2006-August/003674.html )
However, SCHED_RR must be run under a root-privileged process.
Round-Robin; threads whose contention scope is system
(PTHREAD_SCOPE_SYSTEM) are in real-time (RT) scheduling class if the
calling process has an effective user id of 0. These threads, if not
preempted by a higher priority thread, and if they do not yield or
block, will execute for a time period determined by the system.
SCHED_RR for threads that have a contention scope of process
(PTHREAD_SCOPE_PROCESS) or whose calling process does not have an
effective user id of 0 is based on the TS scheduling class.
However, basing on your linked problem I think you are facing a deeper issue. Have you tried setting your kernel to be more "preemptive"? Preemption should allow the kernel to forcibly schedule out of running your process allowing for more responsive running of some kernel parts. This shouldn't affect IRQs though, maybe something disabled your IRQs?
Another thing I am thinking about is maybe that you are not fetching your SPI data fast enough and the buffor for your data in the kernel becomes full and hence the data loss. Try increasing those buffers also.
I'm taking a course in operating systems and we work in Linux (Red hat 8.0). I'm trying to implement a file open,close tracker that will save for every process a history of files it opens and closes. I expected sys_open,close to also accept the process id and that I could use that to access the history of the process that initiated the call and update it (making the update part of the sysopen,close functions). However, these functions don't accept the pid as a parameter, so I'm a bit lost as to how to associate opening/closing files to the process that initiated that. My only guess is that since at any given time there's only one active process, that its meta-data must be global in some way, but I have no idea where or how to find it. Any advice would be appreciated.
Do you intend to do this in kernel space? Given that you were looking directly at sys_open etc, which sit in kernel space, IIRC, you can use the current pointer to see the current process's pid (current->pid).