From http://www.makelinux.net/ldd3/chp-7-sect-1.shtml
Needless to say, both jiffies and jiffies_64 must be considered read-only.
I wrote a module to verify this, and it successfully updates the jiffies value:
#include <linux/init.h>     /* __init, __exit */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/jiffies.h>
static int __init test_hello_init(void)
{
    jiffies = 0;                        /* overwrites the kernel's global tick counter */
    pr_info("jiffies:%lu\n", jiffies);
    return 0;
}
static void __exit test_hello_exit(void)
{
}
MODULE_LICENSE("GPL");
module_init(test_hello_init);
module_exit(test_hello_exit);
This module successfully sets the jiffies to zero. Am I missing something?
What you are reading is a warning, not an enforced restriction. It is an unwritten contract between you (the kernel module developer) and the kernel. You shouldn't modify the value of jiffies because it is not up to you to do so: the kernel's timekeeping code advances it as timer ticks occur, according to a set of complicated rules you should not have to worry about. The jiffies value is used internally by the scheduler and by timers and timeouts throughout the kernel, so bad things can happen if you modify it. The variable your module writes to is the kernel's real, global jiffies (modules link against the exported symbol), which is why your write took effect, and exactly why you shouldn't do it. jiffies is exposed to you only as information your module might need in order to implement some logic.
Of course, since you are working in C, there is no run-time concept of "permissions" for variables. Anything that is mapped in a readable and writable region of memory can be modified, and you could even modify data in read-only memory by changing the page permissions first. You can do all sorts of bad things if you want; there are plenty of things you're not supposed to alter even though you have the ability to do so.
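The intended, read-only use of jiffies is to record a deadline and later compare the current value against it. The kernel provides time_after() in <linux/jiffies.h> for that comparison. The user-space sketch below re-implements a simplified version of it (without the kernel's typecheck) as time_after_ul, and takes an ordinary `now` argument where a module would read jiffies; all helper names are made up for illustration.

```c
#include <limits.h>   /* ULONG_MAX, handy for exercising wraparound */
#include <stdbool.h>

/* Simplified copy of the kernel's time_after(a, b): true if tick count a
 * is later than b, even if the unsigned counter wrapped in between. */
bool time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Compute a deadline `ticks` ticks after `now` (now stands in for jiffies).
 * The sum may wrap past ULONG_MAX; time_after_ul copes with that. */
unsigned long make_deadline(unsigned long now, unsigned long ticks)
{
    return now + ticks;
}

/* Only ever reads the counter: has `now` moved past the deadline? */
bool deadline_passed(unsigned long now, unsigned long deadline)
{
    return time_after_ul(now, deadline);
}
```

A plain `now > deadline` test gives the wrong answer once the counter wraps (for example now = ULONG_MAX - 10 with a deadline 50 ticks later), which is why the subtract-and-check-sign form is used. It relies on unsigned-to-signed conversion wrapping, as GCC and Clang implement it; the kernel makes the same assumption.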
This question already has answers here: How can I prevent (not react to) a segmentation fault? (3 answers). Closed 2 years ago.
Can I tell whether a pointer points into the .rodata section of an executable, that is, whether writing through that pointer would cause a runtime trap?
Example (using a C character pointer):
#include <stdio.h>
int in_rodata(const void *p);   /* the hypothetical check I'm asking about */
void foo(char const * const string) {
    if ( in_rodata( string ) ) {
        puts("It's in rodata!");
    } else {
        puts("That ain't in rodata");
    }
}
I was thinking that maybe I could simply compare the pointer against the bounds of the rodata section, something along the lines of:
if ( string >= start_of_rodata && string < end_of_rodata ) {
    // it's in rodata!
}
Is this a feasible plan/idea?
Does anyone have an idea as to how I could do this?
(Is there any system information that one might need in order to answer this?)
I am executing the program on a Linux platform.
I doubt that it could possibly be portable
If you don't want to mess with linker scripts or platform-specific memory-map query APIs, a proxy approach is fairly portable on platforms with memory protection, provided you only need to know whether the location is writable, read-only, or neither. The general idea is to do a test read and a test write. If the first succeeds but the second fails, the address is likely in .rodata or in a code segment. This doesn't tell you "it's .rodata for sure": it may be a code segment or some other read-only page, such as a read-only file mapping that has copy-on-write disabled. Whether that matters depends on what you had in mind for this test, i.e. its ultimate purpose.
Another caveat is: For this to be even remotely safe, you must suspend all other threads in the process when you do this test, as there's a chance you may corrupt some state that code executing on another thread may happen to refer to. Doing this from inside a running process may have hard-to-debug corner cases that will stop lurking and show themselves during a customer demo. So, on platforms that support this, it's always preferable to spawn another process that will suspend the first process in its entirety (all threads), probe it, write the result to the process's address space (to some result variable), resume the process and terminate itself. On some platforms, it's not possible to modify a process's address space from outside, and instead you need to suspend the process mostly or completely, inject a probe thread, suspend the remaining other threads, let the probe do its job, write an answer to some agreed-upon variable, terminate, then resume everything else from the safety of an external process.
For simplicity's sake, the below will assume that it's all done from inside the process. Even though "fully capable" self-contained examples that work cross-process would not be very long, writing this stuff is a bit tedious especially if you want it short, elegant and at least mostly correct - I imagine a really full day's worth of work. So, instead, I'll do some rough sketches and let you fill in the blanks (ha).
Windows
Structured exceptions (SEH) are raised, e.g., on protection faults or integer division by zero. To perform the test, attempt a read from the address in question. If that succeeds, you know the address is at least in a mapped, readable page (otherwise an exception is raised that you can catch with __except). Then try writing there: if that fails, the page was read-only. The code is almost boring:
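#ifdef _WIN32
#include <windows.h>
#include <assert.h>
static const int foo = 42;  /* initialized const: normally placed in .rdata */
static int bar;             /* writable */
typedef struct ThreadState ThreadState;
/* Sketch only: snapshot the thread list, SuspendThread() every thread of
   this process except the current one, and remember their handles. */
ThreadState *suspend_other_threads(void);
/* ResumeThread() every handle saved by suspend_other_threads(). */
void resume_other_threads(ThreadState *state);
int check_if_maybe_rodata(const void *p) {
    __try {
        (void) *(const volatile char *)p;        /* probe read */
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        return 0;                                /* not even readable */
    }
    volatile LONG result = 0;
    ThreadState *state = suspend_other_threads();
    __try {
        InterlockedExchange(&result, 1);         /* assume read-only until the write succeeds */
        LONG saved = *(volatile LONG *)p;
        InterlockedExchange((volatile LONG *)p, saved);  /* write the same value back */
        InterlockedExchange(&result, 0);         /* we succeeded writing there */
    } __except (EXCEPTION_EXECUTE_HANDLER) {}
    resume_other_threads(state);
    return result;
}
int main(void) {
    assert(check_if_maybe_rodata(&foo));
    assert(!check_if_maybe_rodata(&bar));
    return 0;
}
#endif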
Suspending the threads requires traversing the thread list, and suspending each thread that's not the current thread. The list of all suspended threads has to be created and saved, so that later the same list can be traversed to resume all the threads.
There are surely caveats, and WoW64 threads have their own API for suspension and resumption, but it's probably something that would, in controlled circumstances, work OK.
Unix
The idea is to have the kernel check the pointer for us "at arm's length", so that no signal is raised. Handling the POSIX signal that results from a memory protection fault is awkward: if the handler simply returns, the faulting instruction is re-executed and faults again, so you would have to alter the faulting context, patch the code that caused the fault (which forces you to change the protection of the code's memory), or longjmp out of the handler. Not so great. Instead, pass the pointer to a syscall that you know should succeed in all normal circumstances and that reads from the pointed-to address: e.g. open /dev/zero and write() to that file from the buffer the pointer points to. If that fails with EFAULT, it is because "buf is outside your accessible address space" (to quote write(2)). If you can't even read from that address, it's certainly not .rodata.
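Then do the converse: from an open /dev/zero, read() into the address you are testing. If the read succeeds, the memory wasn't read-only; note, however, that the successful read has just overwritten the byte at that address with zero, so save it beforehand and restore it afterwards (with the other threads still suspended). If the read fails with EFAULT, the area is most likely read-only: the kernel could copy data out of it in the first step but could not copy data into it now.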
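A minimal Linux sketch of this two-step probe. One deliberate deviation from the text: instead of reading from /dev/zero, it reads back from a pipe that holds the byte it just copied out, so a successful probe writes the original value back rather than zeroing it. It is still not atomic with respect to other threads, so the thread-suspension caveat applies. probe_access and enum probe_result are names made up for this example.

```c
#include <errno.h>
#include <unistd.h>

enum probe_result { PROBE_UNREADABLE, PROBE_READ_ONLY, PROBE_READ_WRITE, PROBE_ERROR };

/* Let the kernel touch one byte at p on our behalf: a bad address makes the
 * syscall fail with EFAULT instead of raising SIGSEGV in this process. */
enum probe_result probe_access(const void *p)
{
    int fds[2];
    enum probe_result r;

    if (pipe(fds) != 0)
        return PROBE_ERROR;

    /* Step 1: the kernel reads *p to fill the pipe. */
    ssize_t n = write(fds[1], p, 1);
    if (n == 1) {
        /* Step 2: the kernel writes that same byte back into *p. */
        n = read(fds[0], (void *)p, 1);
        if (n == 1)
            r = PROBE_READ_WRITE;
        else if (n < 0 && errno == EFAULT)
            r = PROBE_READ_ONLY;    /* readable, but not writable */
        else
            r = PROBE_ERROR;
    } else if (n < 0 && errno == EFAULT) {
        r = PROBE_UNREADABLE;       /* not even readable */
    } else {
        r = PROBE_ERROR;
    }

    close(fds[0]);
    close(fds[1]);
    return r;
}
```

For example, probe_access("hello") should report PROBE_READ_ONLY for a string literal, while the address of an ordinary local variable reports PROBE_READ_WRITE.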
In all cases, it would be preferable to use native platform APIs to query the mapping status of the page on which the address resides, or even better, to walk the section list of the mapped executable (ELF on Linux, PE on Windows) and see exactly what went where. Nothing guarantees that on every system with memory protection the .rodata section or its equivalent is mapped read-only, so the executable's image as mapped into the running process is the ultimate authority. Even that does not guarantee that the section is currently mapped read-only: an mprotect() or similar call could have made it, or parts of it, writable, modified it, and then perhaps made it read-only again. To detect that, you would have to either checksum the section, if the executable format provides such data, or mmap the same binary somewhere else in memory and compare the sections.
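On Linux, the per-page mapping status is exposed in /proc/self/maps (see proc(5)): each line starts with an address range and a permission string such as r--p. A minimal sketch, assuming procfs is mounted; it reports only the current protection of the mapping containing the address, not which ELF section the address came from. maps_says_read_only is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdio.h>

/* Looks addr up in /proc/self/maps.
 * Returns 1 if it lies in a readable but non-writable mapping,
 *         0 if it lies in a writable mapping,
 *        -1 if no readable mapping contains it (or the file can't be read). */
int maps_says_read_only(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4096];  /* each line: "start-end perms offset dev inode [path]" */
    int result = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (a >= start && a < end) {
            if (perms[0] == 'r')
                result = (perms[1] == 'w') ? 0 : 1;
            break;
        }
    }
    fclose(f);
    return result;
}
```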
But I smell a faint smell of an XY problem: what is it that you're actually trying to do? I mean, surely you don't just want to check if an address is in .rodata out of curiosity's sake. You must have some use for that information, and it is this application that would ultimately decide whether even doing this .rodata check should be on the radar. It may be, it may be not. Based on your question alone, it's a solid "who knows?"
I'm new to kernel modules and currently experimenting with them.
I've read that they have the same level of access as the kernel itself.
Does this mean they have access to physical memory and can see/overwrite values of other processes (including kernel memory)?
I have written this simple C code to overwrite every memory address, but it's not doing anything. I expected the system to just crash; I'm not sure whether this is touching physical memory or whether it's still virtual memory.
I run it with sudo insmod ./test.ko. The command just hangs (because of the infinite loop, of course), but the system works fine when I exit manually.
#include <linux/module.h>
#include <linux/kernel.h>
int init_module(void)
{
    unsigned char *p = 0x0;
    while (true){
        *p=0;
        p++;
    }
    return 0;
}
void cleanup_module(void)
{
    //
}
Kernel modules run with kernel privileges (including access to kernel memory and all peripherals). The reason your code isn't working is that you don't specify the init and exit functions, so you can load the module, but the kernel doesn't call your functions.
Please take a look at this example for a minimal kernel module. Here you will find some explanation about the needed macros.
I want to create a loadable kernel module for Linux.
This is the code
#include <linux/module.h>
#include <linux/init.h>
static int __init mymodule_init(void)
{
    printk ("My module worked!\n");
    return 0;
}
static void __exit mymodule_exit(void)
{
    printk ("Unloading my module.\n");
    return;
}
module_init(mymodule_init);
module_exit(mymodule_exit);
MODULE_LICENSE("GPL");
Now pay attention to the __init macro. As the documentation says:
The __init macro indicates to the compiler that the associated function is only used during initialization. The compiler places all code marked with __init into a special memory section that is freed after initialization.
I'm trying to understand why the initialization function could otherwise end up leaking memory. Is it due to the FIFO disposition of function calls on the stack?
In very broad strokes:
Executable code (what source code is compiled into) takes up memory. A modern CPU reads the section of memory where the instructions reside and executes them. For most user-space applications, the code segment of a process's memory is loaded once and never changes during program execution. The code is always there, unless programmers play around with it.
This isn't a problem, since the OS manages the process's virtual memory, and cold code pages will eventually be evicted from physical memory (code is backed by the executable file, so such pages are simply dropped and read back in when needed). Physical memory is never "wasted" like that in user space.
For the kernel, where code runs in privileged mode, nothing "unloads" unused pages the way it happens in user mode. If a function is placed into the kernel's regular code segment, it takes up physical memory for as long as the kernel runs, which can be quite a long time. If the function is only called once, that's quite a waste of space.
Loadable kernel modules can be unloaded, so their code doesn't necessarily take up space indefinitely, but it is still somewhat wasteful to keep space occupied by a function that is only going to be called once.
Since modern CPUs treat code as a form of executable data, it's possible to place that data into a memory section that is not retained indefinitely: the function is loaded, called, and then its memory can be reused for something else. This is what the __init macro is for: it tells the compiler to place the function in a special section (.init.text), which the kernel frees once initialization is complete.
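The compiler's half of this is easy to reproduce in user space. In the kernel, __init expands (among other attributes) to a section attribute naming .init.text; freeing that section afterwards is the kernel's job and has no user-space equivalent here. The sketch assumes GCC or Clang linked with GNU ld or lld, which define __start_/__stop_ symbols for any section whose name is a valid C identifier; my_init, my_init_text, setup_once and in_init_section are made-up names.

```c
#include <stdint.h>

/* User-space imitation of __init: put the function in a named section.
 * "my_init_text" is an invented name; the kernel uses ".init.text". */
#define my_init __attribute__((section("my_init_text"), noinline, used))

/* Defined by the linker because the section name is a valid C identifier. */
extern const char __start_my_init_text[];
extern const char __stop_my_init_text[];

my_init int setup_once(void)
{
    return 42;  /* stands in for one-time initialization work */
}

int ordinary_function(void)
{
    return 7;   /* lives in the regular .text section */
}

/* Returns 1 if fn's code lies inside the my_init_text section. */
int in_init_section(int (*fn)(void))
{
    uintptr_t a = (uintptr_t)fn;
    return a >= (uintptr_t)__start_my_init_text &&
           a <  (uintptr_t)__stop_my_init_text;
}
```

All the attribute does is group such functions into one contiguous region; that contiguity is what lets the kernel release them in one go after initialization.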
Short version of question: What parameter do I need to pass to the clone system call on x86_64 Linux system if I want to allocate a new TLS area for the thread that I am creating.
Long version:
I am working on a research project and for something I am experimenting with I want to create threads using the clone system call instead of using pthread_create. However, I also want to be able to use thread local storage. I don't plan on creating many threads right now, so it would be fine for me to create a new TLS area for each thread that I create with the clone system call.
I was looking at the man page for clone and it has the following information about the flag for the TLS parameter:
CLONE_SETTLS (since Linux 2.5.32)
The newtls argument is the new TLS (Thread Local Storage) descriptor. (See set_thread_area(2).)
So I looked at the man page for set_thread_area and noticed the following which looked promising:
When set_thread_area() is passed an entry_number of -1, it uses a free TLS entry. If set_thread_area() finds a free TLS entry, the value of u_info->entry_number is set upon return to show which entry was changed.
However, after experimenting with it, it appears that set_thread_area is not implemented on my system (Ubuntu 10.04 on x86_64). When I run the following code, I get the error: set_thread_area() failed: Function not implemented
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/unistd.h>
#include <asm/ldt.h>
int main()
{
    struct user_desc u_info;
    memset(&u_info, 0, sizeof u_info);
    u_info.entry_number = -1;   /* -1: ask the kernel to pick a free entry */
    int rc = syscall(SYS_set_thread_area, &u_info);
    if (rc < 0) {
        perror("set_thread_area() failed");
        exit(EXIT_FAILURE);
    }
    printf("entry_number is %d\n", (int)u_info.entry_number);
    return 0;
}
I also noticed, when using strace to see what happens when pthread_create is called, that there are no calls to set_thread_area. I have also been looking at the NPTL pthread source code to understand what it does when creating threads, but I don't completely understand it yet, and I think it is more complex than what I'm trying to do, since I don't need something as robust as the pthread implementation. I'm assuming that the set_thread_area system call is for x86 and that a different mechanism is used on x86_64, but so far I haven't been able to figure out what it is, so I'm hoping this question will give me some ideas about what to look at.
I am working on a research project and for something I am experimenting with I want to create threads using the clone system call instead of using pthread_create
In the exceedingly unlikely scenario where your new thread never calls any libc functions (either directly or by calling something else that calls libc; this also includes dynamic symbol resolution via the PLT), you can pass whatever TLS storage you desire as the newtls parameter to clone. On x86-64, that value is simply loaded into the new thread's FS base register.
You should ignore all references to set_thread_area: they apply only to the 32-bit (i386) case.
If you are planning to use libc in your newly-created thread, you should abandon your approach: libc expects TLS to be set up a certain way, and there is no way for you to arrange for such setup when you call clone directly. Your new thread will intermittently crash when libc discovers that you didn't set up TLS properly. Debugging such crashes is exceedingly difficult, and the only reliable solution is ... to use pthread_create.
The other answer is absolutely correct in that setting up a thread outside of libc's control is guaranteed to cause trouble at a certain point. You can do it, but you can no longer rely on libc's services, definitely not on any of the pthread_* functions or thread-local variables (defined as such using __thread or thread_local).
That being said, you can set one of the segment registers used for TLS (FS or GS) even on x86-64. The system call to look for is arch_prctl(ARCH_SET_FS, ...) (or ARCH_SET_GS for GS).
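On x86-64 the thread pointer lives in the FS base register, and arch_prctl(2) both sets it (ARCH_SET_FS) and reads it (ARCH_GET_FS). The sketch below only reads it, because calling ARCH_SET_FS underneath glibc would break libc exactly as described above. It assumes x86-64 Linux with glibc or musl, where pthread_self() happens to return the thread-control-block address that FS points to; that is an implementation detail, not a documented guarantee. current_fs_base and tls_var_below_fs_base are names invented for the example.

```c
#define _GNU_SOURCE        /* for syscall() */
#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>     /* ARCH_GET_FS, ARCH_SET_FS */
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>

static __thread int tls_counter;  /* an ordinary thread-local variable */

/* Reads this thread's FS base via the raw arch_prctl syscall
 * (older glibc has no wrapper). Returns 0 on failure. */
uintptr_t current_fs_base(void)
{
    unsigned long base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return (uintptr_t)base;
}

/* x86-64 uses TLS "variant II": the executable's TLS block sits just
 * below the thread pointer, so its variables have lower addresses. */
int tls_var_below_fs_base(void)
{
    return (uintptr_t)&tls_counter < current_fs_base();
}
#endif
```

A thread created with clone(CLONE_SETTLS, ...) would start with whatever FS base you passed, which is why libc's own TLS accesses go wrong if that value doesn't point at a libc-built thread control block.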
You can see an example comparing setting up TLS registers on i386 and x86-64 in this piece of code.
I intend to develop an application that monitors the traffic on particular ports. For this I need to list the data of all live sk_buffs in the system. How can I do this?
I have written the following code (basically a kernel module):
#include <linux/module.h>  /* Needed by all modules */
#include <linux/kernel.h>  /* Needed for KERN_INFO */
#include <linux/skbuff.h>  /* struct sk_buff */
int init_module(void)
{
    struct sk_buff *skb;   /* declared, but never pointed at a real skb */
    printk(KERN_INFO "SKB 1.\n");
    return 0;
}
void cleanup_module(void)
{
    printk(KERN_INFO "Done 1.\n");
}
But I don't know how to catch the sk_buffs. I have simply declared an sk_buff pointer, that's all.
Please help me actually catch the live sk_buffs in the system.
EDIT
I have tried all the top Google search results. They give a very good description of the sk_buff itself, but none of them actually show how to do what I am interested in.
There is no standardized way. Newly created skbs (fresh out of alloc_skb) are not put into any list that you could read, so there is no way to know which skbs are active from an arbitrary point in the kernel, such as your module. You have at least two options, though (both entail modifying core kernel code):
Since all skbuffs are allocated from a kmem_cache pool, you could augment the kmem_cache functionality with a function that reports all allocated objects.
Within the __alloc_skb function, add every newly allocated skb to a data structure of your liking (and don't forget to remove it again when the skb is freed). This is going to be a major bottleneck, but that's the price you have to pay.
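A user-space model of the second option's bookkeeping: every object is linked into a global list when allocated and unlinked when freed, so a monitor can walk the live set at any time. In the kernel you would use struct list_head from <linux/list.h>, hook the free path as well as __alloc_skb, and protect the list with a lock, since skbs are allocated and freed concurrently on many CPUs; that locking is the bottleneck mentioned above and is not shown here. struct tracked_buf and the functions below are invented for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Each object carries an intrusive list node linking it into the live list. */
struct tracked_buf {
    struct tracked_buf *prev, *next;  /* links in the live list */
    size_t len;                       /* stand-in for real payload fields */
};

/* Circular list head: an empty list points to itself. */
static struct tracked_buf live_head = { &live_head, &live_head, 0 };

/* Allocate and immediately register the object as live. */
struct tracked_buf *tracked_alloc(size_t len)
{
    struct tracked_buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->len = len;
    b->next = live_head.next;         /* insert right after the head */
    b->prev = &live_head;
    live_head.next->prev = b;
    live_head.next = b;
    return b;
}

/* Unlink before freeing, so the list never holds dangling pointers. */
void tracked_free(struct tracked_buf *b)
{
    if (b == NULL)
        return;
    b->prev->next = b->next;
    b->next->prev = b->prev;
    free(b);
}

/* Walk the live list; a monitor would inspect each buffer here. */
size_t count_live(void)
{
    size_t n = 0;
    for (struct tracked_buf *b = live_head.next; b != &live_head; b = b->next)
        n++;
    return n;
}
```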
As usual, the question: why?