I'm getting a segfault when running this code as root in userspace, and I don't understand why. I believe I have a rootkit, and I want to check whether the addresses are the same as the ones in /boot/System.map-3.2.0-4-amd64.
unsigned long hex;
unsigned long **sys_call_table;
for(hex = 0xffffffff810f8989; hex < 0xffffffff8160e370; hex += sizeof(void *))
{
sys_call_table = (unsigned long **)hex;
if(sys_call_table[3] == (unsigned long *)0xffffffff810f8989)
{
puts("sys_close's address has not been replaced by the rootkit");
}
}
cat /boot/System.map-3.2.0-4-amd64 | grep "string you want"
ffffffff81401200 R sys_call_table
ffffffff810f9f9e T sys_read // sys_call_table[0]
ffffffff810fa009 T sys_write // sys_call_table[1]
ffffffff810f950d T sys_open // sys_call_table[2]
ffffffff810f8989 T sys_close // sys_call_table[3]
ffffffff8160e370 D loops_per_jiffy
Running as root is not enough. The problem is that the code runs in user space; it has to run in kernel space, for example as a kernel module. Root privileges are enough to invoke system calls, but they do not let you read the table: in user space you can only access memory that has been allocated to your process.
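If the goal is only to look up what System.map says about a symbol, that part can be done from user space, because it is plain file parsing. Here is a minimal sketch, assuming the "address type name" line format shown in the question. parse_map_line and lookup_symbol are hypothetical helper names, not an existing API. Note that this only reads the file; it tells you nothing about the live table in kernel memory.

```c
#include <stdio.h>
#include <string.h>

/* One parsed System.map entry: address, symbol type letter, symbol name. */
struct map_symbol {
    unsigned long long addr;
    char type;
    char name[128];
};

/* Parse a line such as "ffffffff81401200 R sys_call_table".
 * Returns 0 on success, -1 if the line does not contain three fields. */
int parse_map_line(const char *line, struct map_symbol *out)
{
    if (sscanf(line, "%llx %c %127s", &out->addr, &out->type, out->name) != 3)
        return -1;
    return 0;
}

/* Scan a System.map-style file for `name`.
 * Returns 0 and stores the address in *addr on a match, -1 otherwise. */
int lookup_symbol(const char *path, const char *name, unsigned long long *addr)
{
    char line[256];
    struct map_symbol sym;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof line, fp)) {
        if (parse_map_line(line, &sym) == 0 && strcmp(sym.name, name) == 0) {
            *addr = sym.addr;
            fclose(fp);
            return 0;
        }
    }
    fclose(fp);
    return -1;
}
```

For example, lookup_symbol("/boot/System.map-3.2.0-4-amd64", "sys_close", &addr) would return the address from the file. Comparing it against the running kernel's table still has to happen in kernel space.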
I'm writing my first trivial device driver and got a few questions:
I'm following this book, but it doesn't seem to go into the details of what happens while the copy_(to|from)_user() API (or any API that transfers data between user and kernel space) executes. I'm not looking for something super detailed, just what one must know when working on the kernel.
What is the implementation of copy_from_user() really like? I came across the following snippets, but they just go down to the assembly level. I might be navigating incorrectly. I have seen some references for this function, and it looks like a return value other than 0 means something went wrong.
// https://elixir.bootlin.com/linux/latest/source/include/linux/uaccess.h#L189
static __always_inline unsigned long __must_check
copy_from_user(void *to, const void __user *from, unsigned long n)
{
if (likely(check_copy_size(to, n, false)))
n = _copy_from_user(to, from, n);
return n;
}
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
might_fault();
if (should_fail_usercopy())
return n;
instrument_copy_from_user(to, from, n);
check_object_size(to, n, false);
return raw_copy_from_user(to, from, n);
}
// https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/uaccess.h#L385
#define raw_copy_from_user(to, from, n) \
({ \
unsigned long __acfu_ret; \
uaccess_enable_not_uao(); \
__acfu_ret = __arch_copy_from_user((to), \
__uaccess_mask_ptr(from), (n)); \
uaccess_disable_not_uao(); \
__acfu_ret; \
})
// https://elixir.bootlin.com/linux/latest/source/arch/nds32/lib/copy_from_user.S#L34
.text
ENTRY(__arch_copy_from_user)
add $r5, $r0, $r2
#include "copy_template.S"
move $r0, $r2
ret
.section .fixup,"ax"
.align 2
9001:
sub $r0, $r5, $r0
ret
.previous
ENDPROC(__arch_copy_from_user)
During a syscall the kernel still has the process memory space mapped, so on most modern architectures it can read and write that memory directly. The main work is validating the user-provided address and size. Also, the data may not be resident, so the normal page fault mechanism can be triggered. After that it's just a memcpy.
Most of the macro layers and calls are there to deal with arch-specific differences. For example, ARM has user-access override, uao in your example code, which involves privileged mode access to user memory.
EDIT:
During the syscall, the current process isn't changed so the kernel has both the kernel memory and the user process memory in the memory map.
Address validation is to limit the access to the allowed user-process memory. Otherwise, the user process could pass a kernel address to a write, for example, and copy kernel memory out to a user file.
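The address validation described above can be pictured with a small userspace sketch. This is not the kernel's code: user_range_ok and USER_LIMIT_SKETCH are hypothetical names, and the limit value is only an assumed example, since the real boundary is architecture-specific. The point is the shape of the check: the whole range [addr, addr + size) must lie below the user/kernel boundary, and the arithmetic must not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed example boundary between user and kernel addresses.
 * The real limit depends on the architecture and kernel configuration. */
#define USER_LIMIT_SKETCH 0x0000800000000000ULL

/* True when [addr, addr + size) lies entirely below the limit.
 * The comparison is arranged so that addr + size is never computed,
 * which avoids wrap-around for huge values. */
bool user_range_ok(uint64_t addr, uint64_t size)
{
    return size <= USER_LIMIT_SKETCH && addr <= USER_LIMIT_SKETCH - size;
}
```

With this check, a pointer such as 0xffffffff81401200 (a kernel address) is rejected before any copy happens, which is what stops a process from passing a kernel address to write.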
The code inside main.c
#include <stdio.h>
#include <unistd.h>
int main() {
int c_variable = 0; // the target
for(int x = 0; x < 100; x++) {
c_variable += 5; // increase by 5 to change the value of the int
printf("%i\n", c_variable); // print current value
sleep(8); // sleep so I have time to scan memory
}
return 0;
}
What I am trying to achieve is to read the integer c_variable and then modify it from another .c program. I am on Linux, so I ran ps -A | grep main and got the PID of the running program. I then ran sudo scanmem PID and entered the current value of c_variable a few times. I was left with three memory addresses, and executing the command set 500 changed the value the program printed, effectively changing the value at those addresses to 500 instead of 35 or whatever the program was currently at. I then executed the following code
#include <stdio.h>
int main() {
const long unsigned addr = 0x772d85fa1008; // one of the three addresses from scanmem
printf("%lu\n", addr);
return 0;
}
but I got some random long string of numbers, not the current number. The tutorials and answers I have read on how to read and write memory on Linux do not use long unsigned; they use char* or just int* instead. My memory address seems to be a bit long, and I have not seen memory addresses that long before. Anyhow, how do I read and write the memory at the address of the integer c_variable?
Edit: the output of scanmem looks something like this
info: we currently have 3 matches.
3> list
[ 0] 7771ff64b090, 6 + 1e090, stack, 20, [I64 I32 I16 I8 ]
[ 1] 7771ff64b5d8, 6 + 1e5d8, stack, 20, [I64 I32 I16 I8 ]
[ 2] 7771ff64b698, 6 + 1e698, stack, 20, [I32 I16 I8 ]
3> set 50
info: setting *0x7771ff64b090 to 0x32...
info: setting *0x7771ff64b5d8 to 0x32...
info: setting *0x7771ff64b698 to 0x32...
output
...
150
155
160
165
170
175
55
60
65
...
You're printing the address itself in decimal notation, not what is stored at the address.
const int *addr = (int *) 0x772d85fa1008;
printf("%d\n", *addr);
You have to declare addr as a pointer type. More specifically a pointer to an integer. Its value (0x772d85fa1008) holds the address of the integer.
Then, in the printf call you dereference it to obtain the actual integer stored at the address.
In practice, though, I can't vouch for whether this is going to work, since memory in modern operating systems isn't as simple as you make it out to be. I don't have enough knowledge to assess that.
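To make the difference between "the address as a number" and "the value stored at the address" concrete, here is a small sketch that only works inside a single process. read_int_at and write_int_at are hypothetical helper names, and the address must point at a live int in the same program.

```c
#include <stdint.h>

/* Interpret `addr` as the address of an int in THIS process and read it. */
int read_int_at(uintptr_t addr)
{
    const int *p = (const int *)addr;
    return *p;
}

/* Store `value` into the int located at `addr` in THIS process. */
void write_int_at(uintptr_t addr, int value)
{
    int *p = (int *)addr;
    *p = value;
}
```

Called with (uintptr_t)&some_int these behave as expected. Called with an address copied from scanmem they will most likely crash, because that address belongs to the other process's address space.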
Processes running under Linux generally have their own virtualized memory space. If you want to access the memory of another process, the Linux API provides shared memory for that purpose; see shmctl, shmget, shmat and shmdt.
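Shared memory needs cooperation from the target program. For poking at an unmodified process, a different technique is to go through /proc/<pid>/mem, where the file offset is the virtual address in the target. The following is a rough sketch rather than a definitive tool: the function names are hypothetical, it assumes a 64-bit off_t, and it assumes the caller is allowed to access the target (for example by running as root, as with sudo scanmem).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one int from virtual address `addr` of process `pid`.
 * Returns 0 on success, -1 on failure (no permission, unmapped address, ...). */
int read_int_from_process(pid_t pid, uintptr_t addr, int *out)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/mem", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset is the virtual address inside the target process. */
    ssize_t n = pread(fd, out, sizeof *out, (off_t)addr);
    close(fd);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}

/* Write one int to virtual address `addr` of process `pid`.
 * Returns 0 on success, -1 on failure. */
int write_int_to_process(pid_t pid, uintptr_t addr, int value)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/mem", (long)pid);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pwrite(fd, &value, sizeof value, (off_t)addr);
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```

With the PID from ps and one of the addresses from scanmem, read_int_from_process(pid, 0x7771ff64b090, &value) would be the equivalent of listing the match, and write_int_to_process the equivalent of set. The addresses change between runs, so they have to be looked up again each time.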
I am trying to find the resident set size of a C program running on Linux (Ubuntu 14.04). I get the PID of the running C program and pass it to a custom kernel module. The kernel module figures out the *task and extracts the *mm pointer. Then I loop through all the VM areas, and in each VM area I loop through each page-aligned virtual address and request a page_walk(virtual address) to get the pte structure of type pte_t. Then I use the pte_present() function to check whether the actual physical page is present in RAM.
The issues I am facing are as follows:
The rss value does not match with the value shown in htop or top. Although the value I have calculated does increase proportionally as the test C program accesses more memory (using some array accessing).
I have found that the rss value shown by htop is the same as the result of the get_mm_rss() function provided by the Linux kernel itself.
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
return get_mm_counter(mm, MM_FILEPAGES) +
get_mm_counter(mm, MM_ANONPAGES) +
get_mm_counter(mm, MM_SHMEMPAGES);
}
My question is how to count or detect these anonymous pages and shared pages. Which bits need to be checked?
Thank You !
The correct way to do this is to realize that the count is in an array. Try:
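static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
    int k;
    unsigned long count = 0;
    /* Sum every per-type RSS counter kept in the mm. */
    for (k = 0; k < NR_MM_COUNTERS; k++) {
        long len = atomic_long_read(&mm->rss_stat.count[k]);
        /* Clamp negative readings to zero before adding. */
        if (len < 0)
            len = 0;
        count += len;
    }
    return count;
}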
Walking Physical Pages
You need to set up an mm_walk struct with your callbacks for pte and pmd (depending on whether the kernel is built with huge page support) to walk through the physical pages.
For example, show_smap uses this:
struct mm_walk smaps_walk = {
.pmd_entry = smaps_pte_range,
#ifdef CONFIG_HUGETLB_PAGE
.hugetlb_entry = smaps_hugetlb_range,
#endif
.mm = vma->vm_mm,
};
after setting up the call backs.
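While debugging the module it can help to have a user-space number to compare against. As a minimal sketch, the resident page count can be read from the second field of /proc/<pid>/statm and multiplied by the page size. rss_bytes_from_statm is a hypothetical helper name, and whether htop reads exactly this file is an assumption I have not checked.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the resident set size of `pid` in bytes, taken from the second
 * field of /proc/<pid>/statm (resident pages), or -1 on failure. */
long long rss_bytes_from_statm(pid_t pid)
{
    char path[64];
    unsigned long long size_pages, resident_pages;

    snprintf(path, sizeof path, "/proc/%ld/statm", (long)pid);

    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    /* First field: total program size in pages; second: resident pages. */
    int fields = fscanf(fp, "%llu %llu", &size_pages, &resident_pages);
    fclose(fp);
    if (fields != 2)
        return -1;

    return (long long)resident_pages * sysconf(_SC_PAGESIZE);
}
```

If the page-table walk in the module gives a smaller number than this, the difference is a hint about which kinds of pages (file-backed, anonymous, shmem) the walk is not counting.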
I am developing a PCIe device driver for openwrt, and I hit a data bus error when trying to access the io-memory in a timer interrupt, which I mentioned in my last question. After lots of research I think I might have found the reason, but I am unable to solve it. Below are my troubles.
Last week I found out that the PCIe region size might have changed during system startup. The region size of bar0 is 4096 in my driver (returned from pci_resource_len), but lspci -vv reports 4097, which does not fit the page size of the Linux kernel. By reading the source code of pciutils, I found that the lspci command fetches the PCIe information from the /sys/devices/pci0000:00/0000:00:00.0/resource file. So I removed all my custom components and ran the original openwrt on my router. With cat /sys/devices/pci0000:00/0000:00:00.0/resource, the first line of the result (bar0) is
0x0000000010008000 0x0000000010009000 0x0000000000040200
Moreover, I also check the content of /proc/iomem, and the content related to PCIE is
10000000-13ffffff : mem_base
10000000-13ffffff : PCI memory space
10000000-10007fff : 0000:00:00.0
10008000-10008fff : 0000:00:00.0
It is super weird that the region size of bar0 indicated by the two files above is different! According to the PCIe mechanism, the region size should always be a power of 2. How come the region size becomes 4097?
After spending weeks reading the source code of the Linux kernel, I found out that this is a bug in Linux kernel 4.4.14.
The content of /sys/devices/pci0000:00/0000:00:00.0/resource is generated by the function resource_show in drivers/pci/pci-sysfs.c. The related code is
for (i = 0; i < max; i++) {
struct resource *res = &pci_dev->resource[i];
pci_resource_to_user(pci_dev, i, res, &start, &end);
str += sprintf(str, "0x%016llx 0x%016llx 0x%016llx\n",
(unsigned long long)start,
(unsigned long long)end,
(unsigned long long)res->flags);
}
The pci_resource_to_user function that is actually invoked is located in arch/mips/include/asm/pci.h
static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
const struct resource *rsrc, resource_size_t *start,
resource_size_t *end)
{
phys_addr_t size = resource_size(rsrc);
*start = fixup_bigphys_addr(rsrc->start, size);
*end = rsrc->start + size;
}
The calculation of *end is wrong and should be replaced by
*end = rsrc->start + size - (size ? 1 : 0);
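The off-by-one can be checked with the numbers from the question. The sketch below uses hypothetical helper names and assumes that the consumer of the sysfs file treats the end address as inclusive and computes the length as end - start + 1, which is what would produce 4097.

```c
#include <stdint.h>

/* End address as computed by the buggy code: one past the last byte. */
uint64_t end_buggy(uint64_t start, uint64_t size)
{
    return start + size;
}

/* End address with the fix: address of the last byte (inclusive). */
uint64_t end_fixed(uint64_t start, uint64_t size)
{
    return start + size - (size ? 1 : 0);
}

/* Length of a region whose end address is inclusive. */
uint64_t inclusive_len(uint64_t start, uint64_t end)
{
    return end - start + 1;
}
```

For start = 0x10008000 and size = 0x1000, end_buggy gives 0x10009000 (length 4097 when read as inclusive), while end_fixed gives 0x10008fff (length 4096), matching the 10008000-10008fff line in /proc/iomem.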
I understand that the program break is the highest virtual memory address that the Linux OS has allocated for a process, and therefore marks the highest address of the heap. You can get the address of the program break by calling sbrk( 0 ).
When I create the following trivial program, I get different results each time it's run:
#define _BSD_SOURCE
#include <stdio.h>
#include <unistd.h>
int main()
{
printf( "system break: %p\n", sbrk( 0 ) );
return 0;
}
For example, on my PC:
$ ./sbrk
system break: 0x81fc000
$ ./sbrk
system break: 0x9bce000
$ ./sbrk
system break: 0x97a6000
My understanding was that the heap is allocated immediately above the BSS section in virtual memory - I guess I was expecting that it would always have the same initial value for a trivial program like this. Is there some randomization or something in where the program break is initially positioned? If not, why is it different each time I run the program?
By default the kernel will randomise the initial point, though this feature can be disabled. This is the code that is run (for x86, in arch/x86/kernel/process.c):
unsigned long arch_randomize_brk(struct mm_struct *mm)
{
unsigned long range_end = mm->brk + 0x02000000;
return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}
Additionally, in this excerpt from the ELF binary loader (fs/binfmt_elf.c), you can see where that function is called:
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
current->mm->brk = current->mm->start_brk =
arch_randomize_brk(current->mm);
#ifdef CONFIG_COMPAT_BRK
current->brk_randomized = 1;
#endif
}
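The loader code above only randomises the break when randomize_va_space is greater than 1, and the offset range of 0x02000000 bytes in arch_randomize_brk is 32 MiB. The current setting is exposed as a sysctl, so a quick way to see which case applies on a given machine is to read /proc/sys/kernel/randomize_va_space. A minimal sketch, with read_randomize_va_space as a hypothetical helper name:

```c
#include <stdio.h>

/* Return the value stored in /proc/sys/kernel/randomize_va_space,
 * or -1 if the file cannot be read or parsed. */
int read_randomize_va_space(void)
{
    int value = -1;
    FILE *fp = fopen("/proc/sys/kernel/randomize_va_space", "r");

    if (!fp)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

If this returns 2, the condition randomize_va_space > 1 in the loader holds and sbrk(0) is expected to differ between runs. With 0 or 1 the break should stay at the same place for a trivial program like the one in the question.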
Yes, there is randomisation. It is known as Address Space Layout Randomisation (ASLR). http://en.wikipedia.org/wiki/Address_space_layout_randomization