Dumping the pfn from /proc/<pid>/pagemap does not give the expected content - c

I'm using this code http://fivelinesofcode.blogspot.com/2014/03/how-to-translate-virtual-to-physical.html to dump the pfn related to a given virtual address taken from /proc/"pid"/maps.
Once I get the PFN, I dump it with a specific kernel module. This is a snippet of the code:
static int write_pfn(phys_addr_t pfn)
{
struct page *p;
void *v;
int s =0,ret =0;
p = pfn_to_page((pfn) >> PAGE_SHIFT);
v = kmap(p);
DBG("Writing page %d(mapped addr=0x%lx) - pfn: 0x%lx", p,v,pfn);
s = write_vaddr(v, PAGE_SIZE);
if (s != PAGE_SIZE) {
DBG("Error sending page %d(addr=0x%lx)", s,v);
return (int) s;
ret-=1;
}
kunmap(p);
return ret;
}
However, I have noticed that if I compare the PFN dumped with the kernel module with the real content of the corresponding virtual address inside the process , than the content is completely different.
Note that I dump the content of the process virtual address using the command "x" from (gdb).
Any idea? This is my kernel version:
Linux 3.14.7-rt5 #1 SMP Mon Jun 23 14:55:19 CEST 2014 x86_64 GNU/Linux

Logically your solution is correct, but i'am not sure about correctness of the pagemap reader code. Anyway, there is a way to check it. While you are inside kernel (in the context of target application) you could get page corresponding to specified VA using this kernel API:
1. mm = current->mm
2. struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
3. static inline struct page *follow_page(struct vm_area_struct *vma,
unsigned long address, unsigned int foll_flags)
4. page_to_pfn
You can implement this and compare with your results, Hope this will help.

Thank you Alex.
The problem with your solution is that some symbols are not exported (e.g. follow_page()) therefore I can't use them in my module.
However, I found this solution:
mm = ts->mm;
pgd_t * pgd = pgd_offset(mm, vaddr);
pud_t * pud = pud_offset(pgd, vaddr);
pmd_t * pmd = pmd_offset(pud, vaddr);
pte_t * pte = pte_offset_map(pmd, vaddr);
p = pte_page(*pte);
if(p)
DBG("page frame struct is # %p", p);
v = kmap(p);
DBG("Writing page 0x%lx(mapped addr=0x%lx) - pid: %d", v,vaddr,pidnr);
s = write_vaddr(v, PAGE_SIZE);
if (s != PAGE_SIZE) {
DBG("Error sending page %d(addr=0x%lx pid=%d)", v,vaddr,pidnr);
ret-=1;
}
kunmap(p);
pte_unmap(pte);

Related

Correct procedure and memory addresses to setup a virtio-net ethernet device on a sel4 microkernel

In short:
I am trying to run the sel4 microkernel inside a x86_64 virtual machine and can't get the ethernet interface working.
What is the correct procedure to get internet connectivity (via a vitio-net ethernet device) on a sel4 microkernel? And what are the correct (memory) addresses?
Long version:
I have tried the camkes (picoserver) examples with the e1000 netdevice but couldn't get them to work so I decided to learn some new things and start from scratch. Also I decided to use virtio-net(together with vhost) instead of an emulated e1000 device for better performance. My plan is to use ethif_virtio_pci_init to initialise a eth_driver struct and then pass the struct on to picoTCP. For now I can find the virtio PCI device in sel4 but I am unsure how to correctly access it and create the ethif_virtio_pci_config_t needed for ethif_virtio_pci_init.
Some information from libethdrivers virtio_pci.h:
typedef struct ethif_virtio_pci_config {
uint16_t io_base;
void *mmio_base;
} ethif_virtio_pci_config_t;
/**
* This function initialises the hardware and conforms to the ethif_driver_init
* type in raw.h
* #param[out] eth_driver Ethernet driver structure to fill out
* #param[in] io_ops A structure containing os specific data and
* functions.
* #param[in] config Pointer to a ethif_virtio_pci_config struct
*/
int ethif_virtio_pci_init(struct eth_driver *eth_driver, ps_io_ops_t io_ops, void *config);
so for the ethif_virtio_pci_config_t I need an uint16_t io_base address and a pointer to the MMIO base.
This is the information I have obtained so far:
Found virtio_net_pci device
BASE_ADDR[0] ----
base_addr_space[0]: 0x1 [PCI_BASE_ADDRESS_SPACE_IO]
base_addr_type[0]: 0x0 [ 32bit ]
base_addr_prefetchable[0]: no
base_addr[0]: 0xc000
base_addr_size_mask[0]: 0xffffffe0
BASE_ADDR[1] ----
base_addr_space[1]: 0x0 [PCI_BASE_ADDRESS_SPACE_MEMORY]
base_addr_type[1]: 0x0 [ 32bit ]
base_addr_prefetchable[1]: no
base_addr[1]: 0xfeb91000
base_addr_size_mask[1]: 0xfffff000
BASE_ADDR[2] ----
BASE_ADDR[3] ----
BASE_ADDR[4] ----
base_addr_space[4]: 0x0 [PCI_BASE_ADDRESS_SPACE_MEMORY]
base_addr_type[4]: 0x4 [ 64bit ]
base_addr_prefetchable[4]: yes
base_addr[4]: 0xfe000000
base_addr_size_mask[4]: 0xffffc000
BASE_ADDR[5] ----
As far as I understand I now need to map the pysical address to a virtual one. For that I created an IO-mapper but I am not sure what to map. The whole dma region starting at 0x8000000 or just the address of the virtio device? As far as I understand the new virtual address would be my MMIO base pointer but what is the uint16_t io_base than?
This is my code so far, the part I am unsure about is at the end:
#define ALLOCATOR_STATIC_POOL_SIZE ((1 << seL4_LargePageBits) * 10)
static simple_t simple;
static ps_io_mapper_t io_mapper;
static char allocator_mem_pool[ALLOCATOR_STATIC_POOL_SIZE];
static vka_t vka;
static vspace_t vspace;
static sel4utils_alloc_data_t data;
static ltimer_t timer;
int main() {
PRINT_DBG("Hello World\n");
seL4_BootInfo *info = platsupport_get_bootinfo();
simple_default_init_bootinfo(&simple, info);
/* print out bootinfo and other info about simple */
// simple_print(&simple);
allocman_t *allocman = bootstrap_use_current_simple(&simple, ALLOCATOR_STATIC_POOL_SIZE, allocator_mem_pool);
if (allocman == NULL) {
ZF_LOGF("Failed to create allocman");
}
allocman_make_vka(&vka, allocman);
int error = sel4utils_bootstrap_vspace_with_bootinfo_leaky(&vspace,
&data, simple_get_pd(&simple),
&vka, info);
if (error != 0) {
PRINT_DBG("Failed to create virtual memory manager. Error: %d\n", error);
return -1;
}
error = sel4platsupport_new_io_mapper(&vspace, &vka, &io_mapper);
if (error != 0) {
PRINT_DBG("Failed to create io mapper. Error: %d\n", error);
return -1;
}
ps_io_ops_t io_ops;
error = sel4platsupport_new_io_ops(&vspace, &vka, &simple, &io_ops);
if (error != 0) {
PRINT_DBG("Failed to create io ops. Error: %d\n", error);
return -1;
}
ps_io_port_ops_t port_ops;
int error = sel4platsupport_get_io_port_ops(&port_ops, &simple, &vka);
if (error != 0) {
PRINT_DBG("Failed to find io port ops. Error: %d\n", error);
return -1;
}
printf("Start scannning\n");
libpci_scan(port_ops);
PRINT_DBG("Found %u devices\n", libpci_num_devices);
for (uint32_t i = 0; i < libpci_num_devices; ++i) {
PRINT_DBG("PCI device %u. Vendor id: %x. Device id: %x\n",
i, libpci_device_list[i].vendor_id, libpci_device_list[i].device_id);
}
libpci_device_t* virtio_net_pci = libpci_find_device(0x1af4, 0x1000);
if (!virtio_net_pci) {
PRINT_DBG("Failed to find the virtio_net_pci device\n");
// return -1;
}else{
// libpci_device_iocfg_debug_print(&virtio_net_pci->cfg,true);
PRINT_DBG("Found virtio_net_pci device\n");
libpci_device_iocfg_debug_print(&virtio_net_pci->cfg,false);
}
//Now what?
unsigned long phys = 0x8000000; //what physical address to map?
void *mmio_ptr = ps_io_map(&io_mapper, phys, 4096, 0, PS_MEM_NORMAL);
memset(ptr, 0, 4096);
if (mmio_ptr == NULL) {
PRINT_DBG("Failed to map phys addr. Error: %p\n", ptr);
return -1;
}
ethif_virtio_pci_config_t me_config;
me_config.mmio_base = mmio_ptr; //is this correct?
//me_config.io_base = ?
I read alot about the sel4 kernel but I am still new to most of the concepts of the sel4 microkernel (and Linux kernel) so I am very grateful for any tipps and recommendations. I am normally working with embedded, microcontrollers and more "bare metal" platforms and wanted to learn something new but for now alot is very confusing.

How to count the anonymous pages and shared pages for a process in Linux using kernel module

I am trying to find the resident set size of a c program running on Linux os (ubuntu 14.04). I get the PID of the running C program and pass it to a custom kernel module. The kernel module figures out the *task and extracts the *mm pointer. Then I loop through all the VM areas and in each VM area I again loop through each page aligned virtual addresses and request a page_walk(virtual addresses) to get the pte structure of type pte_t. Then I used the pte_preset() function to check the existence of the actual physical page in the RAM.
The issues I am facing are as follows:
The rss value does not match with the value shown in htop or top. Although the value I have calculated does increase proportionally as the test C program accesses more memory (using some array accessing).
I have found that the rss value of htop application gives the same result as given by the get_mm_struct() function call provided by the Linux kernel itself.
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
return get_mm_counter(mm, MM_FILEPAGES) +
get_mm_counter(mm, MM_ANONPAGES) +
get_mm_counter(mm, MM_SHMEMPAGES);
}
My query is how to count or detect these anonymous pages and shared pages? What are bits that need to be checked?
Thank You !
The correct way to do this is to realize that the count is in an array. Try:
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
int k;
unsigned long count = 0;
for(k = 0; k < NR_MM_COUNTERS; k++) {
long len = atomic_long_read(&mm->rss_stat.count[k]);
if(len < 0)
len = 0;
count += len;
}
}
Walking Physical Pages
You need to set up mm_walk struct with your call backs for pte and pmd (driven by whether or not HUGETABLES are used in the kernel) to walk through the physical pages.
For example:
show_smap uses this:
struct mm_walk smaps_walk = {
.pmd_entry = smaps_pte_range,
#ifdef CONFIG_HUGETLB_PAGE
.hugetlb_entry = smaps_hugetlb_range,
#endif
.mm = vma->vm_mm,
};
after setting up the call backs.

Page Table Walk in Linux

Consider the following piece of code:
pgd_t *pgd;
pte_t *ptep;
pud_t *pud;
pmd_t *pmd;
char *addr;
struct page *page = NULL;
struct mm_struct *mm = current->mm;
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd) || pgd_bad(*pgd))
goto out;
printk(KERN_NOTICE "Valid pgd");
pud = pud_offset(pgd, addr);
if (pud_none(*pud) || pud_bad(*pud))
goto out;
printk(KERN_NOTICE "Valid pud");
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd) || pmd_bad(*pmd))
goto out;
printk(KERN_NOTICE "Valid pmd");
ptep = pte_offset_map(pmd, addr);
if (!ptep)
goto out;
addr = ptep->pte;
printk(KERN_INFO "byte = %d\n", *(char *)__va(addr));
pte_unmap(ptep);
If I understand correctly, addr should be the physical address corresponding to the user-space virtual address. Then I should be able to dereference that using __va. However, it doesn't work. If I use pte_page and kmap, though, it works exactly how it is supposed to. Why does this happen? I'm on x86-64, so high memory shouldn't be a problem? Is there something else kmap does?
I fixed the problem thanks to TonyTannous. The page table entry doesn't contain just the physical address, but also some other bits used for access rights and the like. The physical address can be obtained by masking it with PTE_PFN_MASK:
addr = ptep->pte & PTE_PFN_MASK
I can then dereference it with __va.

PCIE region not aligned, and not consistent

I am developing a PCIE device driver for openwrt, and I met a data bus error when trying to access the io-memory in timer interrupt, which I mentioned in my last question. After lots of research I think I might have found the reason, but I am unable to solve it. Below are my troubles.
Last week I found out that the pcie region size might have changed during system startup. The region size of bar0 is 4096 in my driver (return from pci_resource_len) and the region size is 4097 in lspci -vv, which breaks the page size of linux kernel. By reading the source code of pciutil, I find that lspci command fetch the pcie information from /sys/devices/pci0000:00/0000:00:00.0/resouce file. So I remove all my custom components and run the original openwrt on my router. By cat /sys/devices/pci0000:00/0000:00:00.0/resouce, the first line of the result (bar0) is
0x0000000010008000 0x0000000010009000 0x0000000000040200
Moreover, I also check the content of /proc/iomem, and the content related to PCIE is
10000000-13ffffff : mem_base
10000000-13ffffff : PCI memory space
10000000-10007fff : 0000:00:00.0
10008000-10008fff : 0000:00:00.0
It is super weird that the region size of bar0 indicated by the two files above is different! According to the mechanism of PCIE, the region size should always be the power of 2. How come the region size becomes 4097?
After spending weeks reading source code of linux kernel, I find out that this is a bug of linux kernel 4.4.14.
The content of /sys/devices/pci0000:00/0000:00:00.0/resouce is generated through function resource_show in file drivers/pci/pci-sysfs.c. The related code is
for (i = 0; i < max; i++) {
struct resource *res = &pci_dev->resource[i];
pci_resource_to_user(pci_dev, i, res, &start, &end);
str += sprintf(str, "0x%016llx 0x%016llx 0x%016llx\n",
(unsigned long long)start,
(unsigned long long)end,
(unsigned long long)res->flags);
}
The function pci_resource_to_user actually invoked is located in arch/mips/include/asm/pci.h
static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
const struct resource *rsrc, resource_size_t *start,
resource_size_t *end)
{
phys_addr_t size = resource_size(rsrc);
*start = fixup_bigphys_addr(rsrc->start, size);
*end = rsrc->start + size;
}
The calculation of *end is wrong and should be replace by
*end = rsrc->start + size - (size ? 1 : 0)

Getting the address of a function in C?

I'm getting a segfault when running this code as root in userspace. I don't understand why. I believe I have a rootkit and I want to check if the addresses are the same as the ones as in /boot/System.map-3.2.0-4-amd64
unsigned long hex;
unsigned long **sys_call_table;
for(hex = 0xffffffff810f8989; hex < 0xffffffff8160e370; hex += sizeof(void *))
{
sys_call_table = (unsigned long **)hex;
if(sys_call_table[3] == (unsigned long *)0xffffffff810f8989)
{
puts("sys_close's address has not been replaced by the rootkit");
}
}
cat /boot/System.map-3.2.0-4-amd64 | grep "string you want"
ffffffff81401200 R sys_call_table
ffffffff810f9f9e T sys_read // sys_call_table[0]
ffffffff810fa009 T sys_write // sys_call_table[1]
ffffffff810f950d T sys_open // sys_call_table[2]
ffffffff810f8989 T sys_close // sys_call_table[3]
ffffffff8160e370 D loops_per_jiffy
Running from root is not enough - the problem is that you run it in user space - run it in the kernel space, as a kernel module, for example. Although having root privileges is enough for invoking system calls you cannot access the table - in user space you can only access allocated memory to you.

Resources