Page Table Walk in Linux - c

Consider the following piece of code:
pgd_t *pgd;
pte_t *ptep;
pud_t *pud;
pmd_t *pmd;
char *addr;
struct page *page = NULL;
struct mm_struct *mm = current->mm;
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd) || pgd_bad(*pgd))
goto out;
printk(KERN_NOTICE "Valid pgd");
pud = pud_offset(pgd, addr);
if (pud_none(*pud) || pud_bad(*pud))
goto out;
printk(KERN_NOTICE "Valid pud");
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd) || pmd_bad(*pmd))
goto out;
printk(KERN_NOTICE "Valid pmd");
ptep = pte_offset_map(pmd, addr);
if (!ptep)
goto out;
addr = ptep->pte;
printk(KERN_INFO "byte = %d\n", *(char *)__va(addr));
pte_unmap(ptep);
If I understand correctly, addr should be the physical address corresponding to the user-space virtual address. Then I should be able to dereference that using __va. However, it doesn't work. If I use pte_page and kmap, though, it works exactly how it is supposed to. Why does this happen? I'm on x86-64, so high memory shouldn't be a problem? Is there something else kmap does?

I fixed the problem thanks to TonyTannous. The page table entry doesn't contain just the physical address, but also some other bits used for access rights and the like. The physical address can be obtained by masking it with PTE_PFN_MASK:
addr = ptep->pte & PTE_PFN_MASK
I can then dereference it with __va.

Related

Correct procedure and memory addresses to setup a virtio-net ethernet device on a sel4 microkernel

In short:
I am trying to run the sel4 microkernel inside a x86_64 virtual machine and can't get the ethernet interface working.
What is the correct procedure to get internet connectivity (via a vitio-net ethernet device) on a sel4 microkernel? And what are the correct (memory) addresses?
Long version:
I have tried the camkes (picoserver) examples with the e1000 netdevice but couldn't get them to work so I decided to learn some new things and start from scratch. Also I decided to use virtio-net(together with vhost) instead of an emulated e1000 device for better performance. My plan is to use ethif_virtio_pci_init to initialise a eth_driver struct and then pass the struct on to picoTCP. For now I can find the virtio PCI device in sel4 but I am unsure how to correctly access it and create the ethif_virtio_pci_config_t needed for ethif_virtio_pci_init.
Some information from libethdrivers virtio_pci.h:
typedef struct ethif_virtio_pci_config {
uint16_t io_base;
void *mmio_base;
} ethif_virtio_pci_config_t;
/**
* This function initialises the hardware and conforms to the ethif_driver_init
* type in raw.h
* #param[out] eth_driver Ethernet driver structure to fill out
* #param[in] io_ops A structure containing os specific data and
* functions.
* #param[in] config Pointer to a ethif_virtio_pci_config struct
*/
int ethif_virtio_pci_init(struct eth_driver *eth_driver, ps_io_ops_t io_ops, void *config);
so for the ethif_virtio_pci_config_t I need an uint16_t io_base address and a pointer to the MMIO base.
This is the information I have obtained so far:
Found virtio_net_pci device
BASE_ADDR[0] ----
base_addr_space[0]: 0x1 [PCI_BASE_ADDRESS_SPACE_IO]
base_addr_type[0]: 0x0 [ 32bit ]
base_addr_prefetchable[0]: no
base_addr[0]: 0xc000
base_addr_size_mask[0]: 0xffffffe0
BASE_ADDR[1] ----
base_addr_space[1]: 0x0 [PCI_BASE_ADDRESS_SPACE_MEMORY]
base_addr_type[1]: 0x0 [ 32bit ]
base_addr_prefetchable[1]: no
base_addr[1]: 0xfeb91000
base_addr_size_mask[1]: 0xfffff000
BASE_ADDR[2] ----
BASE_ADDR[3] ----
BASE_ADDR[4] ----
base_addr_space[4]: 0x0 [PCI_BASE_ADDRESS_SPACE_MEMORY]
base_addr_type[4]: 0x4 [ 64bit ]
base_addr_prefetchable[4]: yes
base_addr[4]: 0xfe000000
base_addr_size_mask[4]: 0xffffc000
BASE_ADDR[5] ----
As far as I understand I now need to map the pysical address to a virtual one. For that I created an IO-mapper but I am not sure what to map. The whole dma region starting at 0x8000000 or just the address of the virtio device? As far as I understand the new virtual address would be my MMIO base pointer but what is the uint16_t io_base than?
This is my code so far, the part I am unsure about is at the end:
#define ALLOCATOR_STATIC_POOL_SIZE ((1 << seL4_LargePageBits) * 10)
static simple_t simple;
static ps_io_mapper_t io_mapper;
static char allocator_mem_pool[ALLOCATOR_STATIC_POOL_SIZE];
static vka_t vka;
static vspace_t vspace;
static sel4utils_alloc_data_t data;
static ltimer_t timer;
int main() {
PRINT_DBG("Hello World\n");
seL4_BootInfo *info = platsupport_get_bootinfo();
simple_default_init_bootinfo(&simple, info);
/* print out bootinfo and other info about simple */
// simple_print(&simple);
allocman_t *allocman = bootstrap_use_current_simple(&simple, ALLOCATOR_STATIC_POOL_SIZE, allocator_mem_pool);
if (allocman == NULL) {
ZF_LOGF("Failed to create allocman");
}
allocman_make_vka(&vka, allocman);
int error = sel4utils_bootstrap_vspace_with_bootinfo_leaky(&vspace,
&data, simple_get_pd(&simple),
&vka, info);
if (error != 0) {
PRINT_DBG("Failed to create virtual memory manager. Error: %d\n", error);
return -1;
}
error = sel4platsupport_new_io_mapper(&vspace, &vka, &io_mapper);
if (error != 0) {
PRINT_DBG("Failed to create io mapper. Error: %d\n", error);
return -1;
}
ps_io_ops_t io_ops;
error = sel4platsupport_new_io_ops(&vspace, &vka, &simple, &io_ops);
if (error != 0) {
PRINT_DBG("Failed to create io ops. Error: %d\n", error);
return -1;
}
ps_io_port_ops_t port_ops;
int error = sel4platsupport_get_io_port_ops(&port_ops, &simple, &vka);
if (error != 0) {
PRINT_DBG("Failed to find io port ops. Error: %d\n", error);
return -1;
}
printf("Start scannning\n");
libpci_scan(port_ops);
PRINT_DBG("Found %u devices\n", libpci_num_devices);
for (uint32_t i = 0; i < libpci_num_devices; ++i) {
PRINT_DBG("PCI device %u. Vendor id: %x. Device id: %x\n",
i, libpci_device_list[i].vendor_id, libpci_device_list[i].device_id);
}
libpci_device_t* virtio_net_pci = libpci_find_device(0x1af4, 0x1000);
if (!virtio_net_pci) {
PRINT_DBG("Failed to find the virtio_net_pci device\n");
// return -1;
}else{
// libpci_device_iocfg_debug_print(&virtio_net_pci->cfg,true);
PRINT_DBG("Found virtio_net_pci device\n");
libpci_device_iocfg_debug_print(&virtio_net_pci->cfg,false);
}
//Now what?
unsigned long phys = 0x8000000; //what physical address to map?
void *mmio_ptr = ps_io_map(&io_mapper, phys, 4096, 0, PS_MEM_NORMAL);
memset(ptr, 0, 4096);
if (mmio_ptr == NULL) {
PRINT_DBG("Failed to map phys addr. Error: %p\n", ptr);
return -1;
}
ethif_virtio_pci_config_t me_config;
me_config.mmio_base = mmio_ptr; //is this correct?
//me_config.io_base = ?
I read alot about the sel4 kernel but I am still new to most of the concepts of the sel4 microkernel (and Linux kernel) so I am very grateful for any tipps and recommendations. I am normally working with embedded, microcontrollers and more "bare metal" platforms and wanted to learn something new but for now alot is very confusing.

Accessing memory loaded via tftp

I am trying to access data from a file that I load in via tftp. I'm using an AM3358 processor
tftp 81000000 mydata
and I can see the data being correctly loaded
=> md 81000000
81000000: 00004000 00000000 00002000 00000400 .#....... ......
In the u-boot code, I create a pointer to this address and then attempt to de-reference it, but the value is incorrect which makes me think I'm using the incorrect address
unsigned long addr = 0x81000000;
uint32_t *ptr = &addr;
uint32_t val = *(ptr+0);
printf("addr %ul val: %ul", addr, val);
Furthermore, I'm trying to load the address of mydata into a 32-bit LCD register, but the physical address of 0x81000000 is beyond that of a 32-bit number. I believe I'm just confused as to what address mapping is involved here.
bdi yields
=> bdi
arch_number = 0x00000000
boot_params = 0x80000100
DRAM bank = 0x00000000
-> start = 0x80000000
-> size = 0x20000000
baudrate = 115200 bps
TLB addr = 0x9fff0000
relocaddr = 0x9ffb4000
reloc off = 0x1f7b4000
irq_sp = 0x9df8ba90
sp start = 0x9df8ba80
Early malloc usage: 4a8 / 1000
fdt_blob = 0x9df8bea0
Why would 0x81000000 not be a valid 32 bit number ?
0x00000000 <= 0x81000000 <= 0xFFFFFFFF.
I think there may be an error in your logic: you are initializing ptrwith the address of addr, not the address of its content.
The correct code would rather be something like:
uint32_t addr = 0x81000000;
uint32_t *ptr = (uint32_t*)(uintptr_t) addr;
uint32_t val = *(ptr+0);
printf("addr %ul val: %ul", addr, val);
This can be tested on your PC - you may need to add support for building 32 bit applications, i.e. execute sudo apt-get install gcc-multilib on Ubuntu.
ptr.c:
#include <stdio.h>
#include <stdint.h>
int
main ()
{
uint32_t addr = 0x81000000; // gcc 9.3.0 is complaining about uint32_t *ptr = &addr;
// your code
{
uint32_t *ptr = &addr;
printf ("%p\n", ptr);
}
// correct code
{
uint32_t *ptr = (uint32_t *) (uintptr_t) addr;
printf ("%p\n", ptr);
}
}
gcc -m32 -o ptr ptr.c
./ptr
0xff83bc60
0x81000000
This would be why you cannot access the content of the file you transferred using TFTP, you are reading from an incorrect, but valid address.

ip_rcv (in ip_input.c for ipv4) behaviour in loopback mode

So I wanted to put a few printk messages in ip_rcv function to see whether after receiving a packet from a particular IP there is a message printed. I am attaching the entire ip_rcv function which has the printk modifications:
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
const struct iphdr *iph;
struct net *net;
u32 len;
/* When the interface is in promisc. mode, drop all the crap
* that it receives, do not try to analyse it.
*/
if (skb->pkt_type == PACKET_OTHERHOST)
goto drop;
net = dev_net(dev);
__IP_UPD_PO_STATS(net, IPSTATS_MIB_IN, skb->len);
skb = skb_share_check(skb, GFP_ATOMIC);
if (!skb) {
__IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
goto out;
}
if (!pskb_may_pull(skb, sizeof(struct iphdr)))
goto inhdr_error;
iph = ip_hdr(skb);
//**PSK's Modification**
if (iph->saddr == 0x08080808)
printk("\n***PSK: %x IP's message recieved: Google***\n", iph->saddr);
if (iph->saddr == 0x0202000A)
printk("\n***PSK: %x IP's message recieved: Gateway***\n", iph->saddr);
if (iph->saddr == 0x010000FF)
printk("\n***PSK: %x IP's message recieved : Home***\n", iph->saddr);
/* RFC1122: 3.2.1.2 MUST silently discard any IP frame that fails the checksum.
*
* Is the datagram acceptable?
*
* 1. Length at least the size of an ip header
* 2. Version of 4
* 3. Checksums correctly. [Speed optimisation for later, skip loopback checksums]
* 4. Doesn't have a bogus length
*/
if (iph->ihl < 5 || iph->version != 4)
goto inhdr_error;
BUILD_BUG_ON(IPSTATS_MIB_ECT1PKTS != IPSTATS_MIB_NOECTPKTS + INET_ECN_ECT_1);
BUILD_BUG_ON(IPSTATS_MIB_ECT0PKTS != IPSTATS_MIB_NOECTPKTS + INET_ECN_ECT_0);
BUILD_BUG_ON(IPSTATS_MIB_CEPKTS != IPSTATS_MIB_NOECTPKTS + INET_ECN_CE);
__IP_ADD_STATS(net,
IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));
if (!pskb_may_pull(skb, iph->ihl*4))
goto inhdr_error;
iph = ip_hdr(skb);
if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
goto csum_error;
len = ntohs(iph->tot_len);
if (skb->len < len) {
__IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS);
goto drop;
} else if (len < (iph->ihl*4))
goto inhdr_error;
/* Our transport medium may have padded the buffer out. Now we know it
* is IP we can trim to the true length of the frame.
* Note this now means skb->len holds ntohs(iph->tot_len).
*/
if (pskb_trim_rcsum(skb, len)) {
__IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
goto drop;
}
iph = ip_hdr(skb);
skb->transport_header = skb->network_header + iph->ihl*4;
/* Remove any debris in the socket control block */
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
IPCB(skb)->iif = skb->skb_iif;
/* Must drop socket now because of tproxy. */
skb_orphan(skb);
return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
net, NULL, skb, dev, NULL,
ip_rcv_finish);
csum_error:
__IP_INC_STATS(net, IPSTATS_MIB_CSUMERRORS);
inhdr_error:
__IP_INC_STATS(net, IPSTATS_MIB_INHDRERRORS);
drop:
kfree_skb(skb);
out:
return NET_RX_DROP;
}
I should get a message printed into the kernel buffer after getting a packet from either Google DNS, my local gateway (10.0.2.2) or loop-back address (127.0.0.1). This is working fine for the DNS and the gateway, but not when I ping localhost or try to run a program which involves back and forth from a localhost. Are there some other kernel function calls that specifically handle packets to localhost, or am I missing something very basic? I thought the loopback packets should trace the same path through the stack as any other packet at least until L3. I would also appreciate if someone briefly explains the handling of loopback traffic along with the answer.
System Specs:
System- Ubuntu on Virtual Machine
Kernel- 4.15.0-70 generic
I think there is something wrong here:
if (iph->saddr == 0x010000FF)
Perhaps you mean:
if (iph->saddr == 0x0100007F)
loopback packets should trace the same path
Yep, in general.
Investigate more about the details of loopback device.
Also you always can operate with some useful tools like trace-cmd. F.e. to see the function graph you can do:
trace-cmd record -p function_graph -g net_rx_action
Then start ping 127.0.0.1, then stop tracing and watch report like
trace-cmd report | vim -
Here you can see the "path" and ensure that your localhost pinging eventually falls into ip_rcv().

Dumping the pfn from /proc/<pid>/pagemap does not give the expected content

I'm using this code http://fivelinesofcode.blogspot.com/2014/03/how-to-translate-virtual-to-physical.html to dump the pfn related to a given virtual address taken from /proc/"pid"/maps.
Once I get the PFN, I dump it with a specific kernel module. This is a snippet of the code:
static int write_pfn(phys_addr_t pfn)
{
struct page *p;
void *v;
int s =0,ret =0;
p = pfn_to_page((pfn) >> PAGE_SHIFT);
v = kmap(p);
DBG("Writing page %d(mapped addr=0x%lx) - pfn: 0x%lx", p,v,pfn);
s = write_vaddr(v, PAGE_SIZE);
if (s != PAGE_SIZE) {
DBG("Error sending page %d(addr=0x%lx)", s,v);
return (int) s;
ret-=1;
}
kunmap(p);
return ret;
}
However, I have noticed that if I compare the PFN dumped with the kernel module with the real content of the corresponding virtual address inside the process , than the content is completely different.
Note that I dump the content of the process virtual address using the command "x" from (gdb).
Any idea? This is my kernel version:
Linux 3.14.7-rt5 #1 SMP Mon Jun 23 14:55:19 CEST 2014 x86_64 GNU/Linux
Logically your solution is correct, but i'am not sure about correctness of the pagemap reader code. Anyway, there is a way to check it. While you are inside kernel (in the context of target application) you could get page corresponding to specified VA using this kernel API:
1. mm = current->mm
2. struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
3. static inline struct page *follow_page(struct vm_area_struct *vma,
unsigned long address, unsigned int foll_flags)
4. page_to_pfn
You can implement this and compare with your results, Hope this will help.
Thank you Alex.
The problem with your solution is that some symbols are not exported (e.g. follow_page()) therefore I can't use them in my module.
However, I found this solution:
mm = ts->mm;
pgd_t * pgd = pgd_offset(mm, vaddr);
pud_t * pud = pud_offset(pgd, vaddr);
pmd_t * pmd = pmd_offset(pud, vaddr);
pte_t * pte = pte_offset_map(pmd, vaddr);
p = pte_page(*pte);
if(p)
DBG("page frame struct is # %p", p);
v = kmap(p);
DBG("Writing page 0x%lx(mapped addr=0x%lx) - pid: %d", v,vaddr,pidnr);
s = write_vaddr(v, PAGE_SIZE);
if (s != PAGE_SIZE) {
DBG("Error sending page %d(addr=0x%lx pid=%d)", v,vaddr,pidnr);
ret-=1;
}
kunmap(p);
pte_unmap(pte);

print strings from inside linux ntworking driver

I have written a linux networking driver.
This is my "hard_header" function:
int snull_header(struct sk_buff *skb, struct net_device *dev,
unsigned short type, void *daddr, void *saddr,
unsigned int len)
{
struct ethhdr *eth = (struct ethhdr *)skb_push(skb,ETH_HLEN);
pr_err("inside snull_header\n");
pr_err("THE DATA TO SEND BEFORE ADDITION IS:%s\n", skb->data);
pr_err("THE SOURCE IS:%s\n", (char*)saddr);
pr_err("THE DEST IS:%s\n", (char*)daddr);
eth->h_proto = htons(type);
memcpy(eth->h_source, saddr ? saddr : dev->dev_addr, dev->addr_len);
memcpy(eth->h_dest, daddr ? daddr : dev->dev_addr, dev->addr_len);
eth->h_dest[ETH_ALEN-1] ^= 0x01; /* dest is us xor 1 */
pr_err("THE DATA TO SEND AFTER ADDITION IS:%s\n", skb->data);
return (dev->hard_header_len);
}
This is the definition of pr_err (from printk.h):
#define pr_err(fmt, ...) \
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
When I run load this driver and try to send packet, I see all the prints, but instead of the strings of skb->data, source and destination I see gibberish.
My guess is that it's related somehow to the fact I'm referring to kernel memory, but on the other hand, this is what printk is for.
How can I print correctly these strings?
For the saddr and daddr, you can not print them in this way. You can print them with:
pr_err("THE SOURCE IS:%d.%d.%d.%d\n",
(0xff & saddr),
(0xff00 & saddr) >> 8,
(0xff000000 & saddr) >> 16
(0xff000000 & saddr) >>24);
For the skb->data, it's not null('\0') terminated so you can not printed as string with the format "%s". the limit of the skb->data determined by skb->len. You can print the content of the skb->data in this way.
int i;
for (i=0; i<skb->len; i++)
printk("%c", skb->data+i );
The contents of skb->data vary depending on which networking layer you are printing it. It can contain the header of previous layers even. So get the length of the data. And then print the data bytewise for the length. Then analyse what it contains. skb->data_len gives the size of data.

Resources