Get maximum available register for Linux PCI device - c

I am currently debugging a Linux kernel driver.
I want to sweep a PCI device's mmio registers to scan for certain information.
This is the function I wrote so far.
void _sweep_registers(struct pci_dev *dev)
{
    int i;
    int activecontrolstatus;
    int activestatus;

    for (i = 0; i < AMD_P2C_MSG_INTSTS; i++) {
        activecontrolstatus = readl(privdata->mmio + i);
        activestatus = activecontrolstatus >> 4;
        dev_info(&dev->dev, "activecontrolstatus = %d / activestatus = %d",
                 activecontrolstatus, activestatus);
    }
}
Currently I am reading MMIO up to the offset given by AMD_P2C_MSG_INTSTS (which is 0x10694).
But how far can I actually go?
I have zero knowledge of Linux kernel development and only rudimentary knowledge of C.
Background
My goal is to find information about which sensors of the AMD Sensor Fusion Hub are marked as active.
They should be exposed at register 0x1068C, but on my system that register reads 0x0, even though at least an accelerometer is available, so the bitmask should contain at least 0x1.
I want to see whether they are stored somewhere else.
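For illustration only (this is not from the original post): the size of the mapped BAR is the natural upper bound for such a sweep, and it can be queried with pci_resource_len(). A minimal sketch, assuming the MMIO mapping comes from BAR 2 (adjust the BAR index to whatever your driver actually ioremaps) and stepping over 4-byte registers:

static void _sweep_registers(struct pci_dev *dev, void __iomem *mmio)
{
    /* Size of BAR 2 in bytes -- the BAR index here is an assumption. */
    resource_size_t len = pci_resource_len(dev, 2);
    resource_size_t off;
    u32 val;

    for (off = 0; off < len; off += 4) { /* 32-bit registers */
        val = readl(mmio + off);
        dev_info(&dev->dev, "offset 0x%llx = 0x%x\n",
                 (unsigned long long)off, val);
    }
}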

Related

How does Qemu emulate PCIe devices?

I'm writing an open-source document about QEMU internals, so by helping me you're also helping the QEMU project grow.
The closest answer I found was: In which conditions the ioctl KVM_RUN returns?
This is the thread loop for a single CPU running on KVM:
static void *qemu_kvm_cpu_thread_fn(void *arg)
{
    CPUState *cpu = arg;
    int r;

    rcu_register_thread();

    qemu_mutex_lock_iothread();
    qemu_thread_get_self(cpu->thread);
    cpu->thread_id = qemu_get_thread_id();
    cpu->can_do_io = 1;
    current_cpu = cpu;

    r = kvm_init_vcpu(cpu);
    if (r < 0) {
        error_report("kvm_init_vcpu failed: %s", strerror(-r));
        exit(1);
    }

    kvm_init_cpu_signals(cpu);

    /* signal CPU creation */
    cpu->created = true;
    qemu_cond_signal(&qemu_cpu_cond);
    qemu_guest_random_seed_thread_part2(cpu->random_seed);

    do {
        if (cpu_can_run(cpu)) {
            r = kvm_cpu_exec(cpu);
            if (r == EXCP_DEBUG) {
                cpu_handle_guest_debug(cpu);
            }
        }
        qemu_wait_io_event(cpu);
    } while (!cpu->unplug || cpu_can_run(cpu));

    qemu_kvm_destroy_vcpu(cpu);
    cpu->created = false;
    qemu_cond_signal(&qemu_cpu_cond);
    qemu_mutex_unlock_iothread();
    rcu_unregister_thread();
    return NULL;
}
You can see here:
do {
    if (cpu_can_run(cpu)) {
        r = kvm_cpu_exec(cpu);
        if (r == EXCP_DEBUG) {
            cpu_handle_guest_debug(cpu);
        }
    }
    qemu_wait_io_event(cpu);
} while (!cpu->unplug || cpu_can_run(cpu));
that every time KVM returns, QEMU gets an opportunity to emulate things. I suppose that when the guest kernel tries to access a PCIe device, KVM on the host returns. What I don't know is how KVM knows when to return. Maybe KVM maintains the addresses of the PCIe device and tells Intel's VT-d or AMD's IOMMU which addresses should generate an exception. Can someone clarify this?
Well, by the look of qemu_kvm_cpu_thread_fn, the only place where a PCIe access could be emulated is qemu_wait_io_event(cpu), which is defined here: https://github.com/qemu/qemu/blob/stable-4.2/cpus.c#L1266. That calls qemu_wait_io_event_common, defined here: https://github.com/qemu/qemu/blob/stable-4.2/cpus.c#L1241, which in turn calls process_queued_cpu_work, defined here: https://github.com/qemu/qemu/blob/stable-4.2/cpus-common.c#L309.
Let's see the code which executes the queue functions:
while (cpu->queued_work_first != NULL) {
    wi = cpu->queued_work_first;
    cpu->queued_work_first = wi->next;
    if (!cpu->queued_work_first) {
        cpu->queued_work_last = NULL;
    }
    qemu_mutex_unlock(&cpu->work_mutex);
    if (wi->exclusive) {
        /* Running work items outside the BQL avoids the following deadlock:
         * 1) start_exclusive() is called with the BQL taken while another
         * CPU is running; 2) cpu_exec in the other CPU tries to takes the
         * BQL, so it goes to sleep; start_exclusive() is sleeping too, so
         * neither CPU can proceed.
         */
        qemu_mutex_unlock_iothread();
        start_exclusive();
        wi->func(cpu, wi->data);
It looks like the only thing the vCPU thread qemu_kvm_cpu_thread_fn can do when KVM returns is to execute the queued functions:
wi->func(cpu, wi->data);
This means that a PCIe device would have to constantly queue itself as a function for QEMU to execute. I don't see how that would work.
The functions that are able to queue work on this CPU have run_on_cpu in their name. By searching for it in VS Code I found some functions that queue work, but none related to PCIe or even emulation. The nicest function I found was this one, which apparently patches instructions: https://github.com/qemu/qemu/blob/stable-4.2/hw/i386/kvmvapic.c#L446. Nice, I wanted to know about that too.
Device emulation (of all devices, not just PCI) under KVM gets handled by the "case KVM_EXIT_IO" (for x86-style IO ports) and "case KVM_EXIT_MMIO" (for memory mapped IO including PCI) in the "switch (run->exit_reason)" inside kvm_cpu_exec(). qemu_wait_io_event() is unrelated.
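As a rough, paraphrased sketch from memory (not a verbatim quote of the stable-4.2 source), the dispatch inside kvm_cpu_exec() looks something like this:

switch (run->exit_reason) {
case KVM_EXIT_IO:
    /* x86-style port IO: hand the access to the device behind that port */
    kvm_handle_io(run->io.port, attrs,
                  (uint8_t *)run + run->io.data_offset,
                  run->io.direction, run->io.size, run->io.count);
    ret = 0;
    break;
case KVM_EXIT_MMIO:
    /* Memory-mapped IO (including PCI BARs): route the access through the
     * memory API, which ends up in the device's MemoryRegionOps callbacks. */
    address_space_rw(&address_space_memory, run->mmio.phys_addr, attrs,
                     run->mmio.data, run->mmio.len, run->mmio.is_write);
    ret = 0;
    break;
/* ... other exit reasons ... */
}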
Want to know how execution gets to "emulate a register read on a PCI device"? Run QEMU under gdb, set a breakpoint on, say, the register read/write function for the ethernet PCI card you're using, and then when you get dropped into the debugger look at the stack backtrace. (Configure QEMU with --enable-debug to get better debug info for this kind of thing.)
PS: If you're examining QEMU internals for educational purposes, you'd be better off using the current code, not a year-old release of it.

How to count the anonymous pages and shared pages for a process in Linux using kernel module

I am trying to find the resident set size (RSS) of a C program running on Linux (Ubuntu 14.04). I get the PID of the running C program and pass it to a custom kernel module. The kernel module looks up the task_struct and extracts the mm pointer. Then I loop through all the VM areas, and within each VM area I loop through each page-aligned virtual address and do a page walk on it to get the pte_t entry. Then I use the pte_present() function to check whether the actual physical page exists in RAM.
The issues I am facing are as follows:
The RSS value does not match the value shown in htop or top, although the value I calculate does increase proportionally as the test C program accesses more memory (via some array accesses).
I have found that the RSS value shown by htop matches the result of the get_mm_rss() function provided by the Linux kernel itself.
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
    return get_mm_counter(mm, MM_FILEPAGES) +
           get_mm_counter(mm, MM_ANONPAGES) +
           get_mm_counter(mm, MM_SHMEMPAGES);
}
My question is: how do I count or detect these anonymous pages and shared pages? Which bits need to be checked?
Thank you!
The correct way to do this is to realize that the count is in an array. Try:
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
    int k;
    unsigned long count = 0;

    /* Sum every per-mm RSS counter (file, anon, shmem, ...). */
    for (k = 0; k < NR_MM_COUNTERS; k++) {
        long len = atomic_long_read(&mm->rss_stat.count[k]);
        if (len < 0)
            len = 0;
        count += len;
    }
    return count;
}
Walking Physical Pages
You need to set up an mm_walk struct with your callbacks for the pte and pmd levels (and a hugetlb callback, depending on whether CONFIG_HUGETLB_PAGE is enabled in the kernel) to walk through the physical pages.
For example:
show_smap uses this:
struct mm_walk smaps_walk = {
    .pmd_entry = smaps_pte_range,
#ifdef CONFIG_HUGETLB_PAGE
    .hugetlb_entry = smaps_hugetlb_range,
#endif
    .mm = vma->vm_mm,
};
after setting up the callbacks.
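As an illustrative sketch (my own addition, assuming the pre-5.4 walk_page_range() API where struct mm_walk carries both the callbacks and the mm pointer): a pte_entry callback can test pte_present() and PageAnon() to count resident anonymous pages.

#include <linux/mm.h>
#include <linux/page-flags.h>

static unsigned long anon_count;

static int count_pte(pte_t *pte, unsigned long addr,
                     unsigned long next, struct mm_walk *walk)
{
    struct page *page;

    if (!pte_present(*pte))   /* page is not resident in RAM */
        return 0;
    page = pte_page(*pte);
    if (PageAnon(page))       /* anonymous (not file-backed) page */
        anon_count++;
    return 0;
}

static void count_anon_pages(struct mm_struct *mm)
{
    struct vm_area_struct *vma;
    struct mm_walk walk = {
        .pte_entry = count_pte,
        .mm = mm,
    };

    down_read(&mm->mmap_sem);
    for (vma = mm->mmap; vma; vma = vma->vm_next)
        walk_page_range(vma->vm_start, vma->vm_end, &walk);
    up_read(&mm->mmap_sem);
}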

PCIE region not aligned, and not consistent

I am developing a PCIe device driver for OpenWrt, and I hit a data bus error when trying to access the IO memory in a timer interrupt, which I mentioned in my last question. After lots of research I think I might have found the reason, but I am unable to solve it. Below are my troubles.
Last week I found out that the PCIe region size might have changed during system startup. The region size of BAR0 is 4096 in my driver (returned from pci_resource_len), while the region size is 4097 in lspci -vv, which breaks the page size of the Linux kernel. By reading the source code of pciutils, I found that the lspci command fetches the PCIe information from the /sys/devices/pci0000:00/0000:00:00.0/resource file. So I removed all my custom components and ran the original OpenWrt on my router. After cat /sys/devices/pci0000:00/0000:00:00.0/resource, the first line of the result (BAR0) is
0x0000000010008000 0x0000000010009000 0x0000000000040200
Moreover, I also checked the content of /proc/iomem, and the part related to PCIe is
10000000-13ffffff : mem_base
  10000000-13ffffff : PCI memory space
    10000000-10007fff : 0000:00:00.0
    10008000-10008fff : 0000:00:00.0
It is super weird that the region size of BAR0 indicated by the two files above is different! According to the PCIe specification, the region size should always be a power of 2. How could the region size become 4097?
After spending weeks reading the source code of the Linux kernel, I found out that this is a bug in Linux kernel 4.4.14.
The content of /sys/devices/pci0000:00/0000:00:00.0/resource is generated by the function resource_show in the file drivers/pci/pci-sysfs.c. The related code is
for (i = 0; i < max; i++) {
    struct resource *res = &pci_dev->resource[i];
    pci_resource_to_user(pci_dev, i, res, &start, &end);
    str += sprintf(str, "0x%016llx 0x%016llx 0x%016llx\n",
                   (unsigned long long)start,
                   (unsigned long long)end,
                   (unsigned long long)res->flags);
}
The function pci_resource_to_user actually invoked is located in arch/mips/include/asm/pci.h
static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
                                        const struct resource *rsrc,
                                        resource_size_t *start,
                                        resource_size_t *end)
{
    phys_addr_t size = resource_size(rsrc);

    *start = fixup_bigphys_addr(rsrc->start, size);
    *end = rsrc->start + size;
}
The calculation of *end is wrong and should be replaced with
*end = rsrc->start + size - (size ? 1 : 0);
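To make the off-by-one concrete (my own illustration, not part of the original report): with BAR0 at 0x10008000 and size 0x1000, the end of the resource should be start + size - 1 = 0x10008fff, but the buggy code reports 0x10009000. A tool that then computes the length as end - start + 1, which is presumably what lspci does when parsing the sysfs resource file, gets 0x1001 = 4097 bytes, matching the inconsistent size observed above.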

Creating File in arduino's Memory while arduino is operating

In my Arduino project I have to store some integers (25, to be specific) in a file in the Arduino's memory (I have an Arduino Uno and it doesn't have a built-in port for an SD card) and read that file the next time I start the Arduino.
Also, my Arduino is not connected to a PC or laptop, so I can't use the file system of a PC or laptop.
So is there any way of doing this?
The Arduino Uno has 1 KB of non-volatile EEPROM memory, which you can use for this purpose. An int is 2 bytes, so you should be able to store up to 512 of them this way.
This example sketch writes the integers from 10 down to 5 into EEPROM memory:
#include <EEPROM.h>

void setup() {
  int address = 0; // Location we want the data to be put.
  for (int value = 10; value >= 5; --value)
  {
    // Write the int at the current address.
    EEPROM.put(address, value);
    // Move the address, so the next value is written after the previous one.
    address += sizeof(int);
  }
}

void loop() {
}
This example is a stripped down version of the one in the EEPROM.put documentation. Other examples can be found in the documentation of the various EEPROM functions.
Another nice tutorial on the subject can be found on tronixstuff.
By the way, if you need more memory, you could also use external EEPROM chips. These are small ICs, available very cheaply in capacities typically ranging from 1 KB to 256 KB. Not much by modern computing standards, but a huge expansion compared to the 1 KB you have by default.
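To read the stored values back the next time the board starts up, a matching sketch (my own addition, using EEPROM.get from the same library) could look like this:

#include <EEPROM.h>

void setup() {
  Serial.begin(9600);
  int address = 0;
  // Read back the six ints written by the sketch above, in the same order.
  for (int i = 0; i < 6; ++i) {
    int value;
    EEPROM.get(address, value); // Read an int starting at this address.
    Serial.println(value);
    address += sizeof(int);
  }
}

void loop() {
}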

CPU write value passed from application to qemu is strange

I was trying to run an RTEMS (a real-time OS) application on a SPARC virtual machine using QEMU.
I'm almost there, and I had seen it working hours earlier. But after removing some prints it stopped working, and I later found it's not because of the removed prints. The data is not being passed correctly between the RTEMS image and the QEMU emulation model. (I'm working with QEMU version 1.5.50 and the lan9118.c model borrowed from QEMU version 2.0.0. I modified lan9118 a little.)
In the QEMU model, the memory region ops are defined as
struct MemoryRegionOps {
    /* Read from the memory region. @addr is relative to @mr; @size is
     * in bytes. */
    uint64_t (*read)(void *opaque,
                     hwaddr addr,
                     unsigned size);
    /* Write to the memory region. @addr is relative to @mr; @size is
     * in bytes. */
    void (*write)(void *opaque,
                  hwaddr addr,
                  uint64_t data,
                  unsigned size);
    ...
}
and in the RTEMS application, I write to the device like
*TX_FIFO_PORT = cmdA;
*TX_FIFO_PORT = cmdB;
where TX_FIFO_PORT is defined as below.
#define TX_FIFO_PORT (volatile ulong *)(SMSC9118_BASE + 0x20)
But when I write, for example,
cmdA : 0x2a300200 and cmdB : 0x2a002a00,
The values I expected are
cmdA : 0x0002302a and cmdB : 0x002a002a (just the byte-swapped values).
But the values I see at the write function (entrance of QEMU) are
cmdA : 0x02000200 and cmdB : 0x2a002a00 respectively.
The observed values have not been endian-converted, and the first value is even different (the lower 16 bits are repeated).
What could be the problem?
Any hint will be deeply appreciated.
Strangely, I fixed this by commenting out the endian conversion for cmdA and cmdB in RTEMS before writing to the device. (It used to be OK with the endian conversion; I don't know why.) So it's 'almost' working.
Anyway, here is a tip about exchanging CPU write/read data between the QEMU processor and device models.
In QEMU, each device model provides read and write functions, and it also specifies how a word should be transferred to/from the device with regard to endianness. It is specified like below.
static const MemoryRegionOps lan9118_mem_ops = {
    .read = lan9118_readl,
    .write = lan9118_writel,
    .endianness = DEVICE_NATIVE_ENDIAN,
};
Here is a copy of the email I received from Peter Maydell on the qemu-discuss@nongnu.org mailing list.
------------------------
This depends on what the MemoryRegionOps struct for the memory region sets its .endianness field to.
DEVICE_NATIVE_ENDIAN means the device sees values the same way round as the guest CPU's native endianness[*], so if the guest does a 32 bit write of 0x12345678 then it appears in the write function's argument as 0x12345678. DEVICE_BIG_ENDIAN means that if the CPU is little endian then the word will be byteswapped.
DEVICE_LITTLE_ENDIAN means that if the CPU is big endian then the word will be byteswapped. The latter are useful for devices or buses which have a specific endianness which is not the same as that of the CPU (eg PCI is always little endian).
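As an illustration (my own addition, not part of the quoted email or the original post): if the LAN9118 registers are meant to be little-endian regardless of the guest CPU, the region could be declared little-endian so that QEMU performs the swap for a big-endian SPARC guest. A 32-bit guest store of 0x2a300200 would then arrive in the write callback as 0x0002302a.

/* Hypothetical variant: declare the region little-endian so that QEMU
 * byteswaps accesses when the guest CPU is big-endian (e.g. SPARC). */
static const MemoryRegionOps lan9118_mem_ops = {
    .read = lan9118_readl,
    .write = lan9118_writel,
    .endianness = DEVICE_LITTLE_ENDIAN,
};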

Resources