I am writing a device driver in linux for PCIe endpoint implemented on Xilinx UltrascaleMPSoC FPGA part. I have implemented the remove function correctly. I connect my device using an adapter to my PC, turn on device, enable its Endpoint and then turn on PC and everything works correctly. However, when I try to unload the driver module using rmmod command, the process hangs.
I went through Linux documentation and for pci_disable_device() documentation, it says
Note we don't actually disable the device until all callers of pci_enable_device() have called pci_disable_device().
Does it mean, my driver has to wait untill every other pcie driver in linux calls pci_disable_device() and only then device gets disabled? I really doubt upon this :(
I tried using modprobe -r and also listed the module usage count using "lsmod". lsmod shows usage count as "0". But still I am not able to unload the module :( I also added print statements.
void remove(struct pci_dev pdev)
{
pci_unmap_single(pdev, privdata->dma_mem,
PAGE_SIZE * (1 << memorder),
PCI_DMA_FROMDEVICE);
printk(KERN_INFO"unmap_single() complete\n");
free_pages ((unsigned long) privdata->mem, memorder);
printk(KERN_INFO"free_pages() complete\n");
free_irq(pdev->irq, privdata);
printk(KERN_INFO"free_irq() complete\n");
pci_disable_msi(pdev);
printk(KERN_INFO"MSI disable complete\n");
pci_clear_master(pdev); /* Nobody seems to do this */
printk(KERN_INFO"clear_master() complete\n");
pci_iounmap(pdev, privdata->registers);
printk(KERN_INFO"iounmap() complete\n");
pci_disable_device(pdev);
printk(KERN_INFO"disable_device() complete\n");
pci_release_regions(pdev);
printk(KERN_INFO"release_regions() complete\n");
}
Expected: The device must get disabled. I can't conclude. When I do a dmesg in another terminal, I get prints till "disable_device() complete" and the terminal hangs.Also,I checked the files: /proc/iomem and /proc/interrupts-> when I unload the module, corresponding entries of my device is removed in both of my files! I only complete the execution of pci_disable_device() and the process hangs.
Note: Only the rmmod process hangs and I cannot use the current terminal, but I can open another terminal, do all things.
Finally, when the do a restart, the system hangs again and everytime, I am doing force restart by holding power button for 10-15 seconds.
Check this reboot_hang
Why is the pci_release_regions() hanging my system even though lsmod shows my module usage count as "0".
Thanks in advance :)
Also, should I interchange pci_disable_device() and pci_release_regions() methods in above code? -> I tried this as well, but then I get prints till release_regions() complete but disable_device complete is not printed. And process hangs :(
Please help
Difficult to answer without more information on what you really use on your driver.
But, I will try to give you an answer.
Does it mean, my driver has to wait untill every other pcie driver in linux calls pci_disable_device() and only then device gets disabled?
No, this would make impractical and not very modular.
It seems you're freeing the DMA pages. How are you allocating them ? Are you using coherent allocation? Take a look here to see how to use properly the DMA.
For the remainder, can you try to move release regions before clear_master? Example :
pci_disable_msi(pdev);
ioummap(ADDRESS);
pci_release_regions(pdev);
pci_clear_master(pdev);
pci_disable_device(pdev);
pci_set_drvdata(pdev, NULL);
For more information on how to write PCI Drivers, check the documentation here. It provides a lot of information.
Related
I am trying to initiate a spi communication between an omap processor an sam4l one. I have configured spi protocol and omap is the master. Now what I see is the test data I am sending is correctly reaching on sam4l and I can see the isr is printing that data. Using more printf here and there in isr makes the operation happen and the respective operation happens, but if I remove all printfs I can't see any operation happening. What can be the cause of this anomaly? Is it a usual case of wrong frequency settings or something?
If code is needed I will post that too but its big.
Thanks
I think you are trying to print message in driver.
As printing message on console with slow down your driver, it may behave slowly and your driver work well.
Use pr_info() for debug and change setting to not come message on console by editing /proc/sys/kernel/printk to 4 4 1 7
-> It will store debug message in buffer.
-> Driver not slow down because of printing message on screen.
-> And you can see it by typing dmesg command later.
Then find orignal problem which may cause error.
If a routine works with printf "here and there" and not otherwise, almostcertainly the problem is that there are timing issues. As a trivial example, let's say you write to an SPI flash and then check its content. The flash memory write will take some times, so if you check immediately, the data would not be valid, but if you insert a printf call in between, it may have taken enough time that the read back is now valid.
I'm trying a driver for a custom hardware component, the source code can be found here:
https://github.com/godspeed1989/zedboard/blob/master/led_drv/driver/myled.c
the problem is that when i do:
insmod myled.ko
nothing is shown in the console or in the dmesg log. I'm reading Linux Device Driver 3 and on it is written that you always must do:
module_init(init_function);
module_exit(exit_function);
in the source code there are none of them, instead there is:
module_platform_driver(myled_driver);
But when i load a module with this function nothing is print, instead if I use module_init and module_exit messages appear, what are the difference between this two kind of istructions?
"but why the latter statement doesn't work while the first it's ok? "
The first methond will register the driver to system and bus by module_platform_driver macro.
The latter statement will not register your driver to system and bus. To to this, you need register driver in the init_function() routine by calling platform_device_register().
I am trying to implement my own new schedule(). I want to debug my code.
Can I use printk function in sched.c?
I used printk but it doesn't work. What did I miss?
Do you know how often schedule() is called? It's probably called faster than your computer can flush the print buffer to the log. I would suggest using another method of debugging. For instance running your kernel in QEMU and using remote GDB by loading the kernel.syms file as a symbol table and setting a breakpoint. Other virtualization software offers similar features. Or do it the manual way and walk through your code. Using printk in interrupt handlers is typically a bad idea (unless you're about to panic or stall).
If the error you are seeing doesn't happen often think of using BUG() or BUG_ON(cond) instead. These do conditional error messages and shouldn't happen as often as a non-conditional printk
Editing the schedule() function itself is typically a bad idea (unless you want to support multiple run queue's etc...). It's much better and easier to instead modify a scheduler class. Look at the code of the CFS scheduler to do this. If you want to accomplish something else I can give better advice.
It's not safe to call printk while holding the runqueue lock. A special function printk_sched was introduced in order to have a mechanism to use printk when holding the runqueue lock (https://lkml.org/lkml/2012/3/13/13). Unfortunatly it can just print one message within a tick (and there cannot be more than one tick when holding the run queue lock because interrupts are disabled). This is because an internal buffer is used to save the message.
You can either use lttng2 for logging to user space or patch the implementation of printk_sched to use a statically allocated pool of buffers that can be used within a tick.
Try trace_printk().
printk() has too much of an overhead and schedule() gets called again before previous printk() calls finish. This creates a live lock.
Here is a good article about it: https://lwn.net/Articles/365835/
It depends, basically it should be work fine.
try to use dmesg in shell to trace your printk if it is not there you apparently didn't invoked it.
2396 if (p->mm && printk_ratelimit()) {
2397 printk(KERN_INFO "process %d (%s) no longer affine to cpu%d\n",
2398 task_pid_nr(p), p->comm, cpu);
2399 }
2400
2401 return dest_cpu;
2402 }
there is a sections in sched.c that printk doesn't work e.g.
1660 static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
1661 {
1662 if (unlikely(!irqs_disabled())) {
1663 /* printk() doesn't work good under rq->lock */
1664 raw_spin_unlock(&this_rq->lock);
1665 BUG_ON(1);
1666 }
1667
1668 return _double_lock_balance(this_rq, busiest);
1669 }
EDIT
you may try to printk once in 1000 times instead of each time.
I'm using 3.1 Sarge, kernel 2.4.26 on a TS-7400 board running ARM 9 architecture.
I am using the POSIX library terminos and fcntl.
I am writing a program to communicate between 2 embedded devices over serial. The program uses the POSIX time out flag VTIME and works successfully Ubuntu 10.1 but it does not time out on the board. I need the program to try resending a command if there is no response after a certain time. I know the board is transmitting OK the first time but then the program locks up waiting for a response. I am running the serial in delay mode so it will wait in read() until at least 1 byte is received or .1 secs have passed as defined by VTIME.
What is the problem or if VTIME simply does not work in this kernel what is another way to accomplish this?
Investigate the select() system call. This will let you execute a read when there is something to actually read, instead of waiting for .1 seconds, hoping something will show up. If this is supposed to be a straight port of your code then this may not be and appropriate thing to do.
It is an alternative....
I am writing a system monitor for Linux and want to include some watchdog functionality. In the kernel, you can configure the watchdog to keep going even if /dev/watchdog is closed. In other words, if my daemon exits normally and closes /dev/watchdog, the system would still re-boot 59 seconds later. That may or may not be desirable behavior for the user.
I need to make my daemon aware of this setting because it will influence how I handle SIGINT. If the setting is on, my daemon would need to (preferably) start an orderly shutdown on exit or (at least) warn the user that the system is going to reboot shortly.
Does anyone know of a method to obtain this setting from user space? I don't see anything in sysconf() to get the value. Likewise, I need to be able to tell if the software watchdog is enabled to begin with.
Edit:
Linux provides a very simple watchdog interface. A process can open /dev/watchdog , once the device is opened, the kernel will begin a 60 second count down to reboot unless some data is written to that file, in which case the clock re-sets.
Depending on how the kernel is configured, closing that file may or may not stop the countdown. From the documentation:
The watchdog can be stopped without
causing a reboot if the device
/dev/watchdog is closed correctly,
unless your kernel is compiled with
the CONFIG_WATCHDOG_NOWAYOUT option
enabled.
I need to be able to tell if CONFIG_WATCHDOG_NOWAYOUT was set from within a user space daemon, so that I can handle the shutdown of said daemon differently. In other words, if that setting is high, a simple:
# /etc/init.d/mydaemon stop
... would reboot the system in 59 seconds, because nothing is writing to /dev/watchdog any longer. So, if its set high, my handler for SIGINT needs to do additional things (i.e. warn the user at the least).
I can not find a way of obtaining this setting from user space :( Any help is appreciated.
AHA! After digging through the kernel's linux/watchdog.h and drivers/watchdog/softdog.c, I was able to determine the capabilities of the softdog ioctl() interface. Looking at the capabilities that it announces in struct watchdog_info:
static struct watchdog_info ident = {
.options = WDIOF_SETTIMEOUT |
WDIOF_KEEPALIVEPING |
WDIOF_MAGICCLOSE,
.firmware_version = 0,
.identity = "Software Watchdog",
};
It does support a magic close that (seems to) override CONFIG_WATCHDOG_NOWAYOUT. So, when terminating normally, I have to write a single char 'V' to /dev/watchdog then close it, and the timer will stop counting.
A simple ioctl() on a file descriptor to /dev/watchdog asking WDIOC_GETSUPPORT allows one to determine if this flag is set. Pseudo code:
int fd;
struct watchdog_info info;
fd = open("/dev/watchdog", O_WRONLY);
if (fd == -1) {
perror("open");
// abort, timer did not start - no additional concerns
}
if (ioctl(fd, WDIOC_GETSUPPORT, &info)) {
perror("ioctl");
// abort, but you probably started the timer! See below.
}
if (WDIOF_MAGICCLOSE & info.options) {
printf("Watchdog supports magic close char\n");
// You have started the timer here! Handle that appropriately.
}
When working with hardware watchdogs, you might want to open with O_NONBLOCK so ioctl() not open() blocks (hence detecting a busy card).
If WDIOF_MAGICCLOSE is not supported, one should just assume that the soft watchdog is configured with NOWAYOUT. Remember, just opening the device successfully starts the countdown. If all you're doing is probing to see if it supports magic close and it does, then magic close it. Otherwise, be sure to deal with the fact that you now have a running watchdog.
Unfortunately, there's no real way to know for sure without actually starting it, at least not that I could find.
a watchdog guards against hard-locking the system, either because of a software crash, or hardware failure.
what you need is a daemon monitoring daemon (dmd). check 'monit'
I think the watchdog device drivers are really intended for use on embedded platforms (or at least well controlled ones) where the developers will have control of which kernel is in use.
This could be considered to be an oversight, but I think it is not.
One other thing you could try, if the watchdog was built as a loadable module, unloading it will presumably abort the shutdown?