When trying to log/debug an ISR, I've seen:
1) sprintf(), used as an example in O'Reilly's Linux Device Drivers:
irqreturn_t short_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct timeval tv;
    int written;

    do_gettimeofday(&tv);

    /* Write a 16 byte record. Assume PAGE_SIZE is a multiple of 16 */
    written = sprintf((char *)short_head, "%08u.%06u\n",
                      (int)(tv.tv_sec % 100000000), (int)(tv.tv_usec));
    BUG_ON(written != 16);
    short_incr_bp(&short_head, written);
    wake_up_interruptible(&short_queue); /* awake any reading process */
    return IRQ_HANDLED;
}
Unlike printf(), sprintf() writes to memory instead of to the console, and does not seem to have re-entrancy or blocking issues, correct? But I've seen advice against sprintf() on other forums, and I am not sure whether that is only because of its performance overhead or something else.
2) printk() is another one I've seen people use, but it also gets criticized, again for performance (maybe nothing else?).
What is a generally good method/function to use when logging or debugging an ISR in Linux these days?
Regarding sprintf(): do a freetext search on any LXR site, for example:
Freetext search: sprintf (4096 estimated hits)
drivers/video/mbx/mbxdebugfs.c, line 100 (100%)
drivers/isdn/hisax/q931.c, line 1207 (100%)
drivers/scsi/aic7xxx_old/aic7xxx_proc.c, line 141
I think this eliminates any doubts.
As for printk(), printk.h says:
/* If you are writing a driver, please use dev_dbg instead */
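Following that hint, here is a minimal sketch of what the dev_dbg() route can look like in an interrupt handler; the handler name and the way the struct device pointer is obtained are made up for illustration:

#include <linux/device.h>
#include <linux/interrupt.h>

/* Hypothetical handler: dev_id is assumed to point at the struct device
 * the IRQ was requested for. */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
    struct device *dev = dev_id;

    /* Compiled out unless DEBUG or dynamic debug is enabled, so it is
     * cheap to leave in production builds. */
    dev_dbg(dev, "IRQ %d handled\n", irq);

    return IRQ_HANDLED;
}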
I inherited some ALSA code that runs on a Linux embedded platform.
The existing implementation does blocking reads and writes using snd_pcm_readi() and snd_pcm_writei().
I am tasked with making this run on an ARM processor, but I find that the blocking interleaved reads push the CPU to 99%, so I am exploring non-blocking reads and writes.
I open the device as you would expect:
snd_pcm_t *handle;
const char* hwname = "plughw:0"; // example name
snd_pcm_open(&handle, hwname, SND_PCM_STREAM_CAPTURE, SND_PCM_NONBLOCK);
Other ALSA stuff then happens which I can supply on request.
It is worth noting at this point that:
we set a sampling rate of 48,000 [Hz]
the sample type is signed 32 bit integer
the device always overrides our requested period size to 1024 frames
Reading the stream like so:
int32_t* buffer; // buffer set up to hold #period_size samples
int actual = snd_pcm_readi(handle, buffer, period_size);
This call takes approx 15 [ms] to complete in blocking mode. Obviously, variable actual will read 1024 on return.
The problem is: in non-blocking mode, this function also takes 15 ms to complete, and actual also always reads 1024 on return.
I would expect that the function would return immediately, with actual being <=1024 and quite possibly reading "EAGAIN" (-11).
In between read attempts I plan to put the thread to sleep for a specific amount of time, yielding CPU time to other processes.
Am I misunderstanding the ALSA API? Or could it be that my code is missing a vital step?
If the function returns a value of 1024, then at least 1024 frames were available at the time of the call.
(It's possible that the 15 ms is time needed by the driver to actually start the device.)
Anyway, blocking or non-blocking mode does not make any difference regarding CPU usage. To reduce CPU usage, replace the default device with plughw or hw, but then you lose features like device sharing or sample rate/format conversion.
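For illustration, switching to plughw or hw is just a matter of the device name string passed to snd_pcm_open(); the card/device numbers below are examples:

#include <alsa/asoundlib.h>

/* "default" goes through ALSA's plugin layer (sharing, conversion);
 * "plughw:0,0" keeps rate/format conversion but drops sharing;
 * "hw:0,0" talks to the hardware directly, native formats only. */
static int open_capture(snd_pcm_t **pcm)
{
    int err = snd_pcm_open(pcm, "hw:0,0", SND_PCM_STREAM_CAPTURE, 0);
    if (err < 0)
        err = snd_pcm_open(pcm, "plughw:0,0", SND_PCM_STREAM_CAPTURE, 0);
    return err;
}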
I solved my problem by wrapping snd_pcm_readi() as follows:
/*
** Read interleaved stream in non-blocking mode
*/
template <typename SampleType>
snd_pcm_sframes_t snd_pcm_readi_nb(snd_pcm_t* pcm, SampleType* buffer, snd_pcm_uframes_t size, unsigned samplerate)
{
    const snd_pcm_sframes_t avail = ::snd_pcm_avail(pcm);
    if (avail < 0) {
        return avail;
    }

    if (avail < size) {
        snd_pcm_uframes_t remain = size - avail;
        unsigned long msec = (remain * 1000) / samplerate;

        static const unsigned long SLEEP_THRESHOLD_MS = 1;
        if (msec > SLEEP_THRESHOLD_MS) {
            msec -= SLEEP_THRESHOLD_MS;
            // exercise for the reader: sleep for msec
        }
    }

    return ::snd_pcm_readi(pcm, buffer, size);
}
This works quite well for me. My audio process now 'only' takes 19% CPU time.
And it matters not if the PCM interface was opened using SND_PCM_NONBLOCK or 0.
Going to perform callgrind analysis to see if more CPU cycles can be saved elsewhere in the code.
I am working on a device driver for a data acquisition system. There is a PCI device that provides input and output data at the same time, at regular intervals, and the Linux module manages the data in circular buffers that are read and written through file operations.
The data throughput of the system is relatively low: it receives just over 750,000 bytes per second and transmits just over 150,000 bytes per second.
There is a small user space utility that writes and reads data in a loop for testing purposes.
Here is a section of the driver code. (All the code related to the circular buffers has been omitted for simplicity's sake; PCI device initialization is taken care of elsewhere, and pci_interrupt is not the real entry point for the interrupt handler.)
#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(wq_head);

static ssize_t read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    DECLARE_WAITQUEUE(wq, current);

    if (count == 0)
        return 0;

    add_wait_queue(&wq_head, &wq);
    do {
        set_current_state(TASK_INTERRUPTIBLE);
        if (/* There is any data in the receive buffer */) {
            /* Copy data from the receive buffer into user space */
            break;
        }
        schedule();
    } while (1);
    set_current_state(TASK_RUNNING);
    remove_wait_queue(&wq_head, &wq);
    return count;
}

static ssize_t write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
    /* Copy data from user space into the transmit buffer */
}

/* This procedure gets called in real time, roughly once every 5 milliseconds.
   It writes 4k to the receive buffer and reads 1k from the transmit buffer. */
static void pci_interrupt(void)
{
    /* Copy data from the PCI DMA buffer to the receive buffer */
    if (/* There is enough data in the transmit buffer to fill the PCI DMA buffer */) {
        /* Copy from the transmit buffer to the PCI device */
    } else {
        /* Copy zeros to the PCI device */
        printk(KERN_ALERT DEVICE_NAME ": Data underflow. Writing 0's");
    }
    wake_up_interruptible(&wq_head);
}
The above code works well for long periods of time; however, every 12-18 hours there is a data underflow error, resulting in zeros being written.
My first thought was that, since the userspace application is not truly real-time, the delay between its read and write operations occasionally got too large, causing the failure. However, I tried changing the size of the reads and writes in userspace and changing the niceness of the userspace application, and this had no effect on the frequency of the error.
Due to the error's nature, I believe there is some form of race condition in the three methods above. I am not sure how Linux kernel wait queues work.
Is there a decent alternative to the above method for blocking reads, or is there something else wrong that could cause this behavior?
System Information:
Linux Version: Ubuntu 16.10
Linux Kernel: linux-4.8.0-lowlatency
Chipset: Intel Celeron N3150/N3160 Quad Core 2.08 GHz SoC
TL;DR: The above code hits underflow errors every 12-18 hours. Is there a better way to do blocking I/O, or is there a race condition in the code?
One standard approach used in Linux can also be used in your case.
User space test program:
1. Open the file in blocking mode (the default in Linux unless you specify the O_NONBLOCK flag).
2. Call select() to block on the file descriptor.
Kernel driver:
1. Register an interrupt handler which gets invoked whenever there is data available.
2. The handler takes a lock to protect the common buffer shared between the read/write paths and the data transfer.
Take a look at these links for source code from the LDD3 book's test program and driver; a sketch of the driver-side poll hook this relies on follows below.
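For select() to work on the device, the driver also needs a .poll file operation that registers the caller on the wait queue the interrupt handler wakes. A minimal sketch, reusing the wq_head wait queue from the code above; the two helper functions are hypothetical stand-ins for the omitted circular-buffer checks:

#include <linux/fs.h>
#include <linux/poll.h>

/* Hypothetical stand-ins for the omitted circular-buffer checks. */
static bool rx_has_data(void) { return true; }
static bool tx_has_room(void) { return true; }

static unsigned int dev_poll(struct file *filp, poll_table *wait)
{
    unsigned int mask = 0;

    /* Register the caller on the queue that pci_interrupt() wakes;
     * poll_wait() itself never blocks, select()/poll() does the sleeping. */
    poll_wait(filp, &wq_head, wait);

    if (rx_has_data())
        mask |= POLLIN | POLLRDNORM;   /* readable without blocking */
    if (tx_has_room())
        mask |= POLLOUT | POLLWRNORM;  /* writable without blocking */

    return mask;
}

Wired into the driver's struct file_operations as .poll, this lets the user-space utility sleep in select() and only issue read()/write() calls when the interrupt handler has actually produced or consumed data.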
Can anyone help me understand the difference between the below-mentioned APIs in the Linux kernel?
struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);
I wrote sample modules; when I look at them using ps -aef, both have created a workqueue, but I was not able to see any difference.
I have referred to http://www.makelinux.net/ldd3/chp-7-sect-6, and according to LDD3:
If you use create_workqueue, you get a workqueue that has a dedicated thread for each processor on the system. In many cases, all those threads are simply overkill; if a single worker thread will suffice, create the workqueue with create_singlethread_workqueue instead.
But I was not able to see multiple worker threads (one for each processor).
Workqueues have changed since LDD3 was written.
These two functions are actually macros:
#define create_workqueue(name) \
alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
#define create_singlethread_workqueue(name) \
alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))
The alloc_workqueue documentation says:
Allocate a workqueue with the specified parameters. For detailed
information on WQ_* flags, please refer to Documentation/workqueue.txt.
That file is too big to quote entirely, but it says:
alloc_workqueue() allocates a wq. The original create_*workqueue()
functions are deprecated and scheduled for removal.
[...]
A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes.
For reference, the old implementation (before the cmwq rework) created either a single worker thread or one thread per possible CPU:

if (singlethread) {
    cwq = init_cpu_workqueue(wq, singlethread_cpu);
    err = create_workqueue_thread(cwq, singlethread_cpu);
    start_workqueue_thread(cwq, -1);
} else {
    list_add(&wq->list, &workqueues);
    for_each_possible_cpu(cpu) {
        cwq = init_cpu_workqueue(wq, cpu);
        err = create_workqueue_thread(cwq, cpu);
        start_workqueue_thread(cwq, cpu);
    }
}
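Since the create_*workqueue() wrappers are deprecated, new code would normally call alloc_workqueue() directly and queue work items onto it. A minimal, self-contained sketch (the module, workqueue name, and work function are made up for illustration):

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;
static struct work_struct my_work;

static void my_work_fn(struct work_struct *work)
{
    pr_info("work item ran\n");
}

static int __init my_init(void)
{
    /* Roughly what create_singlethread_workqueue(name) expands to;
     * drop WQ_UNBOUND to get the bound (per-CPU) behaviour instead. */
    my_wq = alloc_workqueue("my_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 1);
    if (!my_wq)
        return -ENOMEM;

    INIT_WORK(&my_work, my_work_fn);
    queue_work(my_wq, &my_work);
    return 0;
}

static void __exit my_exit(void)
{
    destroy_workqueue(my_wq);   /* flushes pending work, then frees the wq */
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");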
I have added support for AIO in my driver (the .aio_read and .aio_write calls in kernel land, libaio in userland). Looking at various sources, I cannot find out whether, in my .aio_read and .aio_write implementations, I can simply store a pointer to the iovec argument (on the assumption that this memory will remain untouched until after, e.g., aio_complete() is called), or whether I need to deep-copy the iovec data structures.
static ssize_t aio_read(struct kiocb *iocb, const struct iovec *iovec, unsigned long nr_segs, loff_t pos);
static ssize_t aio_write(struct kiocb *iocb, const struct iovec *iovec, unsigned long nr_segs, loff_t pos);
Looking at the implementation of drivers/usb/gadget/inode.c as an example, it seems they just copy the pointer in the ep_aio_rwtail function, which has:
priv->iv = iv;
But when I try doing something similar, it very regularly happens that the data in the iovec has been "corrupted" by the time I process it.
E.g. in the aio_read/aio_write calls I log:
iovector located at addr:0xbf1ebf04
segment 0: base: 0x76dbb468 len:512
But when I do the real work in a kernel thread (after attaching to the user-space mm), I log the following:
iovector located at addr:0xbf1ebf04
segment 0: base: 0x804e00c8 len:-1088503900
This is with a very simple test case where I only submit 1 asynchronous command in my user application.
To make things more interesting:
I have the corruption about 80% of the time on a 3.13 kernel.
But I never saw it before on a 3.9 kernel (though I only used 3.9 for a short while before I upgraded to 3.13, and I have now reverted back as a sanity check and tried a dozen times or so).
(An example run with a 3.9 kernel logged the following twice:
iovector located at addr:0xbf9ee054
segment 0: base: 0x76e28468 len:512)
Does this ring any bells ?
(The other possibility is that I am corrupting these addresses/lengths myself of course, but it is strange that I never had this with a 3.9)
EDIT:
To answer my own question: after reviewing the 3.13 code for Linux AIO (which has changed significantly with respect to the 3.9 code that was working), in fs/aio.c you have:
static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
                            char __user *buf, bool compat)
{
    ...
    struct iovec inline_vec, *iovec = &inline_vec;
    ...
    ret = rw_op(req, iovec, nr_segs, req->ki_pos);
    ...
}
So this iovec structure is just on the stack, and it is lost as soon as the aio_read/aio_write function exits.
And the gadget framework contains a bug (at least in 3.13) in drivers/usb/gadget/inode.c...
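Given that, the safe approach for a driver that completes the request from another context is to deep-copy the vector before .aio_read/.aio_write returns. A minimal sketch, assuming a driver-private request structure (my_req and the deferred worker hand-off are made up):

#include <linux/aio.h>
#include <linux/slab.h>
#include <linux/uio.h>

struct my_req {
    struct kiocb *iocb;
    struct iovec *iov;        /* private copy, freed after aio_complete() */
    unsigned long nr_segs;
};

static ssize_t my_aio_read(struct kiocb *iocb, const struct iovec *iovec,
                           unsigned long nr_segs, loff_t pos)
{
    struct my_req *req = kmalloc(sizeof(*req), GFP_KERNEL);

    if (!req)
        return -ENOMEM;

    /* The caller's iovec may live on its stack (see aio_run_iocb above),
     * so copy the segment descriptors before returning. */
    req->iov = kmemdup(iovec, nr_segs * sizeof(*iovec), GFP_KERNEL);
    if (!req->iov) {
        kfree(req);
        return -ENOMEM;
    }
    req->iocb = iocb;
    req->nr_segs = nr_segs;

    /* ... hand req to the worker that later calls aio_complete() ... */
    return -EIOCBQUEUED;
}

Note that this only copies the segment descriptors; the user buffers they point to still have to remain valid until completion, as the man page quoted below also requires.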
From the man page for aio_read:
NOTES
It is a good idea to zero out the control block before use. The control block must not be changed while the read operation is in progress. The buffer area being read into must not be accessed during the operation or undefined results may occur. The memory areas involved must remain valid.
Simultaneous I/O operations specifying the same aiocb structure produce undefined results.
This suggests the driver can rely on the user's data structures during the operation. It would be prudent to abandon the operation and return an asynchronous error if, during the operation, you detect those structures have changed.
As stated in http://www.kernel.org/doc/htmldocs/kernel-hacking.html#routines-copy, these functions "can" sleep.
So, do I always have to take a lock (e.g. a mutex) when using these functions, or are there exceptions?
I'm currently working on a module and have seen some kernel oopses on my system, but cannot reproduce them. I have a feeling they are triggered because I currently do no locking around copy_[to/from]_user(). Maybe I'm wrong, but it smells like it has something to do with it.
I have something like:
static unsigned char user_buffer[BUFFER_SIZE];
static ssize_t mcom_write(struct file *file, const char __user *buf, size_t length, loff_t *offset)
{
    ssize_t retval;
    size_t writeCount = (length < BUFFER_SIZE) ? length : BUFFER_SIZE;

    memset(user_buffer, 0x00, sizeof user_buffer);
    if (copy_from_user(user_buffer, buf, writeCount)) {
        retval = -EFAULT;
        return retval;
    }

    *offset += writeCount;
    retval = writeCount;

    cleanupNewline(user_buffer);
    dispatch(user_buffer);

    return retval;
}
Is this safe to do, or do I need to lock it against other accesses while copy_from_user is running?
It's a char device I read from and write to, and if a special packet is received on the network, there can be concurrent access to this buffer.
You need to do locking if and only if the kernel-side data structure that you are copying to or from might otherwise go away, but it is that data structure you should be taking the lock on.
I am guessing your function mcom_write is a procfs write handler (or similar), right? In that case, your program is most likely writing to the procfs file and is blocked until mcom_write returns, so even if copy_[to/from]_user sleeps, your program wouldn't change the buffer.
You haven't stated how your program works, so it is hard to say anything definite. If your program is multithreaded and one thread writes while another can change the data being written, then yes, you need locking, but between the threads of the user-space program, not in your kernel module.
If you have one thread writing, then your write to the procfs file will be blocked until mcom_write finishes, so no locking is needed and your problem is somewhere else (unless there is something else wrong with this function, but it is not with copy_from_user).
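If the buffer really can be touched concurrently (for example by the code path that handles the special network packet), the lock has to be a sleeping one, because copy_from_user() may sleep and must not be called with a spinlock held. A minimal sketch of that variant, reusing the names from the question (the mutex itself is made up):

#include <linux/fs.h>
#include <linux/mutex.h>
#include <linux/uaccess.h>

static DEFINE_MUTEX(user_buffer_lock);

static ssize_t mcom_write(struct file *file, const char __user *buf, size_t length, loff_t *offset)
{
    size_t writeCount = (length < BUFFER_SIZE) ? length : BUFFER_SIZE;

    /* A mutex, not a spinlock: copy_from_user() may sleep while faulting
     * in the user page, and sleeping with a spinlock held is a bug. */
    mutex_lock(&user_buffer_lock);
    memset(user_buffer, 0x00, sizeof user_buffer);
    if (copy_from_user(user_buffer, buf, writeCount)) {
        mutex_unlock(&user_buffer_lock);
        return -EFAULT;
    }
    cleanupNewline(user_buffer);
    dispatch(user_buffer);
    mutex_unlock(&user_buffer_lock);

    *offset += writeCount;
    return writeCount;
}

Any other code path that reads or writes user_buffer (for example the packet handler) would have to take the same mutex.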