linux kernel module dies after 100000 interrupts - c

I'm working on a kernel module for the 2.6.39 kernel. (I know this is out of date, but it's what came with my evaluation board and I wanted to get this working before moving to the 3.x series.)
My module is very simple at the moment. It listens for a 200us pulse on a GPIO pin, then increments a counter which resets it every 25089 iterations. (25089 is the size of a buffer to be used later.) Strangely enough though, my module dies after exactly 100000 interrupts every time I use it, and I'm really at a loss. I looked at changing the kernel jiffy frequency, but that seems like it's unrelated. I also tried using a tickless kernel, and that doesn't seem to have an effect either. I can't find much of anything on google about this problem either. Has anyone else seen this issue?
I'm building for an Atmel AT91 processor if that is important. I'll list my crash message below.
root#at91:~# irq 56: nobody cared (try booting with the "irqpoll" option)
[<c0036804>] (unwind_backtrace+0x0/0xec) from [<c006eca4>] (__report_bad_irq+0x34/0xa0)
[<c006eca4>] (__report_bad_irq+0x34/0xa0) from [<c006eed0>] (note_interrupt+0x1c0/0x22c)
[<c006eed0>] (note_interrupt+0x1c0/0x22c) from [<c006d904>] (handle_irq_event_percpu+0x168/0x19c)
[<c006d904>] (handle_irq_event_percpu+0x168/0x19c) from [<c006d960>] (handle_irq_event+0x28/0x38)
[<c006d960>] (handle_irq_event+0x28/0x38) from [<c003ace0>] (gpio_irq_handler+0x74/0x98)
[<c003ace0>] (gpio_irq_handler+0x74/0x98) from [<c002b078>] (asm_do_IRQ+0x78/0xac)
[<c002b078>] (asm_do_IRQ+0x78/0xac) from [<c00313d4>] (__irq_svc+0x34/0x60)
Exception stack(0xc04b1f70 to 0xc04b1fb8)
1f60: 00000000 0005317f 0005217f 60000013
1f80: c04b0000 c04b61cc c04b5ffc c04e1224 20000000 41069265 20025cbc 00000000
1fa0: 600000d3 c04b1fb8 c0032cc8 c0032cd4 60000013 ffffffff
[<c00313d4>] (__irq_svc+0x34/0x60) from [<c0032cd4>] (default_idle+0x38/0x40)
[<c0032cd4>] (default_idle+0x38/0x40) from [<c0032af8>] (cpu_idle+0x70/0xc8)
[<c0032af8>] (cpu_idle+0x70/0xc8) from [<c00089c0>] (start_kernel+0x284/0x2e4)
[<c00089c0>] (start_kernel+0x284/0x2e4) from [<20008038>] (0x20008038)
handlers:
[<c01eb660>] (grab_spi_data+0x0/0x6c)
Disabling IRQ #56
root#at91:~# cat /proc/interrupts
CPU0
1: 1387 AIC at91_tick, at91_rtc, ttyS0
12: 39 AIC atmel_mci.0
13: 0 AIC atmel_spi.0
14: 0 AIC atmel_spi.1
17: 1804 AIC tc_clkevt
20: 7520 AIC at_hdmac
21: 0 AIC at_hdmac
22: 1 AIC ehci_hcd:usb1, ohci_hcd:usb2
23: 0 AIC atmel_usba_udc
24: 136 AIC eth0
26: 0 AIC atmel_mci.1
56: 100000 GPIO quicklogic_ready
80: 0 GPIO atmel_usba_udc
142: 0 GPIO mmc-detect
143: 1 GPIO mmc-detect
Err: 0
My interrupt handler is called grab_spi_data, which you can see is that the bottom of the backtrace, and I'm watching IRQ 56. I am really stumped.

Looks like you're not handling the IRQs.
Linux gets upset after 100,000 times - See the comment above __report_bad_irq and find this magic number.
Your interrupt handler probably never returns IRQ_HANDLED, which is what it should do after handling the interrupt.

Related

How to get number of NIC RX rings in Linux

I've try to get count of NIC RX rings from my programs via ETHTOOL API and
command ETHTOOL_GCHANNELS, but program returns error: "Operation not supported."
Sample code:
echannels.cmd = ETHTOOL_GCHANNELS;
req.ifr_data = (void*)&echannels;
if (ioctl(sock, SIOCETHTOOL, &req) != 0)
ERR("Can't get %s channels info! %s", nic, strerror(errno));
else
rx_no = echannels.rx_count;
Also i've try to get it from ethtool "ethtool -l eth0" with the same result:
#ethtool -l eth0
Channel parameters for eth0:
Cannot get device channel parameters
: Operation not supported
, but in /proc/interrupts i see that NIC have multiple RX rings binded to different CPU cores.
Anybody can tell me right way to get count of RX rings from C code?
I was looking to get the ring counts of my NICs as well. After talking with someone more knowledgeable than I (who also wasn't figuring it out), we came to the agreement that you might have to check the specs for your NIC; that it doesn't seem to be something directly available through ethtool.
However, this post also shows that, by checking the indirection table, it can show the RX ring (but not TX ring) count:
$ sudo ethtool -x eth0
RX flow hash indirection table for eth3 with 2 RX ring(s):
0: 0 1 0 1 0 1 0 1
8: 0 1 0 1 0 1 0 1
16: 0 1 0 1 0 1 0 1
24: 0 1 0 1 0 1 0 1
However, on one of my machines I see:
$ sudo ethtool -x enp9s0
Cannot get RX ring count: Operation not supported
This is in spite of the documentation for the NIC saying both:
"The 82574L supports two transmit descriptor rings"
"Figure 26 shows the structure of the two receive descriptor rings."
Hopefully either the -x option or checking the documentation for your NIC will help. It doesn't seem to be consistently and directly accessible via ethtool.

MSP430 microcontroller - how to check addressing modes

I'm programming a MSP430 in C language as a simulation of real microcontroller. I got stuck in addressing modes (https://en.wikipedia.org/wiki/TI_MSP430#MSP430_CPU), especially:
Addressing modes using R0 (PC)
Addressing modes using R2 (SR) and R3 (CG), special-case decoding
I don't understand what does mean 0(PC), 2(SR) and 3(CG). What they are?
How to check these values?
so for the source if the as bits are 01 and the source register bits are a 0 which is the pc for reference then
ADDR Symbolic. Equivalent to x(PC). The operand is in memory at address PC+x.
if the ad bit is a 1 and the destination is a 0 then also
ADDR Symbolic. Equivalent to x(PC). The operand is in memory at address PC+x.
x is going to be another word that follows this instruction so the cpu will fetch the next word, add it to the pc and that is the source
if the as bits are 11 and the source is register 0, the source is an immediate value which is in the next word after the instruction.
if the as bits are 01 and the source is a 2 which happens to be the SR register for reference then the address is x the next word after the instruction (&ADDR)
if the ad bit is a 1 and the destination register is a 2 then it is also an &ADDR
if the as bits are 10 the source bits are a 2, then the source is the constant value 4 and we dont have to burn a word in flash after the instruction for that 4.
it doesnt make sense to have a destination be a constant 4 so that isnt a real combination.
repeat for the rest of the table.
you can have both of these addressing modes at the same time
mov #0x5A80,&0x0120
generates
c000: b2 40 80 5a mov #23168, &0x0120 ;#0x5a80
c004: 20 01
which is
0x40b2 0x5a80 0x0120
0100000010110010
0100 opcode mov
0000 source
1 ad
0 b/w
11 as
0010 destination
so we have an as of 11 with source of 0 the immediate #x, an ad of 1 with a destination 2 so the destination is &ADDR. this is an important experiment because when you have 2 x values, a three word instruction basically which one goes with the source and which the destination
0x40b2 0x5a80 0x0120
so the address 0x5a80 which is the destination is the first x to follow the instruction then the source 0x0120 an immediate comes after that.
if it were just an immediate and a register then
c006: 31 40 ff 03 mov #1023, r1 ;#0x03ff
0x4031 0x03FF
0100000000110001
0100 mov
0000 source
0 ad
0 b/w
11 as
0001 dest
as of 11 and source of 0 is #immediate the X is 0x03FF in this case the word that follows. the destination is ad of 0
Register direct. The operand is the contents of Rn
where destination in this case is r1
so the first group Rn, x(Rn), #Rn and #Rn+ are the normal cases, the ones below that that you are asking about are special cases, if you get a combination that fits into a special case then you do that otherwise you do the normal case like the mov immediate to r1 example above. the destination of r1 was a normal Rn case.
As=01, Ad=1, R0 (ADDR): This is exactly the same as x(Rn), i.e., the operand is in memory at address R0+x.
This is used for data that is stored near the code that uses it, when the compiler does not know at which absolute address the code will be located, but it knows that the data is, e.g., twenty words behind the instruction.
As=11, R0 (#x): This is exactly the same as #R0+, and is used for instructions that need a word of data from the instruction stream. For example, this assembler instruction:
MOV #1234, R5
is actually encoded and implemented as:
MOV #PC+, R5
.dw 1234
After the CPU has read the MOV instruction word, PC points to the data word. When reading the first MOV operand, the CPU reads the data word, and increments PC again.
As=01, Ad=1, R2 (&ADDR): this is exactly the same as x(Rn), but the R2 register reads as zero, so what you end up with is the value of x.
Using the always-zero register allows to encode absolute addresses without needing a special addressing mode for this (just a special register).
constants -1/0/1/2/4/8: it would not make sense to use the SR and CG registers with most addressing modes, so these encodings are used to generate special values without a separate data word, to save space:
encoding: what actually happens:
MOV #SR, R5 MOV #4, R5
MOV #SR+, R5 MOV #8, R5
MOV CG, R5 MOV #0, R5
MOV x(CG), R5 MOV #1, R5 (no word for x)
MOV #CG, R5 MOV #2, R5
MOV #CG+, R5 MOV #-1, R5

Automatically attaching to process on SEGV and other fatal signals (panic_action)

Background
Code to support a 'panic_action' was recently added to the FreeRADIUS v3.0.x, v2.0.x and master branches.
When radiusd (main FreeRADIUS process) receives a fatal signal (SIGFPE, SIGABRT, SIGSEGV etc...), the signal handler executes a predefined 'panic_action' which is a snippet of shell code passed to system(). The signal handler performs basic substitution for %e and %p writing in the values of the current binary name, and the current PID.
This should in theory allow a debugger like gdb or lldb to attach to the process (panic_action = lldb -f %e -p %p), either to perform interactive debugging, or to automate collection of a backtrace. This actually works well on my system OSX 10.9.2 with lldb, but only for SIGABRT.
Problem
This doesn't seem to work for other signals like SIGSEGV. The mini backtrace from execinfo is valid, but when lldb or gdb attach to the process, they only get the backtrace from for the signal handler.
There doesn't seem to be a way in lldb to switch to an arbitrary frame address.
Does anyone know if there's any way of forcing the signal handler to execute in the same stack as the the thread that received the signal? Or why when lldb attaches the backtraces don't show the full stack.
The actual output looks like:
FATAL SIGNAL: Segmentation fault: 11
Backtrace of last 12 frames:
0 libfreeradius-radius.dylib 0x000000010cf1f00f fr_fault + 127
1 libsystem_platform.dylib 0x00007fff8b03e5aa _sigtramp + 26
2 radiusd 0x000000010ce7617f do_compile_modsingle + 3103
3 libfreeradius-server.dylib 0x000000010cef3780 fr_condition_walk + 48
4 radiusd 0x000000010ce7710f modcall_pass2 + 191
5 radiusd 0x000000010ce7713f modcall_pass2 + 239
6 radiusd 0x000000010ce7078d virtual_servers_load + 685
7 radiusd 0x000000010ce71df1 setup_modules + 1633
8 radiusd 0x000000010ce6daae read_mainconfig + 2526
9 radiusd 0x000000010ce78fe6 main + 1798
10 libdyld.dylib 0x00007fff8580a5fd start + 1
11 ??? 0x0000000000000002 0x0 + 2
Calling: lldb -f /usr/local/freeradius/sbin/radiusd -p 1397
Current executable set to '/usr/local/freeradius/sbin/radiusd' (x86_64).
Attaching to process with:
process attach -p 1397
Process 1397 stopped
(lldb) bt
error: libfreeradius-radius.dylib debug map object file '/Users/arr2036/Documents/Repositories/freeradius-server-fork/build/objs//Users/arr2036/Documents/Repositories/freeradius-server-master/src/lib/debug.o' has changed (actual time is 0x530f3d21, debug map time is 0x530f37a5) since this executable was linked, file will be ignored
* thread #1: tid = 0x8d824, 0x00007fff867fee38 libsystem_kernel.dylib`wait4 + 8, queue = 'com.apple.main-thread, stop reason = signal SIGSTOP
frame #0: 0x00007fff867fee38 libsystem_kernel.dylib`wait4 + 8
frame #1: 0x00007fff82869090 libsystem_c.dylib`system + 425
frame #2: 0x000000010cf1f2e1 libfreeradius-radius.dylib`fr_fault + 849
frame #3: 0x00007fff8b03e5aa libsystem_platform.dylib`_sigtramp + 26
(lldb)
Code
The relevant code for fr_fault() is here:https://github.com/FreeRADIUS/freeradius-server/blob/b7ec8c37c7204accbce4be4de5013397ab662ea3/src/lib/debug.c#L227
and fr_set_signal() the function used to setup signal handlers is here: https://github.com/FreeRADIUS/freeradius-server/blob/0cf0e88704228e8eac2948086e2ba2f4d17a5171/src/lib/misc.c#L61
As the links contain commit hashes the code should be static
EDIT
Finally with version lldb-330.0.48 on OSX 10.10.4 lldb can now go past _sigtram.
frame #2: 0x000000010b96c5f7 libfreeradius-radius.dylib`fr_fault(sig=11) + 983 at debug.c:735
732 FR_FAULT_LOG("Temporarily setting PR_DUMPABLE to 1");
733 }
734
-> 735 code = system(cmd);
736
737 /*
738 * We only want to error out here, if dumpable was originally disabled
(lldb)
frame #3: 0x00007fff8df77f1a libsystem_platform.dylib`_sigtramp + 26
libsystem_platform.dylib`_sigtramp:
0x7fff8df77f1a <+26>: decl -0x16f33a50(%rip)
0x7fff8df77f20 <+32>: movq %rbx, %rdi
0x7fff8df77f23 <+35>: movl $0x1e, %esi
0x7fff8df77f28 <+40>: callq 0x7fff8df794d8 ; symbol stub for: __sigreturn
(lldb)
frame #4: 0x000000010bccb027 rlm_json.dylib`_json_map_proc_get_value(ctx=0x00007ffefa62dbe0, out=0x00007fff543534b8, request=0x00007ffefa62da30, map=0x00007ffefa62aaf0, uctx=0x00007fff54353688) + 391 at rlm_json.c:191
188 }
189 vp = map->op;
190
-> 191 if (value_data_steal(vp, &vp->data, vp->da->type, value) < 0) {
192 REDEBUG("Copying data to attribute failed: %s", fr_strerror());
193 talloc_free(vp);
194 goto error;
This is a bug in lldb related to backtracing through _sigtramp, the asynchronous signal handler in user processes. Unfortunately I can't suggest a workaround for this problem. It has been fixed in the top of tree sources for lldb at http://lldb.llvm.org/ if you're willing to build from source (see the "Source" and "Build" sidebars). But Xcode 5.0 and the next dot release are going to have real problems backtracing past _sigtramp.

unwind_frame cause a kernel paging error

Background
Found a strange kernel Oops, Googled a lot, found nothing.
Background:
The kernel version is 3.0.8
There are two process let's say p1, p2
p2 have lots of threads(about 30)
p1 continuously calls system(pidof("name of p1"))
The kernel may Oops after running for a few days. the primary reason I found is that unwind_frame got a strange frame->fp(0xFFFFFFFF) from get_wchan
When executing this line
frame->fp = *(unsigned long *)(fp - 12);
The CPU will try to access 0xFFFFFFF3, and cause a paging error.
My question is:
How on earth the fp register saved before context switch becomes 0xFFFFFFFF ?
here is the CPU infomation
# cat /proc/cpuinfo
Processor : ARMv7 Processor rev 0 (v7l)
processor : 0
BogoMIPS : 1849.75
processor : 1
BogoMIPS : 1856.30
Features : swp half thumb fastmult vfp edsp vfpv3 vfpv3d16
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
Here is the Oops and pt registers:
[734212.113136] Unable to handle kernel paging request at virtual address fffffff3
[734212.113154] pgd = 826f0000
[734212.113175] [fffffff3] *pgd=8cdfe821, *pte=00000000, *ppte=00000000
[734212.113199] Internal error: Oops: 17 [#1] SMP
--------------cut--------------
[734212.113464] CPU: 1 Tainted: P (3.0.8 #2)
[734212.113523] PC is at unwind_frame+0x48/0x68
[734212.113538] LR is at get_wchan+0x8c/0x298
[734212.113557] pc : [<8003d120>] lr : [<8003a660>] psr: a0000013
[734212.113561] sp : 845d1cc8 ip : 00000003 fp : 845d1cd4
[734212.113583] r10: 00000001 r9 : 00000000 r8 : 80493c34
[734212.113597] r7 : 00000000 r6 : 00000000 r5 : 83354960 r4 : 845d1cd8
[734212.113613] r3 : 845d1cd8 r2 : ffffffff r1 : 80490000 r0 : 8049003f
[734212.113632] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[734212.113651] Control: 10c53c7d Table: 826f004a DAC: 00000015
Here is the callstack:
[734212.117027] Backtrace:
[734212.117052] [<8003d0d8>] (unwind_frame+0x0/0x68) from [<8003a660>] (get_wchan+0x8c/0x298)
[734212.117079] [<8003a5d4>] (get_wchan+0x0/0x298) from [<8011f700>] (do_task_stat+0x548/0x5ec)
[734212.117099] r4:00000000
[734212.117118] [<8011f1b8>] (do_task_stat+0x0/0x5ec) from [<8011f7c0>] (proc_tgid_stat+0x1c/0x24)
[734212.117158] [<8011f7a4>] (proc_tgid_stat+0x0/0x24) from [<8011b7f0>] (proc_single_show+0x54/0x98)
[734212.117196] [<8011b79c>] (proc_single_show+0x0/0x98) from [<800e9024>] (seq_read+0x1b4/0x4e4)
[734212.117215] r8:845d1f08 r7:845d1f70 r6:00000001 r5:8ca89d20 r4:866ea540
[734212.117237] r3:00000000
[734212.117264] [<800e8e70>] (seq_read+0x0/0x4e4) from [<800c8c54>] (vfs_read+0xb4/0x19c)
[734212.117289] [<800c8ba0>] (vfs_read+0x0/0x19c) from [<800c8e18>] (sys_read+0x44/0x74)
[734212.117307] r8:00000000 r7:00000003 r6:000003ff r5:7ea00818 r4:8ca89d20
[734212.117340] [<800c8dd4>] (sys_read+0x0/0x74) from [<800393c0>] (ret_fast_syscall+0x0/0x30)
[734212.117358] r9:845d0000 r8:80039568 r6:7ea00c90 r5:0000000e r4:7ea00818
[734212.117388] Code: e3c10d7f e3c0103f e151000c 9afffff6 (e512100c)
[734212.113136] Unable to handle kernel paging request at virtual address fffffff3
[734212.113154] pgd = 826f0000
[734212.113175] [fffffff3] *pgd=8cdfe821, *pte=00000000, *ppte=00000000
[734212.113199] Internal error: Oops: 17 [#1] SMP
This bug was fixed by Konstantin Khlebnikov, details can be found in git commit log.

GCC/objdump: Generating compilable/buildable assembly (interspersed with C/C++) source?

This is close to Using GCC to produce readable assembly?, but my context here is avr-gcc (and correspondingly, avr-objdump) for Atmel (though, I guess it would apply across the GCC board).
The thing is, I have a project of multiple .c and .cpp files; which ultimately get compiled into an executable, with the same name as the 'master' .cpp file. In this process, I can obtain assembly listing in two ways:
I can instruct gcc to emit assembly listing source (see Linux Assembly and Disassembly an Introduction) using the -S switch; in this case, I get a file, with contents like:
...
loop:
push r14
push r15
push r16
push r17
push r28
push r29
/* prologue: function /
/ frame size = 0 */
ldi r24,lo8(13)
ldi r22,lo8(1)
call digitalWrite
rjmp .L2
.L3:
ldi r24,lo8(MyObj)
ldi r25,hi8(MyObj)
call _ZN16MYOBJ7connectEv
.L2:
ldi r24,lo8(MyObj)
ldi r25,hi8(MyObj)
call _ZN16MYOBJ11isConnectedEv
...
(Haven't tried it yet; but I guess this code is compilable/buildable....)
I can inspect the final executable with, and instruct, objdump to emit assembly listing source using the -S switch; in this case, I get a file, with contents like:
...
0000066a <init>:
void init()
{
// this needs to be called before setup() or some functions won't
// work there
sei();
66a: 78 94 sei
66c: 83 b7 in r24, 0x33 ; 51
66e: 84 60 ori r24, 0x04 ; 4
670: 83 bf out 0x33, r24 ; 51
...
000006be <loop>:
6be: ef 92 push r14
6c0: ff 92 push r15
6c2: 0f 93 push r16
6c4: 1f 93 push r17
6c6: cf 93 push r28
6c8: df 93 push r29
6ca: 8d e0 ldi r24, 0x0D ; 13
6cc: 61 e0 ldi r22, 0x01 ; 1
6ce: 0e 94 23 02 call 0x446 ; 0x446
6d2: 04 c0 rjmp .+8 ; 0x6dc
6d4: 8d ef ldi r24, 0xFD ; 253
6d6: 94 e0 ldi r25, 0x04 ; 4
6d8: 0e 94 25 06 call 0xc4a ; 0xc4a <_ZN16MYOBJ7connectEv>
6dc: 8d ef ldi r24, 0xFD ; 253
6de: 94 e0 ldi r25, 0x04 ; 4
6e0: 0e 94 21 06 call 0xc42 ; 0xc42 <_ZN16MYOBJ11isConnectedEv>
...
(I did try to build this code, and it did fail - it reads the 'line numbers' as labels)
Obviously, both listings (for the loop function, at least) represent the same assembly code; except:
The gcc one (should) compile -- the objdump one does not
The objdump one contains listings of all referred functions, which could be defined in files other than the 'master' (e.g., digitalWrite) -- the gcc one does not
The objdump one contains original C/C++ source lines 'interspersed' with assembly (but only occasionally, and seemingly only for C files?) -- the gcc one does not
So, is there a way to obtain an assembly listing that would be 'compilable', however with all in-linked functions, and where source C/C++ code is (possibly, where appropriate) interspersed as comments (so they don't interfere with compilation of assembly file)? (short of writing a parser for the output of objdump, that is :))
Add the option -fverbose-asm to your gcc command line up there. (This is in the gcc manual, but it's documented under 'Code Gen Options')
The "dependencies" that you talk about often come from libraries or separate object files, so they don't have a source - they're just linked as binary code into the final executable. Since such code is not passed through the assembler, you will have to extract it using other ways - e.g. using objdump.
Maybe you should tell us what you really want to achieve because I don't see much point in such an exercise by itself.
The best I have been able to get is to use -Wa,-ahl=outfile.s instead of -S. It isn't compilable code though, but a listing file for diagnostic purposes; the compiled object file is emitted as usual.

Resources