Consider this totally stupid code:
int main() { __asm__("int $0x2"); }
This segfaults when run. Vector 2 is the NMI in Intel's IDT (Section 6.3.1 here).
I am curious why this segfaults. What exactly is the control flow that eventually causes the segfault?
Also pasting section 6.3.3 of the manual here:
6.3.3 Software-Generated Interrupts
The INT n instruction permits interrupts to be generated from within software by supplying an interrupt vector
number as an operand. For example, the INT 35 instruction forces an implicit call to the interrupt handler for interrupt 35.
Any of the interrupt vectors from 0 to 255 can be used as a parameter in this instruction. If the processor’s
predefined NMI vector is used, however, the response of the processor will not be the same as it would be from an
NMI interrupt generated in the normal manner. If vector number 2 (the NMI vector) is used in this instruction, the
NMI interrupt handler is called, but the processor’s NMI-handling hardware is not activated.
Interrupts generated in software with the INT n instruction cannot be masked by the IF flag in the EFLAGS register.
The gate in the IDT contains a descriptor privilege level (DPL), which is the numerically largest current privilege level (CPL) that is permitted to invoke this entry with an instruction. A real NMI, which is caused by an electrical signal on the CPU, is delivered as if it came from CPL 0. In this way, the kernel does not have to differentiate between real interrupts and software-generated ones.
System services that are invoked via int xx have a numerically larger DPL, so that user code is permitted to open the gate with an instruction. Depending upon your kernel, int 3 (breakpoint), 4 (overflow), and 5 (bound range exceeded) may also work from user space: 3 to facilitate debugging, and 4 and 5 for the "into" and "bound" opcodes respectively.
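The privilege check described above can be modeled in a few lines of C. This is a minimal sketch for illustration, not CPU or kernel source; the names gate_permits, INT_HARDWARE, and INT_SOFTWARE are hypothetical. It assumes, as the answer says, that a hardware-delivered interrupt behaves as if it came from CPL 0.

```c
#include <stdbool.h>

/* Hypothetical labels for how the interrupt was raised. */
enum int_source { INT_HARDWARE, INT_SOFTWARE };

/*
 * Simplified model of the gate privilege check (an illustration only).
 * A software INT n is allowed only when CPL <= gate DPL; a hardware
 * interrupt is treated as if it arrived with CPL 0, so it always passes.
 */
bool gate_permits(enum int_source src, unsigned cpl, unsigned gate_dpl)
{
    unsigned effective_cpl = (src == INT_HARDWARE) ? 0u : cpl;
    return effective_cpl <= gate_dpl;
}
```

With a DPL 0 gate for vector 2, gate_permits(INT_SOFTWARE, 3, 0) is false, which corresponds to the user-space int $0x2 being rejected, while gate_permits(INT_HARDWARE, 3, 0) is true for a real NMI.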
You have found a kernel bug. Your program is trying to perform a CPU operation (int 2) that is forbidden to user-space programs, not an invalid memory access. Therefore, it should have been sent a SIGILL (Illegal instruction) signal, not a SIGSEGV signal.
The reason for the bug is probably that this particular forbidden operation is reported to the operating system with a "#GP fault" instead of a "#UD fault" (in the terms used by the x86 architecture manual). #GP faults are also used to report invalid memory accesses, and whoever wrote the code to map that to a signal didn't bother making a distinction between "actual invalid memory access" and "improper use of int reported with #GP". I observe this bug as well, on both Linux and NetBSD, so it must be an easy mistake to make.
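To make the mapping concrete, here is a rough sketch of how a few x86 exception vectors typically end up as signals on Linux. It is a simplification written for this answer, not the kernel's code; signal_for_vector and the VEC_* names are hypothetical, and only a handful of vectors are covered.

```c
#include <signal.h>

/* x86 exception vector numbers, named here for readability (hypothetical names). */
enum { VEC_DE = 0, VEC_BP = 3, VEC_UD = 6, VEC_GP = 13, VEC_PF = 14 };

/*
 * Simplified sketch of the exception-to-signal mapping as commonly
 * observed on Linux. Returns 0 for vectors this sketch does not cover.
 */
int signal_for_vector(int vector)
{
    switch (vector) {
    case VEC_DE: return SIGFPE;   /* divide error */
    case VEC_BP: return SIGTRAP;  /* breakpoint */
    case VEC_UD: return SIGILL;   /* invalid opcode */
    case VEC_GP: return SIGSEGV;  /* general protection, including a refused int n */
    case VEC_PF: return SIGSEGV;  /* page fault */
    default:     return 0;
    }
}
```

The point is that #GP and #PF land on the same signal, so the signal number alone cannot distinguish a refused int from a bad memory access.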
When you're debugging a problem involving signals, it is often helpful to establish a signal handler for the troublesome signal, using sigaction with SA_SIGINFO in the flags. When you set SA_SIGINFO, the handler receives two additional arguments that provide detailed information about the signal. You don't have to use those arguments in the signal handler; instead what you do is run the program under a debugger, allow the signal to be delivered, and then inspect the details in the debugger. Here's a modification to your program that does that:
#include <signal.h>
#include <unistd.h>
#include <ucontext.h>
void handler(int s, siginfo_t *si, void *uc)
{
pause();
}
int main(void)
{
struct sigaction sa;
sa.sa_sigaction = handler;
sa.sa_flags = SA_SIGINFO | SA_RESTART;
sigemptyset(&sa.sa_mask);
sigaction(SIGBUS, &sa, 0);
sigaction(SIGFPE, &sa, 0);
sigaction(SIGILL, &sa, 0);
sigaction(SIGSEGV, &sa, 0);
sigaction(SIGSYS, &sa, 0);
sigaction(SIGTRAP, &sa, 0);
asm("int $0x2");
}
(The uc argument is a pointer to a ucontext_t, but that type is declared in <ucontext.h>, not <signal.h>, so the spec says you must define the handler to take a third argument of type void * and then cast it if you want to use it.)
I set up the handler for all of the signals corresponding to fatal, synchronous CPU exceptions, because why not. The pause is just to make execution stop indefinitely inside the handler, so I can hit control-C to break into the debugger and the signal frame will be available.
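If a debugger is not available, the same details can be captured by the handler itself. The following is a minimal sketch, assuming a POSIX system; struct fault_report, run_and_report, and raise_segv are hypothetical names, and raise_segv is only a portable stand-in for the faulting instruction. Jumping out of the handler with siglongjmp is a shortcut that suits a test program, not production code.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* What the handler saw (hypothetical type for this sketch). */
struct fault_report {
    int signo;
    int code;
    void *addr;   /* only meaningful for hardware-generated faults */
};

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_signo;
static volatile sig_atomic_t fault_code;
static void *volatile fault_addr;

/* Save the siginfo fields, then jump back to run_and_report. */
static void record_fault(int signo, siginfo_t *si, void *uc)
{
    (void)uc;
    fault_signo = signo;
    fault_code = si->si_code;
    fault_addr = si->si_addr;
    siglongjmp(fault_env, 1);
}

/*
 * Run fn() with record_fault installed for SIGSEGV and SIGILL.
 * Returns 1 and fills *out if one of those signals arrived,
 * 0 if fn() returned normally.
 */
int run_and_report(void (*fn)(void), struct fault_report *out)
{
    struct sigaction sa, old_segv, old_ill;
    int caught;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGILL, &sa, &old_ill);

    if (sigsetjmp(fault_env, 1) == 0) {
        fn();
        caught = 0;
    } else {
        caught = 1;
        out->signo = fault_signo;
        out->code = fault_code;
        out->addr = fault_addr;
    }

    /* Put the previous dispositions back. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    return caught;
}

/* Portable stand-in for the faulting instruction. */
void raise_segv(void)
{
    raise(SIGSEGV);
}
```

Passing a function that executes int $0x2 instead of raise_segv should, on x86 Linux, report the same signal number and si_code that the debugger session in this answer shows.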
Here's what I get on Linux:
(gdb) bt
#0 0x00007ffff7eb4af4 in __libc_pause ()
at ../sysdeps/unix/sysv/linux/pause.c:29
#1 0x000055555555516d in handler (s=11, si=0x7fffffffd830, uc=0x7fffffffd700)
at test.c:5
#2 <signal handler called>
#3 main () at test.c:14
(gdb) frame 1
#1 0x000055555555516d in handler (s=11, si=0x7fffffffd830, uc=0x7fffffffd700)
at test.c:5
5 pause();
(gdb) p *si
$1 = {si_signo = 11, si_errno = 0, si_code = 128, __pad0 = 0, _sifields = {
_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {
si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0,
sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {
sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0,
si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x0,
si_addr_lsb = 0, _bounds = {_addr_bnd = {_lower = 0x0, _upper = 0x0},
_pkey = 0}}, _sigpoll = {si_band = 0, si_fd = 0}, _sigsys = {
_call_addr = 0x0, _syscall = 0, _arch = 0}}}
(gdb) p *(ucontext_t *)uc
$2 = {uc_flags = 7, uc_link = 0x0, uc_stack = {ss_sp = 0x0, ss_flags = 0,
ss_size = 0}, uc_mcontext = {gregs = {0, 0, 8, 582, 93824992235632,
140737488346656, 0, 0, 11, 140737488345936, 140737488346432, 0, 0, 0,
140737352200658, 140737488346272, 93824992235964, 66050,
12103423998558259, 18, 13, 0, 0}, fpregs = 0x7fffffffd8c0,
__reserved1 = {0, 1, 140737354129808, 140737488345320, 140737353799024,
140737354129808, 8455580781, 140737354130672}}, uc_sigmask = {__val = {
0, 11, 128, 0 <repeats 13 times>}}, __fpregs_mem = {cwd = 0, swd = 0,
ftw = 0, fop = 0, rip = 140737488346656, rdp = 0, mxcsr = 895,
mxcr_mask = 0, _st = {{significand = {0, 0, 0, 0}, exponent = 0,
__glibc_reserved1 = {0, 0, 0}}, {significand = {8064, 0, 65535, 0},
exponent = 0, __glibc_reserved1 = {0, 0, 0}}, {significand = {0, 0, 0,
0}, exponent = 0, __glibc_reserved1 = {0, 0, 0}}, {significand = {0,
0, 0, 0}, exponent = 0, __glibc_reserved1 = {0, 0, 0}}, {
significand = {0, 0, 0, 0}, exponent = 0, __glibc_reserved1 = {0, 0,
0}}, {significand = {0, 0, 0, 0}, exponent = 0, __glibc_reserved1 = {
0, 0, 0}}, {significand = {0, 0, 0, 0}, exponent = 0,
__glibc_reserved1 = {0, 0, 0}}, {significand = {0, 0, 0, 0},
exponent = 0, __glibc_reserved1 = {0, 0, 0}}}, _xmm = {{element = {0,
0, 0, 0}} <repeats 16 times>}, __glibc_reserved1 = {
0 <repeats 18 times>, 1179670611, 836, 7, 0, 832, 0}}, __ssp = {0, 0, 0,
3}}
The siginfo_t structure is basically useless; it has si_code == 128 (SI_KERNEL), which means "this signal was generated by the kernel but we're not going to tell you anything else about it," and all the other fields are zero. I consider this to be another kernel bug.
The ucontext_t structure is more useful; in particular
(gdb) p/x ((ucontext_t *)uc)->uc_mcontext.gregs[REG_RIP]
$3 = 0x5555555551bc
This is the address of the instruction that caused the signal. If I disassemble main...
(gdb) disas main
...
0x00005555555551b7: callq 0x555555555030 <sigaction@plt>
0x00005555555551bc: int $0x2
0x00005555555551be: mov $0x0,%eax
0x00005555555551c3: leaveq
0x00005555555551c4: retq
... I see that the instruction that caused the signal is indeed the int $0x2.
On NetBSD I get something slightly different:
(gdb) p *si
$1 = { si_pad = "[garbage]", _info = {
_signo = 11, _code = 2, _errno = 0, _pad = 0, _reason = {
_rt = {_pid = -146410395, _uid = 32639, _value = {sival_int = 4,
sival_ptr = 0x4}}, _child = {_pid = -146410395, _uid = 32639,
_status = 4, _utime = 0, _stime = 0}, _fault = {
_addr = 0x7f7ff745f465 <__sigemptyset14>, _trap = 4, _trap2 = 0,
_trap3 = 0}, _poll = {_band = 140187586131045, _fd = 4}}}}
This siginfo_t has actually been filled out. si_code 2 for a SIGSEGV is SEGV_ACCERR ("Invalid permissions for mapped object") which is not nonsense. There is not enough information in the headers or the manpages for me to understand what _trap = 4 means, or why _addr is pointing to an address somewhere inside the C library, and I don't feel like source-diving the NetBSD kernel today. ;-)
Also for reasons I don't feel like investigating today, gdb on NetBSD doesn't have access to the definition of ucontext_t (even though I explicitly included ucontext.h) so I had to dump it out raw:
(gdb) p *(ucontext_t *)uc
No symbol "ucontext_t" in current context.
(gdb) x/40xg uc
0x7f7fffffd7b0: 0x00000000000a000d 0x0000000000000000
0x7f7fffffd7c0: 0x0000000000000000 0x0000000000000000
0x7f7fffffd7d0: 0x0000000000000000 0x0000000000000000
0x7f7fffffd7e0: 0x0000000000000000 0x0000000000000005
0x7f7fffffd7f0: 0x00007f7fffffdb50 0x0000000000000000
0x7f7fffffd800: 0x00007f7ff7483a0a 0x0000000000000002
0x7f7fffffd810: 0x000000000000000d 0x00007f7ff749f340
0x7f7fffffd820: 0x0000000000000246 0x00007f7fffffdb90
0x7f7fffffd830: 0x00007f7ffffffdea 0x00007f7ff511a4c0
0x7f7fffffd840: 0x00007f7ffffffdea 0x00007f7fffffdb70
0x7f7fffffd850: 0x00007f7fffffffe0 0x0000000000000000
0x7f7fffffd860: 0x0000000000000000 0x0000000000000000
0x7f7fffffd870: 0x000000000000003f 0x00007f7ff748003f
0x7f7fffffd880: 0x0000000000000004 0x0000000000000012
0x7f7fffffd890: 0x0000000000400af5 0x000000000000e033 <---
0x7f7fffffd8a0: 0x0000000000010246 0x00007f7fffffdb50
0x7f7fffffd8b0: 0x000000000000e02b 0x00007f7ff7ffd0c0
0x7f7fffffd8c0: 0x000000000000037f 0x0000000000000000
0x7f7fffffd8d0: 0x0000000000000000 0x0000ffbf00001f80
0x7f7fffffd8e0: 0x0000000000000000 0x0000000000000000
(gdb) disas main
Dump of assembler code for function main:
...
0x0000000000400af0 <+166>: callq 0x400810 <__sigaction14@plt>
0x0000000000400af5 <+171>: int $0x2
0x0000000000400af7 <+173>: leaveq
0x0000000000400af8 <+174>: retq
The only address within the memory region pointed to by uc that bears any correspondence with the text of the program is 0x0000000000400af5, which is, again, the address of the int instruction.
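The two si_code values reported above (128 on Linux, 2 on NetBSD) can be decoded with a small helper. This is a sketch, assuming a POSIX <signal.h>; describe_si_code is a hypothetical name, the value 128 is hard-coded because SI_KERNEL is Linux-specific, and only a few codes are covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/*
 * Return a short description of an si_code value for the given signal.
 * Covers only the codes discussed in this answer plus SI_USER.
 */
const char *describe_si_code(int signo, int code)
{
    if (code == 128)   /* SI_KERNEL on Linux (0x80) */
        return "SI_KERNEL (sent by the kernel, no further detail)";
    if (code == SI_USER)
        return "SI_USER (sent by kill)";
    if (signo == SIGSEGV) {
        if (code == SEGV_MAPERR)
            return "SEGV_MAPERR (address not mapped to object)";
        if (code == SEGV_ACCERR)
            return "SEGV_ACCERR (invalid permissions for mapped object)";
    }
    return "unrecognized si_code";
}
```

For the Linux session, describe_si_code(SIGSEGV, 128) gives the SI_KERNEL text; for the NetBSD session, describe_si_code(SIGSEGV, 2) gives the SEGV_ACCERR text.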
I use gdb test core and get this:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000557ce64b63f8 in _create (str=str@entry=0x557ce80a8820 "SEND")
at system.c:708
708 data->res = command->data->res;
(gdb) bt
#0 0x0000557ce64b63f8 in _create (str=str@entry=0x557ce80a8820 "SEND")
at system.c:708
#1 0x0000557ce64b2ef1 in make_command (s=<optimized out>, cmd=0x557ce809cb70) at command.c:121
#2 0x0000557ce63aefdf in main (argc=<optimized out>, argv=0x7fff19053278) at main.c:394
(gdb) p *command
$1 = {status = 1, data = 0x7f21027e9a80, sum = 1543465568, time = 0, msg = { str = 0x7f20fd19f080 "GOOD", len = 4}, id = 2}
(gdb) p *command->data
$2 = {status = 1, item = 0x7f21027eb780, res = 0x7f2100990b00, sum = 1133793665}
(gdb) p *command->data->res
$3 = {msg = { str = 0x7f21010a5500 "Hi, test, test"..., len = 14}, status = 1}
(gdb) p *data
$4 = {status = 1, type = 5, res = 0x0, id = 2}
As you can see, the pointers command, command->data, and data are all valid, so why did this SIGSEGV happen?
why this SIGSEGV happened?
We can't tell.
One possible reason: some other code is actually executing and crashing.
This could happen if system.c has been edited or updated, but the program has not been rebuilt with the new source. Or if the compiler mapping of program counter to file/line is inaccurate (this often happens with optimized code).
If you edit your question to show the output from list _create, disas $pc and info registers, we may be able to tell you more.
I'm trying to run efence on my code, and it always cores here:
Electric Fence 2.1 Copyright (C) 1987-1998 Bruce Perens.
Program received signal SIGSEGV, Segmentation fault.
memalign (alignment=4, userSize=28) at ../utils/libefence/efence.c:492
492 ../utils/libefence/efence.c: No such file or directory.
in ../utils/libefence/efence.c
(gdb) bt
#0 memalign (alignment=4, userSize=28) at ../utils/libefence/efence.c:492
#1 0xf7ff928c in malloc (size=27) at ../utils/libefence/efence.c:816
#2 0x41c92c67 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
#3 0x41c78204 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) () from /usr/lib/libstdc++.so.6
#4 0x41c7a468 in char* std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /usr/lib/libstdc++.so.6
#5 0x41c7a5d6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) ()
from /usr/lib/libstdc++.so.6
#6 0xefb12078 in __static_initialization_and_destruction_0 ()
at ../include/isan/objstoredefs/core/Parameters.h:125
#7 _GLOBAL__sub_I_RecurrWindowPBI.cc(void) ()
at ../dme/svc/common/src/gen/ifc/beh/./imp/trig/RecurrWindowPBI.cc:77
#8 0xefbc9dfd in __do_global_ctors_aux ()
from /isan/lib/libsvc_ifc_behcommon.so
#9 0xefaf59b5 in _init () from /isan/lib/libsvc_ifc_behcommon.so
#10 0x419fd486 in __ctype_init () from /lib/libc.so.6
#11 0x4100ed39 in ?? () from /lib/ld-linux.so.2
#12 0x4100ee8f in ?? () from /lib/ld-linux.so.2
#13 0x410011ef in ?? () from /lib/ld-linux.so.2
(gdb) frame 2
#2 0x41c92c67 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
I run the program under GDB with LD_PRELOAD set in the environment to load the efence library.
The fault seems to be here:
491 for ( slot = allocationList, count = slotCount ; count > 0; count-- ) {
492 if ( slot->mode == FREE
493 && slot->internalSize >= internalSize ) {
494 if ( !fullSlot
495 ||slot->internalSize < fullSlot->internalSize){
496 fullSlot = slot;
497 if ( slot->internalSize == internalSize
498 && emptySlots[0] )
499 break; /* All done, */
500 }
501 }
502 else if ( slot->mode == NOT_IN_USE ) {
503 if ( !emptySlots[0] )
504 emptySlots[0] = slot;
505 else if ( !emptySlots[1] )
506 emptySlots[1] = slot;
507 else if ( fullSlot
508 && fullSlot->internalSize == internalSize )
509 break; /* All done. */
510 }
511 slot++;
512 }
But, in GDB, I'm able to dump the slot structure without any issues:
(gdb) p slot
$1 = (Slot *) 0xef846000
(gdb) p slot->mode
$2 = NOT_IN_USE
(gdb)
(gdb) x/10i $eip
=> 0xf7ff9590 <memalign+448>: mov 0x10(%edi),%edx
0xf7ff9593 <memalign+451>: cmp $0x1,%edx
0xf7ff9596 <memalign+454>: jne 0xf7ff95c0 <memalign+496>
0xf7ff9598 <memalign+456>: mov 0xc(%edi),%edx
0xf7ff959b <memalign+459>: cmp %edx,%esi
0xf7ff959d <memalign+461>: ja 0xf7ff95f8 <memalign+552>
0xf7ff959f <memalign+463>: test %ecx,%ecx
0xf7ff95a1 <memalign+465>: je 0xf7ff95a8 <memalign+472>
0xf7ff95a3 <memalign+467>: cmp 0xc(%ecx),%edx
0xf7ff95a6 <memalign+470>: jae 0xf7ff95f8 <memalign+552>
(gdb) p/x $edi
$5 = 0xef846000
(gdb) ptype (*slot)
type = struct _Slot {
void *userAddress;
void *internalAddress;
size_t userSize;
size_t internalSize;
Mode mode;
}
(gdb) p allocationList
$6 = (Slot *) 0xef846000
(gdb) p allocationList[0]
$7 = {userAddress = 0x0, internalAddress = 0x0, userSize = 0,
internalSize = 0, mode = NOT_IN_USE}
Any help?
The efence code is as in:
http://linux.softpedia.com/get/Programming/Debuggers/Electric-Fence-3305.shtml
Thanks!!
So the suspect slot at 0xef846000 causes the SEGV but is not NULL. The explanation is that with efence the neighboring pages are protected against reads and writes, causing a SIGBUS. But I guess on Linux this BUS turns out to be a SEGV instead.
Within the GDB REPL those read protection bits are overruled, because the debugger reads the program's memory through the kernel's debugging interface rather than through the program's own access rights.
So the remaining question is: in which memory region is allocationList[0] located? I guess it's in a fenced page without read permission. How could that happen? No idea.
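A non-NULL pointer that faults on read is easy to reproduce. The sketch below assumes the slot lives on a page that has been made inaccessible with mprotect, which this thread does not confirm. It maps a page, writes to it, removes all access, and then checks whether a read faults. The names read_faults and protected_page_faults are hypothetical, and both SIGSEGV and SIGBUS are caught because systems differ in which one they deliver.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

static sigjmp_buf probe_env;

static void on_fault(int signo)
{
    (void)signo;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if reading *p raises SIGSEGV or SIGBUS, 0 if the read succeeds. */
int read_faults(const volatile int *p)
{
    struct sigaction sa, old_segv, old_bus;
    int faulted;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {
        int value = *p;   /* the access under test */
        (void)value;
        faulted = 0;
    } else {
        faulted = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}

/*
 * Returns 1 if a page is readable before mprotect(PROT_NONE) and
 * faults afterwards, 0 otherwise, -1 if the setup itself failed.
 */
int protected_page_faults(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int *slot = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return -1;

    slot[0] = 42;
    int before = read_faults(slot);

    if (mprotect(slot, (size_t)page, PROT_NONE) != 0) {
        munmap(slot, (size_t)page);
        return -1;
    }
    int after = read_faults(slot);

    munmap(slot, (size_t)page);
    return before == 0 && after == 1;
}
```

The pointer value never changes between the two reads; only the page permissions do, which matches the symptom of a valid-looking slot address that still faults.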
I am debugging an ARM microcontroller remotely and trying to modify a variable with gdb in the following block of code:
for (int i = 0; i < 100; i++) {
__asm__("nop");
}
When I execute print i I can see the value of the variable
(gdb) print i
$1 = 0
Executing whatis i returns this
whatis i
~"type = int\n"
But when I try to change the variable I get the following error
(gdb) set variable i=99
Left operand of assignment is not an lvalue.
What am I doing wrong here?
UPDATE: here is the assembler code
! for (int i = 0; i < 100; i++) {
main+38: subs r3, #1
main+40: bne.n 0x80001d0 <main+36>
main+42: b.n 0x80001c4 <main+24>
main+44: lsrs r0, r0, #16
main+46: ands r2, r0
! __asm__("nop");
main+36: nop
I had the same problem and making the variable volatile helped.
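For reference, a sketch of the loop from the question with the counter made volatile. The likely reason this helps is that the optimized build keeps the counter only as a count-down in a register (the subs r3, #1 in the listing), so gdb has no memory location to assign to; a volatile object must live in memory. The function name spin and the iterations counter are hypothetical additions so the loop can be checked.

```c
/*
 * Loop from the question with a volatile counter, so every read and
 * write of i goes through memory and the debugger can assign to it.
 * Returns how many times the body ran.
 */
int spin(void)
{
    int iterations = 0;
    for (volatile int i = 0; i < 100; i++) {
        __asm__ volatile("nop");   /* GCC/Clang inline assembly, as in the question */
        iterations++;
    }
    return iterations;
}
```

With this version, stopping inside the loop and assigning to i from gdb should be accepted, since i now has an address.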
The command would be just set i = 99
Try it this way:
(gdb) print i
$1 = 3
(gdb) set var i=6
(gdb) print i
$2 = 6
There are two issues here. First, change the variable name from i to var_i: several set subcommands start with i, so set i=6 gives an ambiguous-command error.
Second, the "Left operand of assignment is not an lvalue." error can be fixed with the code changes shown below.
volatile int var_i = 1;
TRACE((2255, 0, NORMAL, "Ravi I am sleeping here........."));
do
{
sleep(5);
var_i = 1;
}while(var_i);
(gdb)bt
#1 0x00007f67fd7b9404 in sleep () from /lib64/libc.so.6
#2 0x00000000004cd410 in pgWSNVBUHandleGetUser (warning: Source file is more recent than executable.
ptRequest=<optimized out>, oRequest=<optimized out>,
(gdb) finish
Run till exit from #0 0x00007f67fd7b9550 in __nanosleep_nocancel () from /lib64/libc.so.6
0x00007f67fd7b9404 in sleep () from /lib64/libc.so.6
(gdb) finish
Run till exit from #0 0x00007f67fd7b9404 in sleep () from /lib64/libc.so.6
0x00000000004cd410 in pgWSNVBUHandleGetUser (ptRequest=<optimized out>, oRequest=<optimized out>,
pptResponse=0x7fff839e8760) at /root/Checkouts/trunk/source/base/webservice/provnvbuuser.c:376
(gdb)
│372 volatile int var_i = 1; │
│373 TRACE((2255, 0, NORMAL, "Ravi I am sleeping here.........")); │
│374 do │
│375 { │
>│376 sleep(5); │
│377 var_i = 1; │
│378 }while(var_i);
(gdb) set var_i=0
(gdb) n
(gdb) p var_i
$1 = 1
(gdb) set var_i=0
(gdb) p var_i
$2 = 0
(gdb) n
(gdb) n
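The wait-for-debugger loop above can also be written so that the flag is not reset on every iteration, which means a single assignment from gdb is enough to leave the loop. This is a minimal sketch; wait_until_cleared and its parameters are hypothetical, and the max_polls limit is only there so the function can be exercised without a debugger attached.

```c
#include <unistd.h>

/*
 * Poll *flag until it becomes 0 or max_polls sleeps have elapsed.
 * The flag is volatile so it is re-read from memory on every pass,
 * which lets a debugger clear it from outside.
 * Returns the number of sleeps performed.
 */
int wait_until_cleared(volatile int *flag, unsigned poll_seconds, int max_polls)
{
    int polls = 0;
    while (*flag && polls < max_polls) {
        sleep(poll_seconds);
        polls++;
    }
    return polls;
}
```

In a real program the flag would be a volatile int set to 1 before the call; attaching gdb, moving to this frame, and running set var *flag=0 should let the loop exit on the next pass.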