So, I have the following c program:
#include <stdio.h>
#include <string.h>
int main(){
char arr[20];
//this is line 6
strcpy(arr,"Hello, world!\n");
printf(arr);
}
I compiled it using the following command:
gcc -g t2.c -o a2.out
After that I loaded it in gdb and tried setting breakpoints at line 6, at the strcpy function and at line 8. Sure enough, when setting the breakpoint at strcpy I got the following message : "Make breakpoint pending on future shared library load? (y or [n])". I answered "y" and got "Breakpoint 2 (strcpy) pending.".
After answering yes, and running through the program, Breakpoint 2 is never resolved, and the debugger jumps straight to Breakpoint 3 at printf.
I am using Intel syntax in my debugger. Other than that no custom settings. Can anyone tell why the Breakpoint at strcpy is never resolved?
Compilers such as gcc are deeply familiar with the semantics of string functions such as strcpy.
On x86-64 with your example, gcc 9 is generating inline assembly rather than a strcpy call even at
-O0. The breakpoint should work for most other functions.
x86-64 disassembly generated with gcc-9 (no strcpy call):
0000000000000000 <main>:
0: 48 83 ec 28 sub rsp,0x28
4: 48 b8 48 65 6c 6c 6f 2c 20 77 movabs rax,0x77202c6f6c6c6548
e: bf 01 00 00 00 mov edi,0x1
13: 48 89 04 24 mov QWORD PTR [rsp],rax
17: b8 21 0a 00 00 mov eax,0xa21
1c: 48 89 e6 mov rsi,rsp
1f: 66 89 44 24 0c mov WORD PTR [rsp+0xc],ax
24: 31 c0 xor eax,eax
26: c7 44 24 08 6f 72 6c 64 mov DWORD PTR [rsp+0x8],0x646c726f
2e: c6 44 24 0e 00 mov BYTE PTR [rsp+0xe],0x0
33: e8 00 00 00 00 call 38 <main+0x38> 34: R_X86_64_PLT32 __printf_chk-0x4
38: 31 c0 xor eax,eax
3a: 48 83 c4 28 add rsp,0x28
3e: c3 ret
Related
I want to understand AFL's code instrumentation in detail.
Compiling a sample program sample.c
int main(int argc, char **argv) {
int ret = 0;
if(argc > 1) {
ret = 7;
} else {
ret = 12;
}
return ret;
}
with gcc -c -o obj/sample-gcc.o src/sample.c and afl-gcc -c -o obj/sample-afl-gcc.o src/sample.c and disassembling both with objdump -d leads to different Assembly code:
[GCC]
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 89 7d ec mov %edi,-0x14(%rbp)
b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
f: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
16: 83 7d ec 01 cmpl $0x1,-0x14(%rbp)
1a: 7e 09 jle 25 <main+0x25>
1c: c7 45 fc 07 00 00 00 movl $0x7,-0x4(%rbp)
23: eb 07 jmp 2c <main+0x2c>
25: c7 45 fc 0c 00 00 00 movl $0xc,-0x4(%rbp)
2c: 8b 45 fc mov -0x4(%rbp),%eax
2f: 5d pop %rbp
30: c3 retq
[AFL-GCC]
0000000000000000 <main>:
0: 48 8d a4 24 68 ff ff lea -0x98(%rsp),%rsp
7: ff
8: 48 89 14 24 mov %rdx,(%rsp)
c: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
11: 48 89 44 24 10 mov %rax,0x10(%rsp)
16: 48 c7 c1 0e ff 00 00 mov $0xff0e,%rcx
1d: e8 00 00 00 00 callq 22 <main+0x22>
22: 48 8b 44 24 10 mov 0x10(%rsp),%rax
27: 48 8b 4c 24 08 mov 0x8(%rsp),%rcx
2c: 48 8b 14 24 mov (%rsp),%rdx
30: 48 8d a4 24 98 00 00 lea 0x98(%rsp),%rsp
37: 00
38: f3 0f 1e fa endbr64
3c: 31 c0 xor %eax,%eax
3e: 83 ff 01 cmp $0x1,%edi
41: 0f 9e c0 setle %al
44: 8d 44 80 07 lea 0x7(%rax,%rax,4),%eax
48: c3 retq
AFL (usually) adds a trampoline in front of every basic block to track executed paths [https://github.com/mirrorer/afl/blob/master/afl-as.h#L130]
-> Instruction 0x00 lea until 0x30 lea
AFL (usually) adds a main payload to the program (which I excluded due to simplicity) [https://github.com/mirrorer/afl/blob/master/afl-as.h#L381]
AFL claims to use a wrapper for GCC, so I expected everything else to be equal. Why is the if-else-condition still compiled differently?
Bonus question: If a binary without source code available should be instrumented manually without using AFL's QEMU-mode or Unicorn-mode, can this be achieved by (naively) adding the main payload and each trampoline manually to the binary file or are there better approaches?
Re: Why the compilation with gcc and with afl-gcc is different, a short look at the afl-gcc source shows that by default it modifies the compiler parameters, setting -O3 -funroll-loops (as well as defining __AFL_COMPILER and FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION).
According to the documentation (docs/env_variables.txt):
By default, the wrapper appends -O3 to optimize builds. Very rarely,
this will cause problems in programs built with -Werror, simply
because -O3 enables more thorough code analysis and can spew out
additional warnings. To disable optimizations, set AFL_DONT_OPTIMIZE.
Background
I'm making an app that needs to run several tasks concurrently. I can't use threads and such because the app should work without any OS (i.e. straight from the bootsector). Using x86 tasks looks like an overkill (both logically and performance-wise). Thus, I decided to implement a task-switching utility myself. I would save processor state, make a call to the task code and then restore the previous state. So I have to make the call from inline assembly.
Problem
Here's some example code:
#include <stdio.h>
void func() {
printf("Hello, world!\n");
}
void (*funcptr)();
int main() {
funcptr = func;
asm(
"call *%0;"
:
:"r"(funcptr)
);
return 0;
}
It compiles perfectly under icc with no options, gcc and clang and yields "Hello, world!" when run. However, if I compile it with icc main.c -ipo, it segfaults.
I disassembled the code that was generated by icc main.c and got the following:
0000000000401220 <main>:
401220: 55 push %rbp
401221: 48 89 e5 mov %rsp,%rbp
401224: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401228: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40122f: bf 03 00 00 00 mov $0x3,%edi
401234: 33 f6 xor %esi,%esi
401236: e8 45 00 00 00 callq 401280 <__intel_new_feature_proc_init>
40123b: 0f ae 1c 24 stmxcsr (%rsp)
40123f: 48 c7 05 f6 78 00 00 movq $0x401270,0x78f6(%rip) # 408b40 <funcptr>
401246: 70 12 40 00
40124a: b8 70 12 40 00 mov $0x401270,%eax
40124f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401256: 0f ae 14 24 ldmxcsr (%rsp)
40125a: ff d0 callq *%rax
40125c: 33 c0 xor %eax,%eax
40125e: 48 89 ec mov %rbp,%rsp
401261: 5d pop %rbp
401262: c3 retq
401263: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
401268: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40126f: 00
0000000000401270 <func>:
401270: bf 04 40 40 00 mov $0x404004,%edi
401275: e9 e6 fd ff ff jmpq 401060 <puts#plt>
40127a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
On the other hand, icc main.c -ipo yields:
0000000000401210 <main>:
401210: 55 push %rbp
401211: 48 89 e5 mov %rsp,%rbp
401214: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401218: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40121f: bf 03 00 00 00 mov $0x3,%edi
401224: 33 f6 xor %esi,%esi
401226: e8 25 00 00 00 callq 401250 <__intel_new_feature_proc_init>
40122b: 0f ae 1c 24 stmxcsr (%rsp)
40122f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401236: 48 8b 05 cb 2d 00 00 mov 0x2dcb(%rip),%rax # 404008 <funcptr_2.dp.0>
40123d: 0f ae 14 24 ldmxcsr (%rsp)
401241: ff d0 callq *%rax
401243: 33 c0 xor %eax,%eax
401245: 48 89 ec mov %rbp,%rsp
401248: 5d pop %rbp
401249: c3 retq
40124a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
So, while -ipo didn't remove funcptr variable (see address 401236), it did remove assignment. I guess that icc noticed that func is not called from C code so it can be safely removed, so funcptr is allowed to contain garbage. However, it didn't notice that I'm calling func indirectly via assembly.
What I tried
Replacing "r"(funcptr) with "r"(func) works but I can't hardcode a specific function (see background).
Calling funcptr and/or func before and/or after inline assembly block don't help because icc just inlines printf("Hello, world!\n");.
I can't get rid of inline assembly because I have to do low-level register, flags and stack manipulation before and after call.
Making funcptr volatile yields the following warning but still segfaults:
a value of type "void (*)()" cannot be assigned to an entity of type "volatile void (*)()"
Adding volatile to almost every other word doesn't help either.
Moving func and/or funcptr to other source files and then linking them together doesn't help.
Moving inline assembly to a separate function doesn't work.
Am I doing something wrong or is it an icc bug? If the former, how do I fix the code? If the latter, is there any workaround and should I report the bug?
$ icc --version
icc (ICC) 19.1.0.166 20191121
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.
This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 3 years ago.
Excuse my bad English.
I have written down some lines to return max, min, sum of all values, and arrange all values in ascending order when five integers are input.
While writing, I mistakenly wrote 'num[4]' when I declared a INT array when I needed to put in 5 integers.
But as I compiled with TDM-GCC 4.9.2 64-bit release, it worked without any problem. As soon as I realized and changed to TDM-GCC 4.9.2 32-bit release, it did not.
This is my whole code;
#include<stdio.h>
int main()
{
int num[4],i,j,k,a,b,c,m,number,sum=0;
printf("This program returns max, min, sum of all values, and arranges all values in ascending order when five integers are input.\n");
printf("Please enter five integers.\n");
for(i=0;i<5;i++)
{
printf("Enter #%d\n",i+1);
scanf("%d",&num[i]);
}
//arrange all values
for(j=0;j<5;j++)
{
for(k=j+1;k<5;k++)
{
if(num[j]>num[k])
{
number=num[j];
num[j]=num[k];
num[k]=number;
}
}
}
//find maximum value
int max=num[0];
for(a=1;a<5;a++)
{
if(max<num[a])
{
max=num[a];
}
}
//find minimum value
int min=num[0];
for(b=1;b<5;b++)
{
if(min>num[b])
{
min=num[b];
}
}
//find sum of all values
for(c=0;c<5;c++)
{
sum=sum+num[c];
}
printf("Max Value : %d\n",max);//print max
printf("Min Value : %d\n",min);//print min
printf("Sum : %d\n",sum); //print sum
printf("In ascending order : "); //print all values in ascending order
for(m=0;m<5;m++)
{
printf("%d ",num[m]);
}
}
I am new to C and all kinds of programming, and don't know how to search these kind of problems. I know my way of asking like this here is very inappropriate, and I sincerely apologize to people who are irritated by these types of questioning posts. But this is my best try, so please don't blame, but I'm willing to accept any kind of advice or tips.
Thank you.
When allocating on the stack, GCC targeting 64-bit (and probably Clang) will align stack allocations to 8 bytes.
For 32-bit targets, it's only going to use 4 bytes of padding.
So when you compiled your program for 64-bit, an extra four bytes was used to pad the stack. That's why when you accessed that last integer, it didn't segfault.
To see this in action, we'll create a test file.
void test_func() {
int n[4];
int b = 11;
for (int i = 0; i < 4; i++) {
n[i] = b;
}
}
And we'll compile it for 32-bit and 64-bit.
gcc -g -c -m64 test.c -o test_64.o
gcc -g -c -m32 test.c -o test_32.o
And now we'll print the disassembly for each.
objdump -S test_64.o >test_64_dis.txt
objdump -S test_32.o >test_32_dis.txt
Here's the contents of the 64-bit version.
test_64.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func>:
void func() {
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
13: 00 00
15: 48 89 45 f8 mov %rax,-0x8(%rbp)
19: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1b: c7 45 dc 0b 00 00 00 movl $0xb,-0x24(%rbp)
for (int i = 0; i < 4; i++) {
22: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
29: eb 10 jmp 3b <func+0x3b>
n[i] = b;
2b: 8b 45 d8 mov -0x28(%rbp),%eax
2e: 48 98 cltq
30: 8b 55 dc mov -0x24(%rbp),%edx
33: 89 54 85 e0 mov %edx,-0x20(%rbp,%rax,4)
for (int i = 0; i < 4; i++) {
37: 83 45 d8 01 addl $0x1,-0x28(%rbp)
3b: 83 7d d8 03 cmpl $0x3,-0x28(%rbp)
3f: 7e ea jle 2b <func+0x2b>
}
}
41: 90 nop
42: 48 8b 45 f8 mov -0x8(%rbp),%rax
46: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
4d: 00 00
4f: 74 05 je 56 <func+0x56>
51: e8 00 00 00 00 callq 56 <func+0x56>
56: c9 leaveq
57: c3 retq
Here's the 32-bit version.
test_32.o: file format elf32-i386
Disassembly of section .text:
00000000 <func>:
void func() {
0: f3 0f 1e fb endbr32
4: 55 push %ebp
5: 89 e5 mov %esp,%ebp
7: 83 ec 28 sub $0x28,%esp
a: e8 fc ff ff ff call b <func+0xb>
f: 05 01 00 00 00 add $0x1,%eax
14: 65 a1 14 00 00 00 mov %gs:0x14,%eax
1a: 89 45 f4 mov %eax,-0xc(%ebp)
1d: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1f: c7 45 e0 0b 00 00 00 movl $0xb,-0x20(%ebp)
for (int i = 0; i < 4; i++) {
26: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%ebp)
2d: eb 0e jmp 3d <func+0x3d>
n[i] = b;
2f: 8b 45 dc mov -0x24(%ebp),%eax
32: 8b 55 e0 mov -0x20(%ebp),%edx
35: 89 54 85 e4 mov %edx,-0x1c(%ebp,%eax,4)
for (int i = 0; i < 4; i++) {
39: 83 45 dc 01 addl $0x1,-0x24(%ebp)
3d: 83 7d dc 03 cmpl $0x3,-0x24(%ebp)
41: 7e ec jle 2f <func+0x2f>
}
}
43: 90 nop
44: 8b 45 f4 mov -0xc(%ebp),%eax
47: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
4e: 74 05 je 55 <func+0x55>
50: e8 fc ff ff ff call 51 <func+0x51>
55: c9 leave
56: c3 ret
Disassembly of section .text.__x86.get_pc_thunk.ax:
00000000 <__x86.get_pc_thunk.ax>:
0: 8b 04 24 mov (%esp),%eax
3: c3 ret
You can see the compiler is generating 24 bytes and then 20 bytes respectively, if you look right after the variable declarations.
Regarding advice/tips you asked for, a good starting point would be to enable all compiler warnings and treat them as errors. In GCC and Clang, you'd use the -Wall -Wextra -Werror -Wfatal-errors.
I wouldn't recommend this if you're using the MSVC compiler, though, which often issues warnings about declarations from the header files it's distributed with.
Other answers cover what might he actually happening, by analyzing the generated assembly, but the really relevant explanation is: Indexing out of array bounds is Undefined Behavior in C. And that's kinda the end of story.
UB means, the code is "allowed" to do anything by C standard. It could do different thing every time it is run. It could do what you want it to do with no ill effects. It might do what you want, but then something completely unrelated behaves in a funny way. Compiler, operating system, or even phase of the moon could make a difference. Or not.
It is generally not useful to think about what actually happens with Undefined Behavior at C level. You can of course produce the assembly output of a particular compilation, and inspect what it does, but that is result of that one compilation. A new compilation might change things (even if you just do new build at different time, because value of __TIME__ macro depends on time...).
Here gdb does not stop at Line:4.
Next,
Without hitting the declaration line at Line:5, variable x is existing and initialized.
Next,
But here it shows out of scope (yes it should according to me).
Now, I have the following doubts regarding this particular instance of c program.
When exactly the memory for variable x in P1() gets created and initialized?
why gdb did not stop at static declaration statement in inside P1() in the first example?
If we call P1() again will the program control simply skip the declaration statement?
It has already been explained (in related topics linked in comments below question) how static variables work.
Here is actual code generated by a gcc for your p1 function (by gcc -c -O0 -fomit-frame-pointer -g3 staticvar.c -o staticvar.o) then disassembled with related source.
Disassembly of section .text:
0000000000000000 <p1>:
#include <stdio.h>
void p1(void)
{
0: 48 83 ec 08 sub $0x8,%rsp
static int x = 10;
x += 5;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <p1+0xa>
a: 83 c0 05 add $0x5,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <p1+0x13>
printf("%d\n", x);
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <p1+0x19>
19: 89 c6 mov %eax,%esi
1b: bf 00 00 00 00 mov $0x0,%edi
20: b8 00 00 00 00 mov $0x0,%eax
25: e8 00 00 00 00 callq 2a <p1+0x2a>
}
2a: 90 nop
2b: 48 83 c4 08 add $0x8,%rsp
2f: c3 retq
So, as you see there is no code for declaration of x. GDB can only break on actual machine code instruction and as there is none, it breaks on next instruction (mov), which matches line 5.
I have the following code:
struct cre_eqEntry *
cre_eventGet(struct cre_eqObj *eq_obj)
{
struct cre_eqEntry *eqe = cre_queueTailNode(&eq_obj->q);
Memcpy(&tmpEqo, eq_obj, sizeof(struct cre_eqObj));
volatile u32 ddd = 0;
ddd = ((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt;
CPUMemFenceReadWrite();
if (!ddd) {
tmp = eq_obj->q.tail;
assert(0);
return NULL;
}
}
It is a piece of kernel code. When I ran it, it fails at assert(0). So apparently ddd should be 0. But when I used GDB to debug the core dump and printed out '((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt', surprisingly, the value is not 0.
So I start suspecting it is the problem of compiler over-optimization. Here's the disassembly code:
00000000000047ec <cre_eventGet>:
47ec: 55 push %rbp
47ed: 48 89 fe mov %rdi,%rsi
47f0: ba 80 00 00 00 mov $0x80,%edx
47f5: 53 push %rbx
47f6: 48 89 fb mov %rdi,%rbx
47f9: 48 83 ec 18 sub $0x18,%rsp
47fd: 0f b7 6f 24 movzwl 0x24(%rdi),%ebp
4801: 0f b7 47 28 movzwl 0x28(%rdi),%eax
4805: 0f af e8 imul %eax,%ebp
4808: 48 63 ed movslq %ebp,%rbp
480b: 48 03 6f 18 add 0x18(%rdi),%rbp
480f: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 4816 <cre_eventGet+0x2a>
4816: e8 00 00 00 00 callq 481b <cre_eventGet+0x2f>
481b: 0f b7 43 28 movzwl 0x28(%rbx),%eax
481f: 48 8b 53 18 mov 0x18(%rbx),%rdx
4823: c7 44 24 0c 00 00 00 movl $0x0,0xc(%rsp)
482a: 00
482b: c1 e0 02 shl $0x2,%eax
482e: 48 98 cltq
4830: 8b 04 02 mov (%rdx,%rax,1),%eax
4833: 89 44 24 0c mov %eax,0xc(%rsp)
4837: 0f ae f0 mfence
483a: 8b 44 24 0c mov 0xc(%rsp),%eax
483e: 85 c0 test %eax,%eax
4840: 74 14 je 4856 <cre_eventGet+0x6a>
As far as I can see, the assembly code does the same thing as the C code.
So now I ran out of ideas what is causing the problem of inconsistency of 'ddd'.
Please kindly give me some hints!
ddd = ((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt;
Simplify your code. Perform address/boundary checks/validation. Your problem is likely that you are de-referencing some random, uninitialized, address within your process/thread's address space.
ddd = ((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt; probably violates the strict aliasing rule (can't say 100% for sure without seeing the whole code).
If using gcc/clang, compile with -fno-strict-aliasing unless you want to rewrite your code to comply with the standard.
To do the latter, memcpy((u32 *)&ddd, &(struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail)->evt, sizeof ddd); but I guess your codebase may have similar violations in many places, so as a first step, using the compiler flag would be a way to see if this really is the problem.
The magic number 4 is suspicious too, review your code to check if this really is the correct offset and also check that it is not out of bounds of allocated memory.