Memory Size Load and Store penalty analysis?

Memory Size Load and Store penalty analysis? - c

Profiling the code with ocount shows more cycles with penalty on and lesser cycles with penalty off. I'm trying to understand why there is more penalty when the penalty flag is on?
uint16_t arr[1010];
uint32_t r[500];
void func()
{
uint32_t i = 0;
for (i = 0; i < 1000; i+=2)
{
arr[i] = i;
arr[i+1] = i+10;
#ifdef PENALTY_ON
r[i/2] = *(uint32_t *)((uint16_t *)&arr[i+1]);
#endif
}
#ifndef PENALTY_ON
for (i = 0; i < 1000; i+=2)
{
r[i/2] = *(uint32_t *)((uint16_t *)&arr[i+1]);
}
#endif
}

Compiling both with gcc on a 32-bit machine with -O3
With PENALTY_ON
00000000 <func>:
0: 31 c0 xor %eax,%eax
2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
8: 8d 50 0a lea 0xa(%eax),%edx
b: 66 89 94 00 02 00 00 mov %dx,0x2(%eax,%eax,1)
12: 00
13: 8b 8c 00 02 00 00 00 mov 0x2(%eax,%eax,1),%ecx
1a: 89 c2 mov %eax,%edx
1c: 66 89 84 00 00 00 00 mov %ax,0x0(%eax,%eax,1)
23: 00
24: 83 c0 02 add $0x2,%eax
27: d1 ea shr %edx
29: 3d e8 03 00 00 cmp $0x3e8,%eax
2e: 89 0c 95 00 00 00 00 mov %ecx,0x0(,%edx,4)
35: 75 d1 jne 8 <func+0x8>
37: f3 c3 repz ret
Without PENALTY_ON
00000000 <func>:
0: 31 c0 xor %eax,%eax
2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
8: 8d 50 0a lea 0xa(%eax),%edx
b: 66 89 84 00 00 00 00 mov %ax,0x0(%eax,%eax,1)
12: 00
13: 66 89 94 00 02 00 00 mov %dx,0x2(%eax,%eax,1)
1a: 00
1b: 83 c0 02 add $0x2,%eax
1e: 3d e8 03 00 00 cmp $0x3e8,%eax
23: 75 e3 jne 8 <func+0x8>
25: 66 31 c0 xor %ax,%ax
28: 8b 8c 00 02 00 00 00 mov 0x2(%eax,%eax,1),%ecx
2f: 89 c2 mov %eax,%edx
31: 83 c0 02 add $0x2,%eax
34: d1 ea shr %edx
36: 3d e8 03 00 00 cmp $0x3e8,%eax
3b: 89 0c 95 00 00 00 00 mov %ecx,0x0(,%edx,4)
42: 75 e4 jne 28 <func+0x28>
44: f3 c3 repz ret
I think the reason is that a Read-after-Write stall occurs with PENALTY_ON
b: 66 89 94 00 02 00 00 mov %dx,0x2(%eax,%eax,1)
12: 00
13: 8b 8c 00 02 00 00 00 mov 0x2(%eax,%eax,1),%ecx

Related

Binary bomb phase 3 issue

I am trying to complete a binary bomb exercise, and I have made it through the first 2 phases but I am stuck on how to even being solving it, as all the guides I have found feature a different assembly code and none of the same variables. I'm not asking for someone to solve the problem for me or anything, just how to go about actually finding the answer (i.e. what commands I would need to try or variables to focus on).
I've tried finding variables that correspond to hexadecimal letters when using x/s ___ but I've had no luck getting it correspond to lowercase letters. I've tried on my own for 3 hours to figure out this phase and I've had no luck. Here is the assembly language for phase 3:
08048c17 <phase_3>:
8048c17: 55 push %ebp
8048c18: 89 e5 mov %esp,%ebp
8048c1a: 83 ec 28 sub $0x28,%esp
8048c1d: 8d 45 f0 lea -0x10(%ebp),%eax
8048c20: 89 44 24 0c mov %eax,0xc(%esp)
8048c24: 8d 45 f4 lea -0xc(%ebp),%eax
8048c27: 89 44 24 08 mov %eax,0x8(%esp)
8048c2b: c7 44 24 04 7e 94 04 movl $0x804947e,0x4(%esp)
8048c32: 08
8048c33: 8b 45 08 mov 0x8(%ebp),%eax
8048c36: 89 04 24 mov %eax,(%esp)
8048c39: e8 a2 f9 ff ff call 80485e0 <__isoc99_sscanf#plt>
8048c3e: 83 f8 01 cmp $0x1,%eax
8048c41: 7f 05 jg 8048c48 <phase_3+0x31>
8048c43: e8 09 02 00 00 call 8048e51 <explode_bomb>
8048c48: 83 7d f4 07 cmpl $0x7,-0xc(%ebp)
8048c4c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
8048c50: 77 6a ja 8048cbc <phase_3+0xa5>
8048c52: 8b 45 f4 mov -0xc(%ebp),%eax
8048c55: ff 24 85 e0 93 04 08 jmp *0x80493e0(,%eax,4)
8048c5c: b8 00 00 00 00 mov $0x0,%eax
8048c61: eb 52 jmp 8048cb5 <phase_3+0x9e>
8048c63: b8 00 00 00 00 mov $0x0,%eax
8048c68: eb 46 jmp 8048cb0 <phase_3+0x99>
8048c6a: b8 00 00 00 00 mov $0x0,%eax
8048c6f: 90 nop
8048c70: eb 39 jmp 8048cab <phase_3+0x94>
8048c72: b8 00 00 00 00 mov $0x0,%eax
8048c77: eb 2d jmp 8048ca6 <phase_3+0x8f>
8048c79: b8 00 00 00 00 mov $0x0,%eax
8048c7e: 66 90 xchg %ax,%ax
8048c80: eb 1f jmp 8048ca1 <phase_3+0x8a>
8048c82: b8 00 00 00 00 mov $0x0,%eax
8048c87: eb 13 jmp 8048c9c <phase_3+0x85>
8048c89: b8 6d 03 00 00 mov $0x36d,%eax
8048c8e: 66 90 xchg %ax,%ax
8048c90: eb 05 jmp 8048c97 <phase_3+0x80>
8048c92: b8 00 00 00 00 mov $0x0,%eax
8048c97: 2d c5 02 00 00 sub $0x2c5,%eax
8048c9c: 05 94 03 00 00 add $0x394,%eax
8048ca1: 2d e2 00 00 00 sub $0xe2,%eax
8048ca6: 05 e2 00 00 00 add $0xe2,%eax
8048cab: 2d e2 00 00 00 sub $0xe2,%eax
8048cb0: 05 e2 00 00 00 add $0xe2,%eax
8048cb5: 2d e2 00 00 00 sub $0xe2,%eax
8048cba: eb 0a jmp 8048cc6 <phase_3+0xaf>
8048cbc: e8 90 01 00 00 call 8048e51 <explode_bomb>
8048cc1: b8 00 00 00 00 mov $0x0,%eax
8048cc6: 83 7d f4 05 cmpl $0x5,-0xc(%ebp)
8048cca: 7f 06 jg 8048cd2 <phase_3+0xbb>
8048ccc: 3b 45 f0 cmp -0x10(%ebp),%eax
8048ccf: 90 nop
8048cd0: 74 05 je 8048cd7 <phase_3+0xc0>
8048cd2: e8 7a 01 00 00 call 8048e51 <explode_bomb>
8048cd7: c9 leave
8048cd8: c3 ret
Here is the guide I have been provided for phase 3, it seems very valuable but I am having trouble understanding how to apply it to my code given the different variables/values. Here are screen caps of what I think are the most relevant parts, and the full link to the guide below those screen caps: Introductory guide
Guide part 1
Guide part 2
Here is a full link to the guide
The goal is to have a number followed by a lowercase letter and then followed by a number (like 1 v 240 or 4 b 60). But I haven't been able to find any numbers or a letter.

Turns out I was mistaken on the method, and the problem was actually looking for a pair of numbers without a letter. For those wondering, I found the solution by following the scanf assembly sections like was suggested by De Dycker, and ended up charting the first number followed by finding the second pair through trial and error.

Manual decompilation of asm snippet

I've been trying to decompile the following asm snippet(that's all I have):
55 push %rbp
48 89 e5 mov %rsp,%rbp
48 81 ec d0 00 00 00 sub $0xd0,%rsp
64 48 8b 04 25 28 00 mov %fs:0x28,%rax
00 00
48 89 45 f8 mov %rax,-0x8(%rbp)
31 c0 xor %eax,%eax
48 c7 85 30 ff ff ff movq $0x0,-0xd0(%rbp)
00 00 00 00
48 8d b5 38 ff ff ff lea -0xc8(%rbp),%rsi
b8 00 00 00 00 mov $0x0,%eax
ba 18 00 00 00 mov $0x18,%edx
48 89 f7 mov %rsi,%rdi
48 89 d1 mov %rdx,%rcx
f3 48 ab rep stos %rax,%es:(%rdi)
48 8b 15 19 06 20 00 mov 0x200619(%rip),%rdx
48 8d 85 30 ff ff ff lea -0xd0(%rbp),%rax
be ce 0f 40 00 mov $0x400fce,%esi
48 89 c7 mov %rax,%rdi
b8 00 00 00 00 mov $0x0,%eax
e8 4e fc ff ff callq 4008a0 <sprintf#plt>
Here is my attempt:
char buf[192] = {0};
sprintf(buf, "hello %s", name);
I've compiled this with gcc 4.8.5, and it gave me:
55 push %rbp
48 89 e5 mov %rsp,%rbp
48 81 ec d0 00 00 00 sub $0xd0,%rsp
64 48 8b 04 25 28 00 mov %fs:0x28,%rax
00 00
48 89 45 f8 mov %rax,-0x8(%rbp)
31 c0 xor %eax,%eax
48 8d b5 30 ff ff ff lea -0xd0(%rbp),%rsi
b8 00 00 00 00 mov $0x0,%eax
ba 18 00 00 00 mov $0x18,%edx
48 89 f7 mov %rsi,%rdi
48 89 d1 mov %rdx,%rcx
f3 48 ab rep stos %rax,%es:(%rdi)
48 8b 15 14 14 20 00 mov 0x201414(%rip),%rdx
48 8d 85 30 ff ff ff lea -0xd0(%rbp),%rax
be 2e 10 40 00 mov $0x40102e,%esi
48 89 c7 mov %rax,%rdi
b8 00 00 00 00 mov $0x0,%eax
e8 cb fb ff ff callq 4008a0 <sprintf#plt>
I'm struggling to figure out why this exists:
movq $0x0,-0xd0(%rbp)
and also the subsequent usage of -0xd0(%rbp) as a pointer for the argument to sprintf. I'm puzzled because the rep stos begin at -0xc8(%rbp) and not -0xd0(%rbp).
This is probably compiler specific, but still I'm curious what could possibly be the original code that produced that asm.

I imagine something like:
char buf[192] = {0, 0, 0, 0, 0, 0, 0, 0};
sprintf(buf + 8, "hello %s", name);
... would give you that output.
The movq instruction you refer to stores 0 (an 8-byte quantity) at the beginning of an array. The -0xc8(%rbp) comes from copying a string to an offset within the array.

Assembly - js versus ja instruction

So the goal is for me to write out the C code that corresponds to this assembly :
0: 85 f6 test %esi,%esi
2: 78 13 js 17 <part3+0x17>
4: 83 fe 07 cmp $0x7,%esi
7: 77 14 ja 1d <part3+0x1d>
9: 8d 0c f5 00 00 00 00 lea 0x0(,%rsi,8),%ecx
10: 48 d3 ff sar %cl,%rdi
13: 48 89 f8 mov %rdi,%rax
16: c3 retq
17: b8 00 00 00 00 mov $0x0,%eax
1c: c3 retq
1d: b8 00 00 00 00 mov $0x0,%eax
22: c3 retq
I am a little confused because the first loop testing the %esi register ends before the second loop ends.
Is the second if statement comparing %esi to 7 inside the first loop? or is this a if , else if situation??

Let me sum up, what's already been said
0: 85 f6 test %esi,%esi
2: 78 13 js 17 <part3+0x17>
this is " if (esi < 0) goto 17; "
4: 83 fe 07 cmp $0x7,%esi
7: 77 14 ja 1d <part3+0x1d>
this is " if (esi >7) goto 1d; "
9: 8d 0c f5 00 00 00 00 lea 0x0(,%rsi,8),%ecx
"cx = 8*rsi" // not that obvious it's "just" a multiplication)
10: 48 d3 ff sar %cl,%rdi
rdi >> cl; // not cx, but cx is safe to be <= 7*8, so that's the same
13: 48 89 f8 mov %rdi,%rax
16: c3 retq
return rdi;
17: b8 00 00 00 00 mov $0x0,%eax
1c: c3 retq
17: "return 0"
1d: b8 00 00 00 00 mov $0x0,%eax
22: c3 retq
1d: another "return 0"
so the C-Code is:
{
if (esi < 0) return 0;
if (esi > 7) return 0;
return rdi >> ( 8 * rsi );
}
PS: the 2 "return 0" (17 and 1d) give a clear indication that, in the C-code, the two ifs were NOT combined into one
PSS: the C Code was obviously not compiled with optimization :P

C: float changing to 'nan' value

I have an inner function in a larger program that is somehow changing a float value to "nan" when I expect it to be zero. I have trimmed the function down to the simplest form, with no parameters:
static void func(void)
{
int a = 1;
float x = 0.0f;
float v = 0.0f;
printf("x(%f), ", x);
x += (float)a * v;
printf("x(%f), ", x);
printf("(int)x: %d, ", (int)x);
}
This gives the output:
x(0.000000), x(nan), (int)x: -2147483648
If I remove the variable a and hardcode the value (1), the nan value goes away. Similarly, if I remove the line x += (float)a * v;, everything prints as expected (all zeroes).
The frustrating part is that I can't reproduce this by just creating a new program and tossing this in main(). When I try that, the program works perfectly and outputs:
x(0.000000), x(0.000000), (int)x: 0
Disassembly from the function in the actual program:
00000029 <_func>:
29: 55 push %ebp
2a: 89 e5 mov %esp,%ebp
2c: 83 ec 38 sub $0x38,%esp
2f: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%ebp)
36: a1 18 00 00 00 mov 0x18,%eax
3b: 89 45 f0 mov %eax,-0x10(%ebp)
3e: a1 18 00 00 00 mov 0x18,%eax
43: 89 45 ec mov %eax,-0x14(%ebp)
46: d9 45 f0 flds -0x10(%ebp)
49: dd 5c 24 04 fstpl 0x4(%esp)
4d: c7 04 24 00 00 00 00 movl $0x0,(%esp)
54: e8 a7 ff ff ff call 0 <_printf>
59: d9 45 f0 flds -0x10(%ebp)
5c: db 45 f4 fildl -0xc(%ebp)
5f: d9 5d e4 fstps -0x1c(%ebp)
62: d9 45 e4 flds -0x1c(%ebp)
65: d9 45 ec flds -0x14(%ebp)
68: de c9 fmulp %st,%st(1)
6a: de c1 faddp %st,%st(1)
6c: d9 5d f0 fstps -0x10(%ebp)
6f: d9 45 f0 flds -0x10(%ebp)
72: dd 5c 24 04 fstpl 0x4(%esp)
76: c7 04 24 00 00 00 00 movl $0x0,(%esp)
7d: e8 7e ff ff ff call 0 <_printf>
82: d9 45 f0 flds -0x10(%ebp)
85: d9 7d e2 fnstcw -0x1e(%ebp)
88: 0f b7 45 e2 movzwl -0x1e(%ebp),%eax
8c: b4 0c mov $0xc,%ah
8e: 66 89 45 e0 mov %ax,-0x20(%ebp)
92: d9 6d e0 fldcw -0x20(%ebp)
95: db 5d dc fistpl -0x24(%ebp)
98: d9 6d e2 fldcw -0x1e(%ebp)
9b: 8b 45 dc mov -0x24(%ebp),%eax
9e: 89 44 24 04 mov %eax,0x4(%esp)
a2: c7 04 24 08 00 00 00 movl $0x8,(%esp)
a9: e8 52 ff ff ff call 0 <_printf>
ae: c9 leave
af: c3 ret
Disassembly from stand-alone function (as main()):
00000000 <_main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 e4 f0 and $0xfffffff0,%esp
6: 83 ec 30 sub $0x30,%esp
9: e8 00 00 00 00 call e <_main+0xe>
e: c7 44 24 2c 01 00 00 movl $0x1,0x2c(%esp)
15: 00
16: a1 18 00 00 00 mov 0x18,%eax
1b: 89 44 24 28 mov %eax,0x28(%esp)
1f: a1 18 00 00 00 mov 0x18,%eax
24: 89 44 24 24 mov %eax,0x24(%esp)
28: d9 44 24 28 flds 0x28(%esp)
2c: dd 5c 24 04 fstpl 0x4(%esp)
30: c7 04 24 00 00 00 00 movl $0x0,(%esp)
37: e8 00 00 00 00 call 3c <_main+0x3c>
3c: db 44 24 2c fildl 0x2c(%esp)
40: d8 4c 24 24 fmuls 0x24(%esp)
44: d9 44 24 28 flds 0x28(%esp)
48: de c1 faddp %st,%st(1)
4a: d9 5c 24 28 fstps 0x28(%esp)
4e: d9 44 24 28 flds 0x28(%esp)
52: dd 5c 24 04 fstpl 0x4(%esp)
56: c7 04 24 00 00 00 00 movl $0x0,(%esp)
5d: e8 00 00 00 00 call 62 <_main+0x62>
62: d9 44 24 28 flds 0x28(%esp)
66: d9 7c 24 1e fnstcw 0x1e(%esp)
6a: 0f b7 44 24 1e movzwl 0x1e(%esp),%eax
6f: b4 0c mov $0xc,%ah
71: 66 89 44 24 1c mov %ax,0x1c(%esp)
76: d9 6c 24 1c fldcw 0x1c(%esp)
7a: db 5c 24 18 fistpl 0x18(%esp)
7e: d9 6c 24 1e fldcw 0x1e(%esp)
82: 8b 44 24 18 mov 0x18(%esp),%eax
86: 89 44 24 04 mov %eax,0x4(%esp)
8a: c7 04 24 08 00 00 00 movl $0x8,(%esp)
91: e8 00 00 00 00 call 96 <_main+0x96>
96: b8 00 00 00 00 mov $0x0,%eax
9b: c9 leave
9c: c3 ret
9d: 90 nop
9e: 90 nop
9f: 90 nop

This issue is often the result of undefined behavior. In this specific instance, there was an implicit function declaration (a header file hadn't been included elsewhere in the program) which caused UB, and resulted in this bug.

How do I disassemble an object program made on C?

How do I disassemble an object program made on C(linux)?
Can anyone please help me with the command line.

Use objdump:
objdump -d -C file.o
Sample output:
...
0000015e <add_exclude>:
15e: 55 push %ebp
15f: 89 e5 mov %esp,%ebp
161: 83 ec 08 sub $0x8,%esp
164: a1 10 00 00 00 mov 0x10,%eax
169: 3b 05 14 00 00 00 cmp 0x14,%eax
16f: 7f 54 jg 1c5 <add_exclude+0x67>
171: 83 3d 10 00 00 00 00 cmpl $0x0,0x10
178: 75 1f jne 199 <add_exclude+0x3b>
17a: 83 ec 0c sub $0xc,%esp
17d: c7 05 10 00 00 00 40 movl $0x40,0x10
184: 00 00 00
187: 68 00 01 00 00 push $0x100
18c: e8 fc ff ff ff call 18d <add_exclude+0x2f>
191: 83 c4 10 add $0x10,%esp
...

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Memory Size Load and Store penalty analysis? - c

Related

Binary bomb phase 3 issue

Manual decompilation of asm snippet

Assembly - js versus ja instruction

C: float changing to 'nan' value

How do I disassemble an object program made on C?

Categories

Resources