How to recognize a function's start address and end address in a PowerPC binary? - disassembly

Given a PowerPC binary file (ELF), we can disassemble it, but how do we recognize the functions the way IDA Pro does? Is there an algorithm?

Can't you use objdump? If the ELF file contains debug symbols, you can see the functions and the code, for example:
#include <altivec.h>
#include <stdio.h>
void print_vec_char(char *s, vector signed char _data){
int i;
printf("%s\t:", s);
for (i = 0 ; i <= 15; i++)
printf("%3d ", vec_extract(_data, i));
printf("\n");
}
void print_vec_long(char *s, vector signed long int _data){
int i;
printf("%s\t:", s);
for (i = 0 ; i <= 1; i++)
printf("%ld ", vec_extract(_data, i));
printf("\n");
}
int main(){
vector signed long int output;
signed char x, y;
vector signed char _data;
const vector signed char bits = {120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 0};
// Initialize the vector _data with the same values
_data = vec_splat_s8(-1);
print_vec_char("_data", _data);
print_vec_char("bits", bits);
output = vec_vbpermq(_data, bits);
print_vec_long("output", output);
y = vec_extract(output, 0);
x = vec_extract(output, 1);
printf("First Half = %x\n", y);
printf("Second Half = %x\n", x);
}
and the functions in the ELF can be shown as:
# objdump -D test | grep print_vec_long -A 10
0000000000000924 <print_vec_long>:
924: 02 00 4c 3c addis r2,r12,2
928: dc 75 42 38 addi r2,r2,30172
92c: a6 02 08 7c mflr r0
930: 10 00 01 f8 std r0,16(r1)
934: f8 ff e1 fb std r31,-8(r1)
938: 61 ff 21 f8 stdu r1,-160(r1)
93c: 78 0b 3f 7c mr r31,r1
940: 78 00 7f f8 std r3,120(r31)
944: 56 12 02 f0 xxswapd vs0,vs34
948: d0 ff 20 39 li r9,-48
Is it what you are looking for?
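Also worth noting (not in the original answer): if the ELF keeps its symbol table, readelf -s test or nm --print-size test lists each FUNC symbol with its start address and size, which gives you both the start and the end of every function without disassembling.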

Look for padding to alignment boundaries as candidates for the gaps between functions.
mflr (move from link register) is often found near the start of a function, if it's used at all (i.e. in non-leaf functions).
Compiler-generated code often ends a function with a run of reloads of call-preserved registers from stack memory, and usually does most of the saving early in the function.
Of course, the actual return points in a function might not be the last basic blocks; it's often useful to put the code for a rare condition in a block at the very end, past the normal return, which is branched to and then jumps back after doing its work.
A real example of compiler-generated code may be illustrative. Compiled with gcc 4.8.5 on the Godbolt compiler explorer for 32-bit PowerPC with -O3 -mregnames:
int ext();
int foo() { ext(); return 1; }
foo:
mflr %r0 # save link-register value
stwu %r1,-16(%r1)
stw %r0,20(%r1) # ... to memory
bl ext
lwz %r0,20(%r1) # then restore it after a function call
li %r3,1 # return 1
addi %r1,%r1,16 # stack pointer adjustment
mtlr %r0
blr # ret
Notice that blr (branch to link-register) is used as a return instruction, like x86 ret, instead of simply doing a normal register-indirect jump to the return address in %r0. (Presumably PowerPC handles blr specially, maybe with a return-address predictor stack.) If the code you're analyzing does this, it makes finding the ends of functions much easier. (But remember that tail-duplication optimizations can give a function multiple return paths, and blr won't always be the last instruction.)
The bl target addresses will give you (most of) the function entry points. Have your disassembler put labels on all the bl targets: branch-and-link is basically a call instruction, so the target address is always a function.
Functions that end with a tail-call to another function won't use blr at the end.
If a function is only ever tail-called, there won't be any bl instructions that target it. Or if it's only used with function pointers.
Regular branch/jump instructions that jump more than a few kiB are almost certainly jumping outside the current function to tail-call another function. So you should look at jumps like that as likely candidates for function entry/exit points.
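To make those heuristics concrete, here is a rough sketch of my own (not from the answer above; the function name and interface are made up) that walks a raw 32-bit big-endian .text section and records the target of every bl/bla as a likely function entry point. A fuller tool would also note blr instructions and alignment padding as likely function ends.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch: scan raw PowerPC machine code (big-endian, 4-byte instructions)
   and print the target of every bl/bla as a likely function entry point.
   'code' holds the contents of .text and 'base' is its load address. */
static void find_bl_targets(const uint8_t *code, size_t size, uint32_t base)
{
    for (size_t off = 0; off + 4 <= size; off += 4) {
        uint32_t insn = (uint32_t)code[off] << 24 | (uint32_t)code[off + 1] << 16
                      | (uint32_t)code[off + 2] << 8 | (uint32_t)code[off + 3];

        if ((insn >> 26) == 18 && (insn & 1)) {        /* I-form branch, LK=1 */
            int32_t li = (int32_t)(insn & 0x03FFFFFC); /* 26-bit byte offset */
            if (li & 0x02000000)                       /* sign-extend it */
                li -= 0x04000000;
            uint32_t target = (insn & 2)               /* AA=1: absolute target */
                            ? (uint32_t)li
                            : base + (uint32_t)off + (uint32_t)li;
            printf("likely function entry: 0x%08x\n", (unsigned)target);
        }
    }
}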

Related

C's strange pointer arithmetics [closed]

I was working on pointers. I saw a code snippet but I couldn't understand how it works.
The strange thing is that when the k function is executed, the statement y = 2 doesn't seem to take effect, because the output is y = 1 instead of y = 2.
Any idea about this?
#include<stdio.h>
void k(void){
int x;
*(&x+5) += 7;
}
void main(void){
int y = 1;
y = 1;
k();
y = 2;
printf("y = %d", y);
}
CPU: AMD Ryzen 7 5800H with Radeon Graphics (16) @ 3.200GHz
OS: Arch Linux x86_64
Compiler: GCC (Version: 12.1.1)
compile command: gcc a.c -o a
As pointed out by others, this statement suffers from undefined behavior:
*(&x+5) += 7;
Memory address &x+5 is outside the bounds of variable x. Writing to that address is a very bad idea.
OP's code sample exploits certain C compiler implementation details.
Probably educational; it can be used to demonstrate how hackers can exploit missing bounds checks to change the designed behavior of a program.
It is interesting enough to investigate the actual behavior of this program.
I will assume the program has been compiled by GCC with default settings on a 64-bit Linux distro on Intel x64 processor architecture (i.e. little endian).
Pointers are 64 bits (8 bytes), int is 32 bits (4 bytes).
Variable x is located on the stack. During execution of k(), the stack looks like this:
+--------+----------------------------------------------------+ TOP OF STACK
| rsp+0 | unused; 16-byte alignment filler | <--- rsp
+--------+----------------------------------------------------+
| rsp+4 | x | <--- &x
+--------+----------------------------------------------------+
| rsp+8 | reserved; least significant 32 bits of stack guard |
+--------+----------------------------------------------------+
| rsp+12 | reserved; most significant 32 bits of stack guard |
+--------+----------------------------------------------------+
| rsp+16 | least significant 32 bits of saved base pointer | <--- rbp
+--------+----------------------------------------------------+
| rsp+20 | most significant 32 bits of saved base pointer |
+--------+----------------------------------------------------+
| rsp+24 | least significant 32 bits of return address of k() | <--- &x+5
+--------+----------------------------------------------------+
| rsp+28 | most significant 32 bits of return address of k() |
+--------+----------------------------------------------------+
| rsp+32 | start of stack frame of main() |
+--------+----------------------------------------------------+
&x+5 is a memory address that is 20 bytes away from &x (because sizeof(int) is 4).
That happens to be the location of the return address of k().
*(&x+5) += 7 will increase the return address by 7.
That will have its effect when returning from k() to main().
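As a small side check (a sketch added here, not part of the original answer), you can see that +5 in int-pointer arithmetic really means +20 bytes:

#include <stdio.h>

int main(void)
{
    int x = 0;
    /* Pointer arithmetic on an int* steps in units of sizeof(int), so on this
       platform &x + 5 lies 5 * 4 = 20 bytes past &x. (Strictly, even forming
       a pointer that far past an object is outside what the standard
       guarantees; it is shown only to illustrate the arithmetic.) */
    printf("&x     = %p\n", (void *)&x);
    printf("&x + 5 = %p\n", (void *)(&x + 5));
    printf("offset = %td bytes\n", (char *)(&x + 5) - (char *)&x);
    return 0;
}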
Here is objdump output of main():
void main(void)
{
45: 55 push rbp
46: 48 89 e5 mov rbp,rsp
49: 48 83 ec 10 sub rsp,0x10
int y = 1;
4d: c7 45 fc 01 00 00 00 mov DWORD PTR [rbp-0x4],0x1
y = 13;
54: c7 45 fc 0d 00 00 00 mov DWORD PTR [rbp-0x4],0xd
k();
5b: e8 00 00 00 00 call 60 <main+0x1b>
y = 2;
60: c7 45 fc 02 00 00 00 mov DWORD PTR [rbp-0x4],0x2
printf("y = %d\n", y);
67: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
6a: 89 c6 mov esi,eax
6c: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 73 <main+0x2e>
73: b8 00 00 00 00 mov eax,0x0
78: e8 00 00 00 00 call 7d <main+0x38>
}
7d: 90 nop
7e: c9 leave
7f: c3 ret
The program is supposed to return from k() at offset 60 (the start of y = 2), but instead, it will return at offset 67 (the start of printf("y = %d\n", y)).
In other words, after returning from k(), the program will skip y = 2 and continue from printf("y = %d\n", y), printing y = 1.
There is undefined behavior (UB) on the *(&x+5) += 7; line. When there is UB, the program can do anything, including formatting your hard drive or outputting y=1.
To learn what is happening under the hood, you can check the assembly output.
The function k() invokes undefined behaviour.
This is because you first declare a local variable x, and then you add 7 to the value at address &x+5, which is past the end of x.

Is (really) '<' faster than '!=' in C?

This is the typical question that gets downvoted, so I've hesitated (a lot) before posting it...
I know this question was marked as a duplicate, but my tests (if they are good: are they good? that is part of the question) tend to show this is not the case.
At the beginning, I did some tests comparing a for loop to a while loop.
Those showed that the for loop was better.
But going further, for vs. while was not the point: the difference comes down to:
for (int l = 0; l < loops;l++) {
or
for (int l = 0; l != loops;l++) {
And if you run that (under Windows 10, Visual Studio 2017, release), you see that the first one is more than twice as fast as the second one.
It is hard (for a novice like me) to understand whether the compiler is, for some reason, able to optimize one better than the other. But...
Short question
Why?
Longer question
The complete code is the following:
For the '<' loop:
int forloop_inf(int loops, int iterations)
{
int n = 0;
int x = n;
for (int l = 0; l < loops;l++) {
for (int i = 0; i < iterations;i++) {
n++;
x += n;
}
}
return x;
}
For the '!=' loop:
int forloop_diff(int loops, int iterations)
{
int n = 0;
int x = n;
for (int l = 0; l != loops;l++) {
for (int i = 0; i != iterations;i++) {
n++;
x += n;
}
}
return x;
}
In both cases, the inner calculation is only there to keep the compiler from skipping the loops entirely.
Respectively this is called by:
printf("for loop inf %f\n", monitor_int(loops, iterations, forloop_inf, &result));
printf("%d\n", result);
and
printf("for loop diff %f\n", monitor_int(loops, iterations, forloop_diff, &result));
printf("%d\n", result);
where loops = 10 * 1000 and iterations = 1000 * 1000.
And where monitor_int is:
double monitor_int(int loops, int iterations, int(*func)(int, int), int *result)
{
clock_t start = clock();
*result = func(loops, iterations);
clock_t stop = clock();
return (double)(stop - start) / CLOCKS_PER_SEC;
}
The result in seconds is:
for loop inf 2.227 seconds
for loop diff 4.558 seconds
So, even if in practice what matters is the weight of the work done inside the loop relative to the loop overhead itself, why such a difference?
Edit:
You can find here the full source code, revised so that the functions are called in a random order several times.
The corresponding disassembly is here (obtained with dumpbin /DISASM CPerf2.exe).
Running it, I now obtain:
'!=' 0.045231 (average on 493 runs)
'<' 0.031010 (average on 507 runs)
I do not know how to set O3 in Visual Studio, the compile command line is the following:
/permissive- /Yu"stdafx.h" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"x64\Release\vc141.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /FC /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\CPerf2.pch" /diagnostics:classic
The code for the loops is above, here is the random way to run it:
typedef int(loop_signature)(int, int);
void loops_compare()
{
int loops = 1 * 100;
int iterations = 1000 * 1000;
int result;
loop_signature *functions[2] = {
forloop_diff,
forloop_inf
};
int n_rand = 1000;
int n[2] = { 0, 0 };
double cum[2] = { 0.0, 0.0 };
for (int i = 0; i < n_rand;i++) {
int pick = rand() % 2;
loop_signature *fun = functions[pick];
double time = monitor(loops, iterations, fun, &result);
n[pick]++;
cum[pick] += time;
}
printf("'!=' %f (%d) / '<' %f (%d)\n", cum[0] / (double)n[0], n[0], cum[1] / (double)n[1], n[1]);
}
and the disassembly (the loop functions only; I'm not sure it is the right extract from the link above):
?forloop_inf##YAHHH#Z:
0000000140001000: 48 83 EC 08 sub rsp,8
0000000140001004: 45 33 C0 xor r8d,r8d
0000000140001007: 45 33 D2 xor r10d,r10d
000000014000100A: 44 8B DA mov r11d,edx
000000014000100D: 85 C9 test ecx,ecx
000000014000100F: 7E 6F jle 0000000140001080
0000000140001011: 48 89 1C 24 mov qword ptr [rsp],rbx
0000000140001015: 8B D9 mov ebx,ecx
0000000140001017: 66 0F 1F 84 00 00 nop word ptr [rax+rax]
00 00 00
0000000140001020: 45 33 C9 xor r9d,r9d
0000000140001023: 33 D2 xor edx,edx
0000000140001025: 33 C0 xor eax,eax
0000000140001027: 41 83 FB 02 cmp r11d,2
000000014000102B: 7C 29 jl 0000000140001056
000000014000102D: 41 8D 43 FE lea eax,[r11-2]
0000000140001031: D1 E8 shr eax,1
0000000140001033: FF C0 inc eax
0000000140001035: 8B C8 mov ecx,eax
0000000140001037: 03 C0 add eax,eax
0000000140001039: 0F 1F 80 00 00 00 nop dword ptr [rax]
00
0000000140001040: 41 FF C1 inc r9d
0000000140001043: 83 C2 02 add edx,2
0000000140001046: 45 03 C8 add r9d,r8d
0000000140001049: 41 03 D0 add edx,r8d
000000014000104C: 41 83 C0 02 add r8d,2
0000000140001050: 48 83 E9 01 sub rcx,1
0000000140001054: 75 EA jne 0000000140001040
0000000140001056: 41 3B C3 cmp eax,r11d
0000000140001059: 7D 06 jge 0000000140001061
000000014000105B: 41 FF C2 inc r10d
000000014000105E: 45 03 D0 add r10d,r8d
0000000140001061: 42 8D 0C 0A lea ecx,[rdx+r9]
0000000140001065: 44 03 D1 add r10d,ecx
0000000140001068: 41 8D 48 01 lea ecx,[r8+1]
000000014000106C: 41 3B C3 cmp eax,r11d
000000014000106F: 41 0F 4D C8 cmovge ecx,r8d
0000000140001073: 44 8B C1 mov r8d,ecx
0000000140001076: 48 83 EB 01 sub rbx,1
000000014000107A: 75 A4 jne 0000000140001020
000000014000107C: 48 8B 1C 24 mov rbx,qword ptr [rsp]
0000000140001080: 41 8B C2 mov eax,r10d
0000000140001083: 48 83 C4 08 add rsp,8
0000000140001087: C3 ret
0000000140001088: CC CC CC CC CC CC CC CC ÌÌÌÌÌÌÌÌ
?forloop_diff##YAHHH#Z:
0000000140001090: 45 33 C0 xor r8d,r8d
0000000140001093: 41 8B C0 mov eax,r8d
0000000140001096: 85 C9 test ecx,ecx
0000000140001098: 74 28 je 00000001400010C2
000000014000109A: 44 8B C9 mov r9d,ecx
000000014000109D: 0F 1F 00 nop dword ptr [rax]
00000001400010A0: 85 D2 test edx,edx
00000001400010A2: 74 18 je 00000001400010BC
00000001400010A4: 8B CA mov ecx,edx
00000001400010A6: 66 66 0F 1F 84 00 nop word ptr [rax+rax]
00 00 00 00
00000001400010B0: 41 FF C0 inc r8d
00000001400010B3: 41 03 C0 add eax,r8d
00000001400010B6: 48 83 E9 01 sub rcx,1
00000001400010BA: 75 F4 jne 00000001400010B0
00000001400010BC: 49 83 E9 01 sub r9,1
00000001400010C0: 75 DE jne 00000001400010A0
00000001400010C2: C3 ret
00000001400010C3: CC CC CC CC CC CC CC CC CC CC CC CC CC ÌÌÌÌÌÌÌÌÌÌÌÌÌ
Edit again:
What I also find surprising is the following:
In debug mode the performance is the same (and so is the assembly code).
So how can you be confident about what you're coding if such differences appear afterwards? (assuming I haven't made a mistake somewhere)
For proper benchmarking, it's important to run the functions in random order and many times.
typedef int(signature)(int, int);
...
int main() {
int loops, iterations, runs;
fprintf(stderr, "Loops: ");
scanf("%d", &loops);
fprintf(stderr, "Iterations: ");
scanf("%d", &iterations);
fprintf(stderr, "Runs: ");
scanf("%d", &runs);
fprintf(stderr, "Running for %d loops and %d iterations %d times.\n", loops, iterations, runs);
signature *functions[2] = {
forloop_inf,
forloop_diff
};
int result = functions[0](loops, iterations);
for( int i = 0; i < runs; i++ ) {
int pick = rand() % 2;
signature *function = functions[pick];
int new_result;
printf("%d %f\n", pick, monitor_int(loops, iterations, function, &new_result));
if( result != new_result ) {
fprintf(stderr, "got %d expected %d\n", new_result, result);
}
}
}
Armed with this we can do 1000 runs in random order and find the average times.
It's also important to benchmark with optimizations turned on. Not much point in asking how fast unoptimized code will run. I'll try at -O2 and -O3.
My findings are that with Apple LLVM version 8.0.0 (clang-800.0.42.1) doing 10000 loops and 1000000 iterations at -O2 forloop_inf is indeed about 50% faster than forloop_diff.
forloop_inf: 0.000009
forloop_diff: 0.000014
Looking at the generated assembly code for -O2 with clang -O2 -S -mllvm --x86-asm-syntax=intel test.c I can see many differences between the two implementations. Maybe somebody who knows assembly can tell us why.
But at -O3 the performance difference is no longer discernible.
forloop_inf: 0.000002
forloop_diff: 0.000002
This is because at -O3 they are almost exactly the same. One is using je and one is using jle. That's it.
In conclusion, when benchmarking...
Do many runs.
Randomize the order.
Compile and run as close as you can to how you would in production.
In this case that means turning on compiler optimizations.
Look at the assembly code.
And most of all.
Pick the safest code, not the fastest.
i < max is safer than i != max because it will still terminate if i somehow jumps over max.
As demonstrated, with optimizations turned on, they're both so fast that even not fully optimized they can whip through 10,000,000,000 iterations in 0.000009 seconds. i < max or i != max is unlikely to be a performance bottleneck, rather whatever you're doing 10 billion times is.
But i != max could lead to a bug.
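A tiny illustration of that risk (my own sketch, not from the answer): if the loop's step can hop over the bound, the != condition never becomes false, while the < form still stops.

#include <stdio.h>

int main(void)
{
    unsigned sum = 0;
    /* With a step of 2 and an odd bound, i never equals 7: it takes the
       values 0, 2, 4, 6, 8, ... so "i != 7" stays true and that loop would
       never end. "i < 7" still terminates after i = 0, 2, 4, 6. */
    for (unsigned i = 0; i < 7; i += 2)
        sum += i;
    printf("sum = %u\n", sum);   /* prints 12 */
    return 0;
}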
"<" is not faster than '!='. What's happening is something entirely different.
A loop "for (i = 0; i < n; ++i) is a pattern that the compiler recognises. If the loop body has no instructions modifying i or n, then the compiler knows this is a loop executing exactly max (n - i, 0) times, and can produce optimal code for this.
A loop "for (i = 0; i != n; ++i) is used in practice much less often, so compiler writers are not too bothered with it. And the number of iterations is much harder to determine. If i > n then we have undefined behaviour for signed integers unless there are statements that exit the loop. For unsigned numbers the number of iterations is tricky because it depends on the type of i. You will just get less optimised code.
Always look at the generated code.
That used to be true many years ago, when some microprocessors lacked certain conditional branch instructions or had very few flags, so some conditions had to be compiled into a set of comparisons and jumps.
But it is not true anymore, as modern processors have a very rich set of conditional branch instructions (some also have many predicated "regular" instructions, for example ARM ones) and plenty of flags.
You can play yourself with different conditions here: https://godbolt.org/g/9DsqJm

i386-elf-gcc outputs a strange assembler instruction for "static a = 0"

I am writing a mini OS. When I wrote this code to show the timer tick, it went wrong:
void timer_callback(pt_regs *regs)
{
static uint32_t tick = 0;
printf("Tick: %dtimes\n", tick);
tick++;
}
tick is initialized not with 0 but with 1818389861. But if tick is initialized with 0x01 or anything other than zero, it's OK!!!
So I wrote a simple C file and ran objdump on it:
staic.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
extern void printf(char *, int);
int main(){
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
static int a = 1;
printf("%d\n", a);
11: a1 00 00 00 00 mov 0x0,%eax
16: 83 ec 08 sub $0x8,%esp
19: 50 push %eax
1a: 68 00 00 00 00 push $0x0
1f: e8 fc ff ff ff call 20 <main+0x20>
24: 83 c4 10 add $0x10,%esp
return 0;
27: b8 00 00 00 00 mov $0x0,%eax
}
2c: 8b 4d fc mov -0x4(%ebp),%ecx
2f: c9 leave
30: 8d 61 fc lea -0x4(%ecx),%esp
33: c3 ret
so strange, no memory used!!!
Update: let me say it clearly.
The second static.c is an experiment. I thought it showed that no memory was used, but I was wrong: the mov 0x0,%eax is the memory access. I confused 0x0 with $0x0.
My original problem is why tick does not get initialized with 0 (but can be initialized with 1 or any other number).
I looked at it again using gdb. OK, it does use memory, with something like mov eax,ds:0x106010, but the really strange thing is that the memory at 0x106010 is not 0, although it should be. Just as I said, if I set tick = 1 or anything else, the memory is initialized as I want; that is the strange thing!
About the tools: gdb and objdump print different assembly (different in meaning, not just format). Since I'm just learning OS development and am not good at C, I let that go and ignored it...
Memory is used, be sure of that; however, you won't find that memory in the .text section. Memory for static variables is allocated in either .bss (when zero-initialized; or, in case of C++, dynamically initialized) or .data (when non-zero initialized) section.
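For instance (a small sketch of mine, not the OP's code; the file and symbol names are made up), you can confirm where each static lands by compiling a file like this and looking at the symbol table with objdump -t or readelf -s:

/* sections.c: zero-initialized statics are expected in .bss,
   non-zero-initialized ones in .data. */
static int zero_init = 0;   /* expect this symbol in .bss  */
static int one_init  = 1;   /* expect this symbol in .data */

int sum_statics(void)
{
    return zero_init + one_init;   /* reference them so they are kept */
}

Compiling with gcc -c sections.c and running objdump -t sections.o should list zero_init under .bss and one_init under .data.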
When dumping object files with objdump using the -d (disassembly) option, it is important to also use the -r (relocations) option. Without that, the disassembly you get is deceiving and makes little sense.
In your case, the instruction at addresses 11 and 1f must have relocations, at address 11, to the variable a and at address 1f, to the function printf. The instruction at address 11 loads the value from your variable a, without proper relocations it looks as if it loaded a value from address 0.
As to your original question, the value you get, 1818389861, or 0x6C626D65, is quite remarkable. I would bet that somewhere in your program you have a buffer overrun involving a string containing the subsequence embl.
As a side note, I would like to call your attention to the use of correct type specifications in printf calls. The type specification %d corresponds to the type int; on all modern mainstream architectures, int and int32_t are the same size. However, that is not guaranteed to always be so. There are special type specifications, defined in <inttypes.h>, for use with the explicitly-sized types; for example, for a uint32_t you use PRIu32 (PRId32 for an int32_t):
#include <inttypes.h>

uint32_t x = 0;
printf("%" PRIu32, x);

What do the numeric values in disassembled C code mean?

I'm trying to understand assembly and C code.
I have the following C program, compiled to generate an object file only.
#include <stdio.h>
int main()
{
int i = 10;
int j = 22 + i;
return 0;
}
I executed the following command:
objdump -S myprogram.o
Output of above command is:
objdump -S testelf.o
testelf.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main()
{
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
int i = 10;
6: c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%ebp)
int j = 22 + i;
d: 8b 45 f8 mov -0x8(%ebp),%eax
10: 83 c0 16 add $0x16,%eax
13: 89 45 fc mov %eax,-0x4(%ebp)
return 0;
16: b8 00 00 00 00 mov $0x0,%eax
}
1b: c9 leave
1c: c3 ret
What is meant by the numbers before the mnemonics,
i.e. "83 ec 10" before the "sub" instruction or
"c7 45 f8 0a 00 00 00" before the "movl" instruction?
I'm using following platform to compile this code:
$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Vendor ID: GenuineIntel
Those are x86 opcodes. A detailed reference, other than the ones listed in the comments above, is available here.
For example, the c7 45 f8 0a 00 00 00 before movl $0xa,-0x8(%ebp) are the hexadecimal values of the instruction's encoding bytes. They tell the CPU to move the immediate value 10 decimal (as a 4-byte value) into the address on the current stack 8 bytes above the stack-frame base pointer. That is where the variable i from your C source code lives while your code is running. The top of the stack is at a lower memory address than the bottom of the stack, so moving in a negative direction from the base pointer means moving up the stack.
Breaking the encoding down: c7 is the opcode byte for mov with an immediate operand, and 45 is the ModR/M byte that, together with the displacement byte f8 (-8), selects the memory operand -0x8(%ebp). (mov does not change any flags in EFLAGS.) See the reference for more detail.
The remaining bytes are the immediate value. Since you are using a little-endian system, the least significant byte of a number is stored first, so 10 decimal, which is 0x0a in hexadecimal and has the 4-byte value 0x0000000a, is stored as 0a 00 00 00.
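If it helps to see that byte order directly, here is a small sketch (mine, not from the answer) that prints the in-memory bytes of an int holding 10:

#include <stdio.h>

int main(void)
{
    int value = 10;                                       /* 0x0000000a */
    const unsigned char *bytes = (const unsigned char *)&value;

    /* On a little-endian machine this prints "0a 00 00 00",
       matching the immediate bytes in the disassembly above. */
    for (size_t i = 0; i < sizeof value; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}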

divdi3 division used for long long by gcc on x86

When gcc sees multiplication or division of integer types that isn't supported in hardware, it generates a call to a special library function.
http://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html#Integer-library-routines
According to the link above, long __divdi3 (long a, long b) is used for division of long. However, here http://gcc.gnu.org/onlinedocs/gcc-3.3/gccint/Library-Calls.html divdi is explained as a "call for division of one signed double-word". While the first source clearly maps the di suffix to long arguments, the second states that divdi is for a double-word and udivdi for a full word (single, right?).
When I compile this simple example
int main(int argc, char *argv[]) {
long long t1, t2, tr;
t1 = 1;
t2 = 1;
tr = t1 / t2;
return tr;
}
with gcc -Wall -O0 -m32 -march=i386 (gcc ver. 4.7.2)
the disassembly shows me
080483cc <main>:
80483cc: 55 push %ebp
80483cd: 89 e5 mov %esp,%ebp
80483cf: 83 e4 f0 and $0xfffffff0,%esp
80483d2: 83 ec 30 sub $0x30,%esp
80483d5: c7 44 24 28 01 00 00 movl $0x1,0x28(%esp)
80483dc: 00
80483dd: c7 44 24 2c 00 00 00 movl $0x0,0x2c(%esp)
80483e4: 00
80483e5: c7 44 24 20 01 00 00 movl $0x1,0x20(%esp)
80483ec: 00
80483ed: c7 44 24 24 00 00 00 movl $0x0,0x24(%esp)
80483f4: 00
80483f5: 8b 44 24 20 mov 0x20(%esp),%eax
80483f9: 8b 54 24 24 mov 0x24(%esp),%edx
80483fd: 89 44 24 08 mov %eax,0x8(%esp)
8048401: 89 54 24 0c mov %edx,0xc(%esp)
8048405: 8b 44 24 28 mov 0x28(%esp),%eax
8048409: 8b 54 24 2c mov 0x2c(%esp),%edx
804840d: 89 04 24 mov %eax,(%esp)
8048410: 89 54 24 04 mov %edx,0x4(%esp)
8048414: e8 17 00 00 00 call 8048430 <__divdi3>
8048419: 89 44 24 18 mov %eax,0x18(%esp)
804841d: 89 54 24 1c mov %edx,0x1c(%esp)
8048421: 8b 44 24 18 mov 0x18(%esp),%eax
8048425: c9 leave
8048426: c3 ret
Note 8048414: call 8048430 <__divdi3>.
I can't use the gcc library for my project, and it's multi-platform. I was hoping not to have to write all the __* functions for all platforms (speed doesn't matter), but now I'm a bit confused.
Can somebody explain why a __divdi3 (and not __divti3) call is generated for long long int (64-bit) division?
On x86 machines, the term "word" usually implies a 16-bit value. More generally, in the computer-science world a word can denote a value of virtually arbitrary length, with words of 10 or 12 bits not being uncommon in embedded systems.
I believe that the terminology you have hit upon is used for the Linux/Unix systems just for the sake of unification on the level of the operating system and has nothing to do with the target platform of your build. An example of use of the same notation can be found in gdb, which uses w for the 32-bit word and hw for the 16-bit "half-word" (in the x86 sense).
Furthermore, this convention also extends to the standard IEEE-754 floating point numbers with ease, and is summarised in the few bullet points below
s - "single" word: four-byte integers (int) / floats (float)
d - "double" word: eight-byte integers (long long, or long on 64-bit targets) / floats (double)
t - "tetra" word: sixteen-byte integers (__int128) / 128-bit floats
Since long long is eight bytes (a double word) on your 32-bit target, its division is handled by __divdi3; __divti3 would only come into play for 128-bit operands.
This naming convention is used for all arithmetic built-ins, like __divsi3, __divdi3, __divti3 or __mulsi3, __muldi3, __multi3... (and all u - unsigned - variants). A complete list can be found here.
Division of 64-bit numbers on 32-bit machines uses a more advanced (and somewhat difficult) algorithm. However, you can still use the principle of the algorithm you learned in school. Here is a simple shift-and-subtract version, written out as C for unsigned 64-bit operands (have a look at this answer about big-integers):
uint64_t udiv64(uint64_t numerator, uint64_t divisor)
{
    uint64_t result = 0;
    uint64_t remainder = numerator;
    int count = 0;

    /* Normalize: shift the divisor left until its highest bit is set.
       (A divisor of zero is not handled in this sketch.) */
    while (!(divisor & (1ULL << 63))) {
        divisor <<= 1;
        count++;
    }

    while (remainder != 0) {
        if (remainder >= divisor) {
            remainder -= divisor;
            result |= 1ULL << count;
        }
        if (count == 0)
            break;
        divisor >>= 1;
        count--;
    }
    return result;
}
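With that sketch, udiv64(100, 7) returns 14 (the remainder, 2, is left in the local variable and not returned here). A signed wrapper and a divide-with-remainder interface would be layered on top of the unsigned routine, which is essentially how libgcc builds __divdi3 and __moddi3 on top of its unsigned helper.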
