Related
I'm trying to understand assembly in x86 more. I have a mystery function here that I know returns an int and takes an int argument.
So it looks like int mystery(int n){}. I can't figure out the function in C however. The assembly is:
mov %edi, %eax
lea 0x0(,%rdi, 8), %edi
sub %eax, %edi
add $0x4, %edi
callq < mystery _util >
repz retq
< mystery _util >
mov %edi, %eax
shr %eax
and $0x1, %edi
and %edi, %eax
retq
I don't understand what the lea does here and what kind of function it could be.
The assembly code appeared to be computer generated, and something that was probably compiled by GCC since there is a repz retq after an unconditional branch (call). There is also an indication that because there isn't a tail call (jmp) instead of a call when going to mystery_util that the code was compiled with -O1 (higher optimization levels would likely inline the function which didn't happen here). The lack of frame pointers and extra load/stores indicated that it isn't compiled with -O0
Multiplying x by 7 is the same as multiplying x by 8 and subtracting x. That is what the following code is doing:
lea 0x0(,%rdi, 8), %edi
sub %eax, %edi
LEA can compute addresses but it can be used for simple arithmetic as well. The syntax for a memory operand is displacement(base, index, scale). Scale can be 1, 2, 4, 8. The computation is displacement + base + index * scale. In your case lea 0x0(,%rdi, 8), %edi is effectively EDI = 0x0 + RDI * 8 or EDI = RDI * 8. The full calculation is n * 7 - 4;
The calculation for mystery_util appears to simply be
n &= (n>>1) & 1;
If I take all these factors together we have a function mystery that passes n * 7 - 4 to a function called mystery_util that returns n &= (n>>1) & 1.
Since mystery_util returns a single bit value (0 or 1) it is reasonable that bool is the return type.
I was curious if I could get a particular version of GCC with optimization level 1 (-O1) to reproduce this assembly code. I discovered that GCC 4.9.x will yield this exact assembly code for this given C program:
#include<stdbool.h>
bool mystery_util(unsigned int n)
{
n &= (n>>1) & 1;
return n;
}
bool mystery(unsigned int n)
{
return mystery_util (7*n+4);
}
The assembly output is:
mystery_util:
movl %edi, %eax
shrl %eax
andl $1, %edi
andl %edi, %eax
ret
mystery:
movl %edi, %eax
leal 0(,%rdi,8), %edi
subl %eax, %edi
addl $4, %edi
call mystery_util
rep ret
You can play with this code on godbolt.
Important Update - Version without bool
I apparently erred in interpreting the question. I assumed the person asking this question determined by themselves that the prototype for mystery was int mystery(int n). I thought I could change that. According to a related question asked on Stackoverflow a day later, it seems int mystery(int n) is given to you as the prototype as part of the assignment. This is important because it means that a modification has to be made.
The change that needs to be made is related to mystery_util. In the code to be reverse engineered are these lines:
mov %edi, %eax
shr %eax
EDI is the first parameter. SHR is logical shift right. Compilers would only generate this if EDI was an unsigned int (or equivalent). int is a signed type an would generate SAR (arithmetic shift right). This means that the parameter for mystery_util has to be unsigned int (and it follows that the return value is likely unsigned int. That means the code would look like this:
unsigned int mystery_util(unsigned int n)
{
n &= (n>>1) & 1;
return n;
}
int mystery(int n)
{
return mystery_util (7*n+4);
}
mystery now has the prototype given by your professor (bool is removed) and we use unsigned int for the parameter and return type of mystery_util. In order to generate this code with GCC 4.9.x I found you need to use -O1 -fno-inline. This code can be found on godbolt. The assembly output is the same as the version using bool.
If you use unsigned int mystery_util(int n) you would discover that it doesn't quite output what we want:
mystery_util:
movl %edi, %eax
sarl %eax ; <------- SAR (arithmetic shift right) is not SHR
andl $1, %edi
andl %edi, %eax
ret
The LEA is just a left-shift by 3, and truncating the result to 32 bit (i.e. zero-extending EDI into RDI implicilty). x86-64 System V passes the first integer arg in RDI, so all of this is consistent with one int arg. LEA uses memory-operand syntax and machine encoding, but it's really just a shift-and-add instruction. Using it as part of a multiply by a constant is a common compiler optimization for x86.
The compiler that generated this function missed an optimization here; the first mov could have been avoided with
lea 0x0(,%rdi, 8), %eax # n << 3 = n*8
sub %edi, %eax # eax = n*7
lea 4(%rax), %edi # rdi = 4 + n*7
But instead, the compiler got stuck on generating n*7 in %edi, probably because it applied a peephole optimization for the constant multiply too late to redo register allocation.
mystery_util returns the bitwise AND of the low 2 bits of its arg, in the low bit, so a 0 or 1 integer value, which could also be a bool.
(shr with no count means a count of 1; remember that x86 has a special opcode for shifts with an implicit count of 1. 8086 only has counts of 1 or cl; immediate counts were added later as an extension and the implicit-form opcode is still shorter.)
The LEA performs an address computation, but instead of dereferencing the address, it stores the computed address into the destination register.
In AT&T syntax, lea C(b,c,d), reg means reg = C + b + c*d where C is a constant, and b,c are registers and d is a scalar from {1,2,4,8}. Hence you can see why LEA is popular for simple math operations: it does quite a bit in a single instruction. (*includes correction from prl's comment below)
There are some strange features of this assembly code: the repz prefix is only strictly defined when applied to certain instructions, and retq is not one of them (though the general behavior of the processor is to ignore it). See Michael Petch's comment below with a link for more info. The use of lea (,rdi,8), edi followed by sub eax, edi to compute arg1 * 7 also seemed strange, but makes sense once prl noted the scalar d had to be a constant power of 2. In any case, here's how I read the snippet:
mov %edi, %eax ; eax = arg1
lea 0x0(,%rdi, 8), %edi ; edi = arg1 * 8
sub %eax, %edi ; edi = (arg1 * 8) - arg1 = arg1 * 7
add $0x4, %edi ; edi = (arg1 * 7) + 4
callq < mystery _util > ; call mystery_util(arg1 * 7 + 4)
repz retq ; repz prefix on return is de facto nop.
< mystery _util >
mov %edi, %eax ; eax = arg1
shr %eax ; eax = arg1 >> 1
and $0x1, %edi ; edi = 1 iff arg1 was odd, else 0
and %edi, %eax ; eax = 1 iff smallest 2 bits of arg1 were both 1.
retq
Note the +4 on the 4th line is entirely spurious. It cannot affect the outcome of mystery_util.
So, overall this ASM snippet computes the boolean (arg1 * 7) % 4 == 3.
I am originally given the function prototype:
void decode1(int *xp, int *yp, int *zp)
now i am told to convert the following assembly into C code:
movl 8(%ebp), %edi //line 1 ;; gets xp
movl 12(%ebp), %edx //line 2 ;; gets yp
movl 16(%ebp),%ecx //line 3 ;; gets zp
movl (%edx), %ebx //line 4 ;; gets y
movl (%ecx), %esi //line 5 ;; gets z
movl (%edi), %eax //line 6 ;; gets x
movl %eax, (%edx) //line 7 ;; stores x into yp
movl %ebx, (%ecx) //line 8 ;; stores y into zp
movl %esi, (%edi) //line 9 ;; stores z into xp
These comments were not given to me in the problem this is what I believe they are doing but am not 100% sure.
My question is, for lines 4-6, am I able to assume that the command
movl (%edx), %ebx
movl (%ecx), %esi
movl (%edi), %eax
just creates a local variables to y,z,x?
also, do the registers that each variable get stored in i.e (edi,edx,ecx) matter or can I use any register in any order to take the pointers off of the stack?
C code:
int tx = *xp;
int ty = *yp;
int tz = *zp;
*yp = tx;
*zp = ty;
*xp = tz;
If I wasn't given the function prototype how would I tell what type of return type is used?
Let's focus on a simpler set of instructions.
First:
movl 8(%ebp), %edi
will load into the EDI register the content of the 4 bytes that are situated on memory at 8 eight bytes beyond the address set in the EBP register. This special EBP usage is a convention followed by the compiler code generator, that per each function, saves the stack pointer ESP into the EBP registers, and then creates a stack frame for the function local variables.
Now, in the EDI register, we have the first parameter passed to the function, that is a pointer to an integer, so EDI contains now the address of that integer, but not the integer itself.
movl (%edi), %eax
will get the 4 bytes pointed by the EDI register and load them into the EAX register.
Now in EAX we have the value of the integer pointed by the xp in the first parameter.
And then:
movl %eax, (%edx)
will save this integer value into the memory pointed by the content of the EDX register which was in turn loaded from EBP+12 which is the second parameter passed to the function.
So, your first question, is this assembly code equivalent to this?
int tx = *xp;
int ty = *yp;
int tz = *zp;
*yp = tx;
*zp = ty;
*xp = tz;
is, yes, but note that there are no tx,ty,tz local variables created, but just processor registers.
And your second question, is no, you can't tell the type of return, it is, again, a convention on the register usage that you can't infer just by looking at the generated assembly code.
Congratulations, you got everything right :)
You can use any register but some need to be preserved, that is they should be saved before use and restored afterwards. In typical calling conventions you can use eax, ecx and edx, the rest need to be preserved. The assembly you showed doesn't include code to do this, but presumably it is there.
As for the return type, that's hard to deduce. Simple types are returned in the eax register, and something is always in there. We can't tell if that's intended as a return value, or just remains of a local variable. That is, if your function had return tx; it could be the same assembly code. Also, we don't know the type for eax either, it could be anything that fits in there and is expected to be returned there according to the calling convention.
I am trying to interpret the following IA32 assembler code and write a function in C that will have an equivalent effect.
Let's say that parameters a, b and c are stored at memory locations with offsets 8, 12 and 16 relative to the address in register %ebp, and that an appropriate function prototype in C would be equivFunction(int a, int b, int c);
movl 12(%ebp), %edx // store b into %edx
subl 16(%ebp), %edx // %edx = b - c
movl %edx, %eax // store b - c into %eax
sall $31, %eax // multiply (b-c) * 2^31
sarl $31, %eax // divide ((b-c)*2^31)) / 2^31
imull 8(%ebp), %edx // multiply a * (b - c) into %edx
xorl %edx, %eax // exclusive or? %edx or %eax ? what is going on here?
First, did I interpret the assembly correctly? If so, how would I go about translating this into C?
The sall/sarl combo has the effect of setting all bits of eax to the value of the zeroth bit. First, sal moves the 0th bit to the 31st position, making it a sign bit. Then sar moves it back, filling the rest of the register with its copy. Don't think of it as division/multiplication - think of it as bitwise shift, which "s" actually stands for.
So eax is 0xffffffff (-1) if b-c is odd, 0 if even. So the imull command places into edx either a negative of a, or zero. The final xor, then, either inverts the all bits of a (that's what xor with one does) or leaves the zero value be.
This whole snippet has an air of artificiality. Is this homework?
The shifts manipulate the sign bit directly, rather than multiplying/dividing, so the code is roughly
int eqivFunction(int a, int b, int c) {
int t1 = b - c;
unsigned t2 = t1 < 0 ? ~0U : 0;
return (a * t1) ^ t2;
}
Alternately:
int eqivFunction(int a, int b, int c) {
int t1 = b - c;
int t2 = a * t1;
if (t1 < 0) t2 = -t2 - 1;
return t2;
}
Of course, the C code has undefined behavior on integer overflow, while the assembly code is well-defined, so the C code might not do the same thing in all cases (particularly if you compile it on a different architecture)
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
I want to do the following arithmetic functions in a C pre-processor include statement when I send in the variable x.
#define calc_addr_data_reg (x) ( base_offset + ((x/7) * 0x20) + data_reg_offset)
How would I go about implementing the division and multiplication operations using bitshifts? In the division operation I only need the the quotient.
To answer the questions,
"Is this expression correct in the C Preprocessor?"
I don't see anything wrong with it.
How would I go about implementing the division and multiplication operations using bitshifts? In the division operation I only need the the quotient.
The compiler is going to do a better job of optimizing your code than you will in almost all cases. If you have to ask StackOverflow how to do this, then you don't know enough to outperform GCC. I know I certainly don't. But because you asked here's how gcc optimizes it.
#EdHeal,
This needed a little bit more room to respond properly. You're absolutely correct in the example you gave (getters and setters), but in this particular example, inlineing the function would slightly increase side of the binary, assuming that it's called a few times.
GCC compiles the function to:
mov ecx, edx
mov edx, -1840700269
mov eax, edi
imul edx
lea eax, [rdx+rdi]
sar eax, 2
sar edi, 31
sub eax, edi
sal eax, 5
add esi, eax
lea eax, [rsi+rcx]
ret
Which is more bytes than the assembly for calling and getting a return value from the function, which is 3 push statements, a call, a return, and a pop statement (presumably).
with -Os it compiles into:
mov eax, edi
mov ecx, 7
mov edi, edx
cdq
idiv ecx
sal eax, 5
add eax, esi
add eax, edi
ret
Which is less bytes than the call return push and pops.
So in this case it really matters what compiler flags he uses whether or not the code is smaller or larger when inlining.
To Op again:
Explaining what the code up there means:
The next part of this post is ripped directly from: http://porn.quiteajolt.com/2008/04/30/the-voodoo-of-gcc-part-i/
The proper reaction to this monstrosity is “wait what.” Some specific instructions that I think could use more explanation:
movl $-1840700269, -4(%ebp)
-1840700269 = -015555555555 in octal (indicated by the leading zero). I’ll be using the octal representation because it looks cooler.
imull %ecx
This multiplies %ecx and %eax. Both of these registers contain a 32-bit number, so this multiplication could possibly result in a 64-bit number. This can’t fit into one 32-bit register, so the result is split across two: the high 32 bits of the product get put into %edx, and the low 32 get put into %eax.
leal (%edx,%ecx), %eax
This adds %edx and %ecx and puts the result into %eax. lea‘s ostensible purpose is for address calculations, and it would be more clear to write this as two instructions: an add and a mov, but that would take two clock cycles to execute, whereas this takes just one.
Also note that this instruction uses the high 32 bits of the multiplication from the previous instruction (stored in %edx) and then overwrites the low 32 bits in %eax, so only the high bits from the multiplication are ever used.
sarl $2, %edx # %edx = %edx >> 2
Technically, whether or not sar (arithmetic right shift) is equivalent to the >> operator is implementation-defined. gcc guarantees that the operator is an arithmetic shift for signed numbers (“Signed `>>’ acts on negative numbers by sign extension”), and since I’ve already used gcc once, let’s just assume I’m using it for the rest of this post (because I am).
sarl $31, %eax
%eax is a 32-bit register, so it’ll be operating on integers in the range [-231, 231 - 1]. This produces something interesting: this calculation only has two possible results. If the number is greater than or equal to 0, the shift will reduce the number to 0 no matter what. If the number is less than 0, the result will be -1.
Here’s a pretty direct rewrite of this assembly back into C, with some integer-width paranoia just to be on the safe side, since a few of these steps are dependent on integers being exactly 32 bits wide:
int32_t divideBySeven(int32_t num) {
int32_t eax, ecx, edx, temp; // push %ebp / movl %esp, %ebp / subl $4, %esp
ecx = num; // movl 8(%ebp), %ecx
temp = -015555555555; // movl $-1840700269, -4(%ebp)
eax = temp; // movl -4(%ebp), %eax
// imull %ecx - int64_t casts to avoid overflow
edx = ((int64_t)ecx * eax) >> 32; // high 32 bits
eax = (int64_t)ecx * eax; // low 32 bits
eax = edx + ecx; // leal (%edx,%ecx), %eax
edx = eax; // movl %eax, %edx
edx = edx >> 2; // sarl $2, %edx
eax = ecx; // movl %ecx, %eax
eax = eax >> 31; // sarl $31, %eax
ecx = edx; // movl %edx, %ecx
ecx = ecx - eax; // subl %eax, %ecx
eax = ecx; // movl %ecx, %eax
return eax; // leave / ret
}
Now there’s clearly a whole bunch of inefficient stuff here: unnecessary local variables, a bunch of unnecessary variable swapping, and eax = (int64_t)ecx * eax1; is not needed at all (I just included it for completion’s sake). So let’s clean that up a bit. This next listing just has the most of the cruft eliminated, with the corresponding assembly above each block:
int32_t divideBySeven(int32_t num) {
// pushl %ebp
// movl %esp, %ebp
// subl $4, %esp
// movl 8(%ebp), %ecx
// movl $-1840700269, -4(%ebp)
// movl -4(%ebp), %eax
int32_t eax, edx;
eax = -015555555555;
// imull %ecx
edx = ((int64_t)num * eax) >> 32;
// leal (%edx,%ecx), %eax
// movl %eax, %edx
// sarl $2, %edx
edx = edx + num;
edx = edx >> 2;
// movl %ecx, %eax
// sarl $31, %eax
eax = num >> 31;
// movl %edx, %ecx
// subl %eax, %ecx
// movl %ecx, %eax
// leave
// ret
eax = edx - eax;
return eax;
}
And the final version:
int32_t divideBySeven(int32_t num) {
int32_t temp = ((int64_t)num * -015555555555) >> 32;
temp = (temp + num) >> 2;
return (temp - (num >> 31));
}
I still have yet to answer the obvious question, “why would they do that?” And the answer is, of course, speed. The integer division instruction used in the very first listing, idiv, takes a whopping 43 clock cycles to execute. But the divisionless method that gcc produces has quite a few more instructions, so is it really faster overall? This is why we have the benchmark.
int main(int argc, char *argv[]) {
int i = INT_MIN;
do {
divideBySeven(i);
i++;
} while (i != INT_MIN);
return 0;
}
Loop over every single possible integer? Sure! I ran the test five times for both implementations and timed it with time. The user CPU times for gcc were 45.9, 45.89, 45.9, 45.99, and 46.11 seconds, while the times for my assembly using the idiv instruction were 62.34, 62.32, 62.44, 62.3, and 62.29 seconds, meaning the naive implementation ran about 36% slower on average. Yeow.
Compiler optimizations are a beautiful thing.
Ok, I'm back, now why does this work?
int32_t divideBySeven(int32_t num) {
int32_t temp = ((int64_t)num * -015555555555) >> 32;
temp = (temp + num) >> 2;
return (temp - (num >> 31));
}
Let's take a look at the first part:
int32_t temp = ((int64_t)num * -015555555555) >> 32;
Why this number?
Well, let's take 2^64 and divide it by 7 and see what pops out.
2^64 / 7 = 2635249153387078802.28571428571428571429
That looks like a mess, what if we convert it into octal?
0222222222222222222222.22222222222222222222222
That's a very pretty repeating pattern, surely that can't be a coincidence. I mean we remember that 7 is 0b111 and we know that when we divide by 99 we tend to get repeating patterns in base 10. So it makes sense that we'd get a repeating pattern in base 8 when we divide by 7.
So where does our number come in?
(int32_t)-1840700269 is the same as (uint_32t)2454267027
* 7 = 17179869189
And finally 17179869184 is 2^34
Which means that 17179869189 is the closest multiple of 7 2^34. Or to put it another way 2454267027 is the largest number that will fit in a uint32_t which when multiplied by 7 is very close to a power of 2
What's this number in octal?
0222222222223
Why is this important? Well, we want to divide by 7. This number is 2^34/7... approximately. So if we multiply by it, and then leftshift 34 times, we should get a number very close to the exact number.
The last two lines look like they were designed to patch up approximation errors.
Perhaps someone with a little more knowledge and/or expertise in this field can chime in on this.
>>> magic = 2454267027
>>> def div7(a):
... if (int(magic * a >> 34) != a // 7):
... return 0
... return 1
...
>>> for a in xrange(2**31, 2**32):
... if (not div7(a)):
... print "%s fails" % a
...
Failures begin at 3435973841 which is, funnily enough 0b11001100110011001100110011010001
I am trying to understand how calculations involving numbers greater than 232 happen on a 32 bit machine.
C code
$ cat size.c
#include<stdio.h>
#include<math.h>
int main() {
printf ("max unsigned long long = %llu\n",
(unsigned long long)(pow(2, 64) - 1));
}
$
gcc output
$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$
Corresponding assembly code
$ gcc -S size.c -O3
$ cat size.s
.file "size.c"
.section .rodata.str1.4,"aMS",#progbits,1
.align 4
.LC0:
.string "max unsigned long long = %llu\n"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $-1, 8(%esp) #1
movl $-1, 12(%esp) #2
movl $.LC0, 4(%esp) #3
movl $1, (%esp) #4
call __printf_chk
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
$
What exactly happens on the lines 1 - 4?
Is this some kind of string concatenation at the assembly level?
__printf_chk is a wrapper around printf which checks for stack overflow, and takes an additional first parameter, a flag (e.g. see here.)
pow(2, 64) - 1 has been optimised to 0xffffffffffffffff as the arguments are constants.
As per the usual calling conventions, the first argument to __printf_chk() (int flag) is a 32-bit value on the stack (at %esp at the time of the call instruction). The next argument, const char * format, is a 32-bit pointer (the next 32-bit word on the stack, i.e. at %esp+4). And the 64-bit quantity that is being printed occupies the next two 32-bit words (at %esp+8 and %esp+12):
pushl %ebp ; prologue
movl %esp, %ebp ; prologue
andl $-16, %esp ; align stack pointer
subl $16, %esp ; reserve bytes for stack frame
movl $-1, 8(%esp) #1 ; store low half of 64-bit argument (a constant) to stack
movl $-1, 12(%esp) #2 ; store high half of 64-bit argument (a constant) to stack
movl $.LC0, 4(%esp) #3 ; store address of format string to stack
movl $1, (%esp) #4 ; store "flag" argument to __printf_chk to stack
call __printf_chk ; call routine
leave ; epilogue
ret ; epilogue
The compiler has effectively rewritten this:
printf("max unsigned long long = %llu\n", (unsigned long long)(pow(2, 64) - 1));
...into this:
__printf_chk(1, "max unsigned long long = %llu\n", 0xffffffffffffffffULL);
...and, at runtime, the stack layout for the call looks like this (showing the stack as 32-bit words, with addresses increasing from the bottom of the diagram upwards):
: :
: Stack :
: :
+-----------------+
%esp+12 | 0xffffffff | \
+-----------------+ } <-------------------------------------.
%esp+8 | 0xffffffff | / |
+-----------------+ |
%esp+4 |address of string| <---------------. |
+-----------------+ | |
%esp | 1 | <--. | |
+-----------------+ | | |
__printf_chk(1, "max unsigned long long = %llu\n", |
0xffffffffffffffffULL);
similar to the way as we handle numbers greater than 9, with only digits 0 - 9.
(using positional digits). presuming the question is a conceptual one.
In your case, the compiler knows that 2^64-1 is just 0xffffffffffffffff, so it has pushed -1 (low dword) and -1 (high dword) onto the stack as your argument to printf. It's just an optimization.
In general, 64-bit numbers (and even greater values) can be stored with multiple words, e.g. an unsigned long long uses two dwords. To add two 64-bit numbers, two additions are performed - one on the low 32 bits, and one on the high 32 bits, plus the carry:
; Add 64-bit number from esi onto edi:
mov eax, [esi] ; get low 32 bits of source
add [edi], eax ; add to low 32 bits of destination
; That add may have overflowed, and if it did, carry flag = 1.
mov eax, [esi+4] ; get high 32 bits of source
adc [edi+4], eax ; add to high 32 bits of destination, then add carry.
You can repeat this sequence of add and adcs as much as you like to add arbitrarily big numbers. The same thing can be done with subtraction - just use sub and sbb (subtract with borrow).
Multiplication and division are much more complicated, and the compiler usually produces some small helper functions to deal with these whenever you multiply 64-bit numbers together. Packages like GMP which support very, very large integers use SSE/SSE2 to speed things up. Take a look at this Wikipedia article for more information on multiplication algorithms.
As others have pointed out all 64-bit aritmetic in your example has been optimised away. This answer focuses on the question int the title.
Basically we treat each 32-bit number as a digit and work in base 4294967296. In this manner we can work on arbiterally big numbers.
Addition and subtraction are easiest. We work through the digits one at a time starting from the least significant and moving to the most significant. Generally the first digit is done with a normal add/subtract instruction and later digits are done using a specific "add with carry" or "subtract with borrow" instruction. The carry flag in the status register is used to take the carry/borrow bit from one digit to the next. Thanks to twos complement signed and unsigned addition and subtraction are the same.
Multiplication is a little trickier, multiplying two 32-bit digits can produce a 64-bit result. Most 32-bit processors will have instructions that multiply two 32-bit numbers and produces a 64-bit result in two registers. Addition will then be needed to combine the results into a final answer. Thanks to twos complement signed and unsigned multiplication are the same provided the desired result size is the same as the argument size. If the result is larger than the arguments then special care is needed.
For comparision we start from the most significant digit. If it's equal we move down to the next digit until the results are equal.
Division is too complex for me to describe in this post, but there are plenty of examples out there of algorithms. e.g. http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt
Some real-world examples from gcc https://godbolt.org/g/NclqXC , the assembler is in intel syntax.
First an addition. adding two 64-bit numbers and producing a 64-bit result. The asm is the same for both signed and unsigned versions.
int64_t add64(int64_t a, int64_t b) { return a + b; }
add64:
mov eax, DWORD PTR [esp+12]
mov edx, DWORD PTR [esp+16]
add eax, DWORD PTR [esp+4]
adc edx, DWORD PTR [esp+8]
ret
This is pretty simple, load one argument into eax and edx, then add the other using an add followed by an add with carry. The result is left in eax and edx for return to the caller.
Now a multiplication of two 64-bit numbers to produce a 64-bit result. Again the code doesn't change from signed to unsigned. I've added some comments to make it easier to follow.
Before we look at the code lets consider the math. a and b are 64-bit numbers I will use lo() to represent the lower 32-bits of a 64-bit number and hi() to represent the upper 32 bits of a 64-bit number.
(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)
(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)
lo((a * b) mod 2^64) = lo(lo(a) * lo(b))
hi((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
uint64_t mul64(uint64_t a, uint64_t b) { return a*b; }
mul64:
push ebx ;save ebx
mov eax, DWORD PTR [esp+8] ;load lo(a) into eax
mov ebx, DWORD PTR [esp+16] ;load lo(b) into ebx
mov ecx, DWORD PTR [esp+12] ;load hi(a) into ecx
mov edx, DWORD PTR [esp+20] ;load hi(b) into edx
imul ecx, ebx ;ecx = lo(hi(a) * lo(b))
imul edx, eax ;edx = lo(hi(b) * lo(a))
add ecx, edx ;ecx = lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
mul ebx ;eax = lo(low(a) * lo(b))
;edx = hi(low(a) * lo(b))
pop ebx ;restore ebx.
add edx, ecx ;edx = hi(low(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
ret
Finally when we try a division we see.
int64_t div64(int64_t a, int64_t b) { return a/b; }
div64:
sub esp, 12
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
call __divdi3
add esp, 28
ret
The compiler has decided that division is too complex to implement inline and instead calls a library routine.
The compiler actually made a static optimization of your code.
lines #1 #2 #3 are parameters for printf()
As #Pafy mentions, the compiler has evaluated this as a constant.
2 to the 64th minus 1 is 0xffffffffffffffff.
As 2 32-bit integers this is: 0xffffffff and 0xffffffff, which if you take that as a pair of 32-bit signed types, ends up as: -1, and -1.
Thus for your compiler the code generated happens to be equivalent to:
printf("max unsigned long long = %llu\n", -1, -1);
In the assembly it's written like this:
movl $-1, 8(%esp) #Second -1 parameter
movl $-1, 12(%esp) #First -1 parameter
movl $.LC0, 4(%esp) #Format string
movl $1, (%esp) #A one. Kind of odd, perhaps __printf_chk
#in your C library expects this.
call __printf_chk
By the way, a better way to calculate powers of 2 is to shift 1 left. Eg. (1ULL << 64) - 1.
No one in this thread noticed that the OP asked to explain the first 4 lines, not lines 11-14.
The first 4 lines are:
.file "size.c"
.section .rodata.str1.4,"aMS",#progbits,1
.align 4
.LC0:
Here's what happens in first 4 lines:
.file "size.c"
This is an assembler directive that says that we are about to start a new logical file called "size.c".
.section .rodata.str1.4,"aMS",#progbits,1
This is also a directive for read only strings in the program.
.align 4
This directive sets the location counter to always be a multiple of 4.
.LC0:
This is a label LC0 that can be jumped to, for example.
I hope I provided the right answer to the question as I answered exactly what OP asked.