I just have one little thing I got stuck on while translating this C code into assembly. This is the line I am stuck on:
if (input == '\n')
My assembly code so far (for this line) is:
movl input, %eax #%eax = input
cmpl ___, %eax
How do I compare input to '\n'? Do I just compare it directly, or do I move it into memory first? Thanks.
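Try
cmpl $0x0A, %eax
'\n' is the line feed character, ASCII 0x0A (0x0D is carriage return, '\r'). In AT&T syntax an immediate needs a $ prefix; without it, cmp 0x0D,%eax would compare %eax with the 32-bit value stored at memory address 0x0D.
Compare it directly: cmp accepts an immediate value as its source operand.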
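A quick C check of the character codes involved, assuming an ASCII-based platform (as x86 Linux, Windows and macOS are): '\n' is 0x0A and '\r' is 0x0D. The helper name is_newline is hypothetical; its test is what a cmpl $0x0A, %eax followed by a conditional jump implements.

```c
/* On ASCII platforms, '\n' (line feed) is 0x0A and '\r' (carriage return)
   is 0x0D. is_newline is a hypothetical helper for illustration. */
int is_newline(int input)
{
    return input == '\n';   /* same as input == 0x0A on ASCII systems */
}
```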
I'm a little confused about what "cmovb" does in this assembly code:
leal (%rsi, %rsi), %eax   # %eax <- %rsi + %rsi
cmpl %esi, %edi           # compare %edi and %esi
cmovb %edi, %eax
ret
And the C code for this is:
int foo(unsigned int a, unsigned int b)
{
if(a < b)
return a;
else
return 2*b;
}
Can anyone help me understand how cmovb works here?
As Jester commented on the question, the cmov* instructions are conditional moves: they act on the flags register as set by a previous instruction (here, a comparison).
You can use for example the Intel documentation as a reference for the x86-64/AMD64 instruction set. The conditional move instructions are shown on page 172 of the combined volume.
cmovb, cmovnae, and cmovc are three names for the same instruction: if the carry flag is set, it moves the source operand to the destination operand; otherwise it does nothing.
If we then look at the preceding instruction that affects the flags, we see that the cmp instruction (the l suffix is part of AT&T syntax and means the operands are 32-bit "longs") sets the flags according to the difference between its two operands. In particular, if the second operand is smaller than the first (in AT&T operand order), with both treated as unsigned, the carry flag is set; otherwise it is cleared, just as if a subtraction had been performed without storing the result anywhere. (cmp affects other flags as well, but this code ignores them.)
CMOVB = Conditional MOVe if Below (carry flag set). It does exactly what it says: if the condition is met, it moves. Here the condition is a < b (unsigned), and the value moved is a, from %edi.
Under the System V x86-64 ABI, a arrives in %edi, b in %esi, and the return value goes in %eax. So the code first stores 2*b in %eax with leal, and then conditionally overwrites it with a.
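To make the data flow explicit, here is foo restated in C with one statement per instruction. This is only a sketch: the register names are borrowed as local variable names, and foo_cmov is a hypothetical name. The result register is filled with 2*b first, and cmovb replaces it with a only when the unsigned comparison a < b sets the carry flag.

```c
/* foo() restated one instruction per line (System V x86-64: a in %edi,
   b in %esi, return value in %eax). foo_cmov is a hypothetical name. */
unsigned foo_cmov(unsigned a, unsigned b)
{
    unsigned edi = a, esi = b;
    unsigned eax = esi + esi;   /* leal (%rsi,%rsi), %eax : eax = 2*b         */
    int carry = edi < esi;      /* cmpl %esi, %edi : CF is the borrow of
                                   edi - esi, i.e. set exactly when a < b     */
    if (carry)
        eax = edi;              /* cmovb %edi, %eax : happens only if CF set  */
    return eax;                 /* ret : the caller reads the result in %eax  */
}
```

Because both candidate values are available before the decision, there is no branch to mispredict, which is usually why a compiler picks cmov for a small if/else like this.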
While debugging an issue with a program crashing on a mangled pointer being dereferenced, I ran lldb and did a disassembly of the crashing function. While perusing the disassembled code, I noticed this odd-looking choice of instructions:
0x100002b06 <+86>: cmpl $0x0, %eax
0x100002b09 <+89>: je 0x100002b14
0x100002b0f <+95>: jmp 0x10000330e
0x100002b14 <+100>: jmp 0x100002c1d
I would expect the code to look like this instead:
0x100002b06 <+86>: cmpl $0x0, %eax
0x100002b09 <+89>: je 0x100002c1d
0x100002b0f <+95>: jmp 0x10000330e
I'm curious as to why Clang made this choice. Is it some sort of branch prediction optimization since this is a NULL pointer check that's very unlikely to match?
Edit: this is the originating C code, specifically the line with the NULL pointer check:
traverse = travdone_head;
while (1) {
if (traverse == NULL) nullptr("grokdir() traverse");
/* Don't re-traverse directories we've already seen */
if (inode == traverse->inode && device == traverse->device) {
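This looks like -O0 (unoptimized) code. The GCC manual describes -O0 (Clang accepts the same flag) as:
"Reduce compilation time and make debugging produce the expected results. This is the default."
At -O0 the compiler generally doesn't run the optimizations that would collapse a jump to another jump, so a chain like this is a leftover of straightforward code generation rather than a branch prediction trick.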
It would be interesting to compare this with the corresponding source code.
Although I shouldn't post the entire 4-line sample I was given (since this is a homework question), I'm confused about how this should be read and translated into C:
cmovge %edi, %eax
What I understand so far is that the instruction is a conditional move for when the result is >=. It's comparing the function's first parameter, %edi, to the integer register %eax (which was assigned the other parameter, %esi, in the previous line of assembly). However, I don't understand its result.
My problem is interpreting the optimized code. It doesn't touch the stack, and I'm not sure how to write this in C (or even which gcc switch I could use to generate the same result when compiling).
Could someone please give a few small examples of how the cmovge instruction might translate into C code? If it doesn't make sense as its own line of code, feel free to make something up with it.
This is in x86-64 assembly through a virtualized Linux operating system (CentOS 7).
I'm probably giving you the whole solution here:
int
doit(int a, int b) {
return a >= b ? a : b;
}
Compiled with gcc -O3 -S -masm=intel, this becomes:
doit:
.LFB0:
.cfi_startproc
cmp edi, esi
mov eax, esi
cmovge eax, edi
ret
.cfi_endproc
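Here is the same function restated in C one instruction at a time (doit_steps is a hypothetical name). gcc emitted cmp before mov, but mov does not touch the flags, so the order does not matter. The "ge" condition is signed (SF == OF); the unsigned counterpart would be cmovae.

```c
/* doit() restated one instruction per line (hypothetical name doit_steps).
   a arrives in edi, b in esi; the result is returned in eax. */
int doit_steps(int a, int b)
{
    int eax = b;          /* mov eax, esi : default result is b               */
    if (a >= b)           /* cmp edi, esi sets the flags from a - b;
                             cmovge then tests the signed condition SF == OF  */
        eax = a;          /* cmovge eax, edi : overwrite with a only if a >= b */
    return eax;
}
```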
I am really new to assembly language and have just started digging into it, so I was wondering if some of you could help me figure out one problem. I have a homework assignment that asks me to compare assembly language instructions to C code and say which C function is equivalent to the assembly. Here are the assembly instructions:
pushl %ebp // What i think is happening here is that we are creating more space for the function.
movl %esp,%ebp // Here i think we are moving the stack pointer to the old base pointer.
movl 8(%ebp),%edx // Here we are taking parameter int a and storing it in %edx
movl 12(%ebp),%eax // Here we are taking parameter int b and storing it in %eax
cmpl %eax,%edx // Here i think we are comparing int a and b ( b > a ) ?
jge .L3 // Jump to .L3 if b is greater than a - else continue the instructions
movl %edx,%eax // If the term is not met here it will return b
.L3:
movl %ebp,%esp // Starting to finish the function
popl %ebp // Putting the base pointer in the right place
ret // return
I've tried to comment it based on my understanding, but I might be totally wrong. These are the candidate C functions, one of which is supposed to be equivalent:
int fun1(int a, int b)
{
unsigned ua = (unsigned) a;
if (ua < b)
return b;
else
return ua;
}
int fun2(int a, int b)
{
if (b < a)
return b;
else
return a;
}
int fun3(int a, int b)
{
if (a < b)
return a;
else
return b;
}
I think the correct answer is fun3, but I'm not quite sure.
First off, welcome to Stack Overflow. Great place, really it is.
Now for starters, let me help you; a lot; a whole lot.
You have good comments that help both you and me and everyone else tremendously, but they are so ugly that reading them is painful.
Here's how to fix that: white space, lots of it, blank lines, and grouping the instructions into small groups that are related to each other.
More to the point: after a conditional jump, insert one blank line; after an absolute jump, insert two blank lines. (Old tricks, but they work great for readability.)
Secondly, line up the comments so that they are neatly arranged. It looks a thousand times better.
Here's your code, after 90 seconds of text arranging by me. Believe me, professionals will respect you a thousand times more for this kind of source code...
        pushl   %ebp                // What i think is happening here is that we are creating more space for the function.
        movl    %esp,%ebp           // Here i think we are moving the stack pointer to the old base pointer.

        movl    8(%ebp),%edx        // Here we are taking parameter int a and storing it in %edx
        movl    12(%ebp),%eax       // Here we are taking parameter int b and storing it in %eax

        cmpl    %eax,%edx           // Here i think we are comparing int a and b ( b > a ) ?
                                    // No, think like this: "What is the value of edx with respect to the value of eax?"
        jge     .L3                 // edx is greater or equal, so return the value in eax as it is

        movl    %edx,%eax           // If the term is not met here it will return b
                                    // (pssst, I think you're wrong; think it through again)
.L3:
        movl    %ebp,%esp           // Starting to finish the function
        popl    %ebp                // Putting the base pointer in the right place
        ret                         // return
Now, back to your problem at hand. What the exercise is getting at is the "sense" of the compare instruction and the related JGE instruction.
Here's the confuse-o-matic stuff you need to comprehend to survive these sorts of "academic experiences"
This biz, the cmpl %eax,%edx instruction, is one of the forms of the "compare" instruction: it computes edx - eax, sets the flags, and discards the result.
Try to form an idea something like this when you see that syntax, "...What is the value of the destination operand with respect to the source operand ?..."
Caveat: I am absolutely no good with the AT&T syntax, so anybody is welcome to correct me on this.
Anyway, in this specific case, you can phrase the idea in your mind like this...
"...I see cmpl %eax,%edx so I think: With respect to eax, the value in edx is..."
You then complete that sentence in your mind with the "sense" of the next instruction which is a conditional jump.
The paradigmatic process in the human brain works out to form a sentence like this...
"...With respect to eax, the value in edx is greater or equal, so I jump..."
So, if you are correct about the locations of a and b, then you can do the paradigmatic brain scrambler and get something like this...
"...With respect to the value in b, that value in a is greater or equal, so I will jump..."
To get a grasp of this, note that JGE is the "opposite sense", if you will, of JL (i.e., "jump if less than").
Okay, now it so happens that return in C is related to the ret instruction in assembly language, but it isn't the same thing.
When C programmers say "...That function returns an int..." what they mean is...
The assembly language subroutine will place a value in Eax
The subroutine will then fix the stack and put it back in neat order
The subroutine will then execute its Ret instruction
One more item of obfuscation is thrown in your face now.
The following conditional jumps apply to signed comparisons (their unsigned counterparts are JA, JAE, JNA, JB, JBE, and JNB):
JG
JGE
JNG
JL
JLE
JNL
There it is ! The trap waiting to screw you up in all this !
Do you want to do signed or unsigned compares ???
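By the way, I've never seen anybody do anything like that first function, where an unsigned number is compared with a signed number. Is that even legal? (It is: under C's usual arithmetic conversions, b is converted to unsigned before the comparison, so fun1 performs an unsigned compare.)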
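A small sketch of that trap, with hypothetical helper names: the same bit patterns compare differently depending on whether they are treated as signed or unsigned, which is why a jge-based routine cannot be the compiled form of an unsigned comparison such as the one in fun1.

```c
/* Same inputs, two different answers (hypothetical helpers). */
int signed_less(int a, int b)
{
    return a < b;                       /* signed compare (jl/jge family)     */
}

int unsigned_less(int a, int b)
{
    return (unsigned)a < (unsigned)b;   /* unsigned compare (jb/jae family);
                                           -1 converts to UINT_MAX            */
}
```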
So anyway, we put all these facts together, and we get: this assembly language routine returns a if a is less than b; otherwise it returns b.
These values are evaluated as signed integers.
(I think I got that right; somebody check my logic. I really don't like that assembler's syntax at all.)
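For checking that logic, here is the routine translated statement by statement into C, using goto to mirror the jump. The name asm_routine and the register-named locals are hypothetical; the comparison is signed because jge is a signed condition, and the frame setup and teardown (push/mov/pop of %ebp) have no C counterpart.

```c
/* The 32-bit routine from the question, one C statement per instruction
   (asm_routine is a hypothetical name). */
int asm_routine(int a, int b)
{
    int edx = a;            /* movl 8(%ebp),%edx  : first argument   */
    int eax = b;            /* movl 12(%ebp),%eax : second argument  */
    if (edx >= eax)         /* cmpl %eax,%edx ; jge .L3  (signed)    */
        goto L3;
    eax = edx;              /* movl %edx,%eax                        */
L3:
    return eax;             /* result is returned in %eax            */
}
```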
So anyway, I am reasonably certain that you don't want to ask people on the internet to provide you with the specific answer to your specific homework question, so I'll leave it up to you to figure it out from this explanation.
Hopefully, I have explained enough of the logic and the "sense" of comparisons and the signed and unsigned biz so that you can get your brain around this.
Oh, and disclaimer again, I always use the Intel syntax (e.g., Masm, Tasm, Nasm, whatever) so if I got something backwards here, feel free to correct it for me.
While writing some C code, I decided to compile it to assembly and read it. I do this from time to time, as an exercise to keep me thinking about what the machine is doing every time I write a statement in C.
Anyway, I wrote these two statements in C (with asm() markers around them):
asm("#move old_string[i] to new_string[x]");
new_string[x] = old_string[i];
asm("#shift old_string[i+1] into new_string[x]");
new_string[x] |= old_string[i + 1] << 8;
(old_string is an array of char and new_string is an array of unsigned short, so given the two chars 0x42 and 0x43, this will put 0x4342 into new_string[x].)
This produced the following output:
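A sketch of what those two statements compute, wrapped in hypothetical helpers so the arithmetic can be checked: 'B' and 'C' (0x42, 0x43) pack into 0x4342. Because plain char is signed on x86 (hence the movsbw/movsbl sign extensions in the output), a low byte of 0x80 or above would fill the high byte with 1-bits; the second helper is an assumed alternative that casts to unsigned char to avoid that.

```c
#include <stddef.h>

/* The two statements from the question, wrapped in a hypothetical helper. */
unsigned short pack_like_question(const char *old_string, size_t i)
{
    unsigned short x;
    x = old_string[i];              /* low byte; sign-extended first if char is signed */
    x |= old_string[i + 1] << 8;    /* high byte                                       */
    return x;
}

/* Hypothetical variant that treats both bytes as unsigned, so a low byte
   >= 0x80 cannot fill the high byte with 1-bits. */
unsigned short pack_unsigned(const char *old_string, size_t i)
{
    unsigned lo = (unsigned char)old_string[i];
    unsigned hi = (unsigned char)old_string[i + 1];
    return (unsigned short)(lo | (hi << 8));
}
```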
#move old_string[i] to new_string[x]
movl -20(%ebp), %esi #put address of first char of old_string in esi
movsbw (%edi,%esi),%dx #put first char into dx
movw %dx, (%ecx,%ebx,2) #put first char into new_string
#shift old_string[i+1] into new_string[x]
movsbl 1(%esi,%edi),%eax #put old_string[i+1] into eax
sall $8, %eax #shift it left by 8 bits
orl %edx, %eax #or edx into it
movw %ax, (%ecx,%ebx,2) #?
(I'm commenting it myself, so I can follow what's going on).
I compiled it with -O3 so I could also see how the compiler optimizes certain constructs. Anyway, I'm sure this is probably simple, but here's what I don't get:
The first section copies a char out of old_string[i] and then movw's it (from dx) to (%ecx,%ebx,2). Then the next section copies old_string[i+1], shifts it, ORs it, and puts the result into the same place from ax. It puts two 16-bit values into the same place? Wouldn't that break?
Also, it shifts old_string[i+1] to the high-order dword of eax, then ors edx (new_string[x]) into it... then puts ax into the memory! Wouldn't ax just contain what was already in new_string[x]? so it saves the same thing to the same place in memory twice?
Is there something I'm missing? Also, I'm fairly certain that the rest of the compiled program isn't relevant to this snippet... I've read around before and after, to find where each array and different variables are stored, and what the registers' values would be upon reaching that code--I think that this is the only piece of the assembly that matters for these lines of C.
Edit: it turns out that on x86, GNU assembler comments start with #.
Okay, so it was pretty simple after all.
I figured it out with a pen and paper, writing down each step, what it did to each register, and then wrote down the contents of each register given an initial starting value...
What got me was that it was using 32-bit and 16-bit registers for 16-bit and 8-bit data types...
This is what I thought was happening:
first value put into memory as, say, 0001 (I was thinking 01).
second value (02) loaded into 32 bit register (so it was like, 00000002, I was thinking, 0002)
second value shifted left 8 bits (00000200, I was thinking, 0200)
first value (00000001, I thought 0001) OR'd into second value (00000201, I thought 0201)
16-bit register put into memory (0201, I was thinking, just 01 again).
I didn't get why it wrote to memory twice, though, or why it was using 32-bit registers (my guess is that a 32-bit processor is way faster at working with 32-bit values than with 8- and 16-bit values, but that's a totally uneducated guess), so I tried rewriting it:
movl -20(%ebp), %esi #gets pointer to old_string
movsbw (%edi,%esi),%dx #old_string[i] -> dx (0001)
movsbw 1(%edi,%esi),%ax #old_string[i + 1] -> ax (0002)
salw $8, %ax #shift ax left (0200)
orw %dx, %ax #or dx into ax (0201)
movw %ax,(%ecx,%ebx,2) #doesn't write to memory until end
This worked exactly the same.
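To see why the two versions agree, here is a C model of both instruction sequences (hypothetical function names, with int8_t standing in for the signed char bytes). Only the low 16 bits reach memory through movw, so whatever the 32-bit shift and OR leave in bits 16..31 of %eax never matters; the model assumes the upper half of %edx is zero, which likewise cannot affect the stored low 16 bits.

```c
#include <stdint.h>

/* The compiler's -O3 sequence: 32-bit shift/OR, then a 16-bit store. */
uint16_t gcc_sequence(int8_t lo, int8_t hi)
{
    uint16_t dx  = (uint16_t)(int16_t)lo;   /* movsbw (%edi,%esi),%dx : sign-extend to 16 bits  */
    uint32_t eax = (uint32_t)(int32_t)hi;   /* movsbl 1(%esi,%edi),%eax : sign-extend to 32 bits */
    eax <<= 8;                              /* sall $8, %eax                                     */
    eax |= dx;                              /* orl %edx, %eax (upper half of edx assumed 0)      */
    return (uint16_t)eax;                   /* movw %ax, ... : only the low 16 bits are stored   */
}

/* The hand-rewritten all-16-bit sequence. */
uint16_t rewritten_sequence(int8_t lo, int8_t hi)
{
    uint16_t dx = (uint16_t)(int16_t)lo;    /* movsbw (%edi,%esi),%dx  */
    uint16_t ax = (uint16_t)(int16_t)hi;    /* movsbw 1(%edi,%esi),%ax */
    ax = (uint16_t)(ax << 8);               /* salw $8, %ax            */
    ax |= dx;                               /* orw %dx, %ax            */
    return ax;                              /* movw %ax,(%ecx,%ebx,2)  */
}
```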
I don't know whether this counts as an optimization (aside from removing one memory write, which obviously is one), but if it is, I know it's not really worth it and didn't gain me anything. In any case, I get what this code is doing now; thanks for the help, all.
I'm not sure what's not to understand, unless I'm missing something.
The first 3 instructions load a byte from old_string into dx and store it to your new_string.
The next 3 instructions use what's already in dx, combine old_string[i+1] with it, and store the result as a 16-bit value (ax) to new_string.
"Also, it shifts old_string[i+1] to the high-order dword of eax, then ors edx (new_string[x]) into it... then puts ax into the memory! Wouldn't ax just contain what was already in new_string[x]? so it saves the same thing to the same place in memory twice?"
Now you see why optimizers are a Good Thing. That kind of redundant code shows up pretty often in unoptimized, generated code, because the generated code comes more or less from templates that don't "know" what happened before or after.