What does comma separated bracket mov instruction mean in disassembly - disassembly

0x7f52379dc42c: mov 0xc(%r12,%r11,8),%r11d
0x7f52379dc431: mov %r11d,0xc(%rsp)
0x7f52379dc436: mov 0xc(%r12,%r10,8),%r14d
0x7f52379dc43b: cmp %r11d,%r14d
I understand that mov %r11d,0xc(%rsp) means *(rsp+0xc) = 0xc
What does mov 0xc(%r12,%r11,8),%r11d mean?

The general syntax for a memory operand (dereference) in AT&T x86/x64 mnemonics is offset(base, index, scale), which is the same as [base + index * scale + offset] in Intel syntax (which is almost the same as the pseudo-C syntax you used).
Specifically, your first instruction
mov 0xc(%r12,%r11,8), %r11d
is the same as
mov r11d, DWORD PTR [r12+r11*8+0xc]
in Intel mnemonics, and approximately the same as
r11d = *(r12 + r11 * 8 + 0xc)
in the pseudo-C syntax.
Note that the scale is encoded using only 2 bits in the instruction, and is always a power-of-two, so only values of 1, 2, 4, and 8 are permitted.
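To make that decoding mechanical, here is a small Python helper, added for this page and not part of the answer above: it parses an AT&T memory operand such as 0xc(%r12,%r11,8) into displacement, base, index and scale, rejects any scale other than 1, 2, 4 or 8, computes the effective address from a constructed register file (REGS holds arbitrary values) and renders the Intel-syntax form. Register names are not validated beyond being present in REGS, and symbolic displacements such as label(,1) are not handled.

```python
import re

# Constructed register file for the examples below (values are arbitrary and
# not taken from the original listing).
REGS = {"r12": 0x1000, "r11": 3, "r10": 5, "rsp": 0x7FFE0000, "rbp": 0x7FFE0040}

# AT&T memory operand: disp(base,index,scale). Everything except the parentheses
# is optional: (%rsp), -8(%rbp), 0xc(%r12,%r11,8), (,%rax,4), (%rbx,%rcx).
_OPERAND = re.compile(
    r"^\s*(?P<disp>-?(?:0x[0-9a-fA-F]+|\d+))?"
    r"\((?:%(?P<base>[a-z0-9]+))?"
    r"(?:,%(?P<index>[a-z0-9]+)(?:,(?P<scale>\d+))?)?\)\s*$"
)

def parse_att_operand(text):
    """Return (disp, base, index, scale) for an AT&T memory operand string."""
    m = _OPERAND.match(text)
    if not m:
        raise ValueError("not an AT&T memory operand: %r" % text)
    disp = int(m.group("disp"), 0) if m.group("disp") else 0
    scale = int(m.group("scale")) if m.group("scale") else 1
    # The scale is a 2-bit field of the SIB byte, so only 1, 2, 4 and 8 exist.
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8, got %d" % scale)
    return disp, m.group("base"), m.group("index"), scale

def effective_address(text, regs):
    """base + index*scale + disp, wrapped to 64 bits the way the CPU does it."""
    disp, base, index, scale = parse_att_operand(text)
    addr = disp
    if base:
        addr += regs[base]
    if index:
        addr += regs[index] * scale
    return addr & 0xFFFFFFFFFFFFFFFF

def to_intel(text):
    """Render the same operand in Intel/NASM bracket syntax."""
    disp, base, index, scale = parse_att_operand(text)
    parts = []
    if base:
        parts.append(base)
    if index:
        parts.append("%s*%d" % (index, scale) if scale != 1 else index)
    body = "+".join(parts)
    if disp < 0:
        body += "-%#x" % -disp
    elif disp or not body:
        body += ("+" if body else "") + "%#x" % disp
    return "[" + body + "]"

if __name__ == "__main__":
    # The three memory operands from the listing in the question.
    for op in ("0xc(%r12,%r11,8)", "0xc(%rsp)", "0xc(%r12,%r10,8)"):
        print("%-18s %-18s ea=%#x" % (op, to_intel(op), effective_address(op, REGS)))
```

With r12 = 0x1000 and r11 = 3, 0xc(%r12,%r11,8) resolves to 0x1000 + 3*8 + 0xc = 0x1024; the 64-bit mask matters when a displacement is negative or the sum overflows, which the real address-generation unit also silently wraps.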

Related

Iterating through arrays in Assembly to solve general equation

I'm in the process of learning Assembly language using NASM, and have run into a programming problem that I can't seem to figure out. The goal of the program is to solve this equation:
Picture of Equation
For those unable to see the photo, the equation says that for two arrays of length n, array a and array b, find: for i=0 to n-1, ((ai + 3) - (bi - 4))
I'm only supposed to use three general registers, and I've figured out a code sample I think could possibly work, but I keep running into comma and operand errors with lines 16 and 19. I understand that in order to iterate through the array you need to move a pointer to each index, but since both arrays are of different values (array 1 is dw and array 2 is db) I am unsure how to account for that. I'm still very new to Assembly, and any help or pointers would be appreciated.
Here is a picture of my current code:
Code Sample
segment .data
a dw 12, 14, 16 ; array of three values
b db 2, 4, 5 ; array of three values
n dw 3 ; length of both arrays
result dq 0 ; memory to result
segment .text
global main
main:
mov rax, 0
mov rbx, 0
mov rdx, 0
loop_start:
cmp rax, [n]
jge loop_end
add rbx, a[rax*4] ; adding element of a at current index to rbx
add rbx, 3 ; adding 3 to current index value of array a in rbx
add rdx, BYTE b[rax]
sub rdx, 4
sub rbx, [rdx]
add [result], rbx
xor rbx, rbx
xor rdx, rdx
add rax, 1
loop_end:
ret
You are using 16-bit and 8-bit data, but 64-bit registers.  Generally speaking, the processor requires the same data size throughout the operands of any single instruction.
cmp rax,[n] has varying data size, which is not allowed: rax is a 64-bit register, and [n] is a 16 bit data item.  So, we can change this to cmp ax,[n], and now everything is 16-bit.
add rbx,a[rax*4] is also mixing different size operands (not allowed).  rbx is 64 bits and a[] is 16 bits.  You can change the register to bx and this will be allowed.  But also let's note that *4 is too much; it should be *2 since dw is 16-bit data (2-byte), not 32-bit (4-byte).  Since you're clearing rbx, you don't need an add here; you can simply mov.
add rdx, BYTE b[rax] is also mixing different sizes.  rax is 64 bits wide whereas b[] is 8 bits wide.  Use dl instead of rdx.  There is nothing to add to with this, so you should use a mov instead of add.  Now that there's a value in dl, and you previously cleared rdx, you can switch to using dx (from dl); this will have the 16-bit value of b[i].
sub rbx, [rdx] has an erroneous dereference.  Here you just want to sub bx,dx.
You are not using the label loop_start, so there is no loop. (Add a backward branch at the end of the loop.)
...but since both arrays are of different values (array 1 is dw and array 2 is db) I am unsure how to account for that
Erik Eidt's answer explains why you "keep running into comma and operand errors". Although you can revert to using the smaller registers (adding operand size prefixes), my answer takes a different approach.
The instruction set has the movzx (move with zero extension) and movsx (move with sign extension) instructions to deal with these varying sizes. See below how to use these.
I've applied a few changes too.
Don't miss an opportunity to simplify your calculation:
((a[i] + 3) - (b[i] - 4)) is equivalent to (a[i] - b[i] + 7)
None of these arrays is empty, so you can just put the loop condition below its body.
You can process the arrays starting at the end if it's convenient. The summation operation doesn't mind!
segment .data
a dw 12, 14, 16 ; array of three values
b db 2, 4, 5 ; array of three values
n dw 3 ; length of both arrays
result dq 0 ; memory to result
segment .text
global main
main:
movzx rcx, word [n]
loop_start:
movzx rax, word [a + rcx * 2 - 2]
movzx rbx, byte [b + rcx - 1]
neg rbx ; lea cannot subtract a register, so negate b[i] first: a[i] - b[i] + 7 == a[i] + (-b[i]) + 7
lea rax, [rax + rbx + 7]
add [result], rax
dec rcx
jnz loop_start
ret
Please notice that the additional negative offsets - 2 and - 1 exist to compensate for the fact that the loop control takes on the values {3, 2, 1} when {2, 1, 0} would have been perfect. This does not introduce an extra displacement component to the instruction since the mention of the a and b arrays is in fact already the displacement.
Although this is tagged x86-64, you can write the whole thing using 32-bit registers and not require the REX prefixes. Same result.
segment .data
a dw 12, 14, 16 ; array of three values
b db 2, 4, 5 ; array of three values
n dw 3 ; length of both arrays
result dq 0 ; memory to result
segment .text
global main
main:
movzx ecx, word [n]
loop_start:
movzx eax, word [a + ecx * 2 - 2]
movzx ebx, byte [b + ecx - 1]
neg ebx ; lea cannot subtract a register, so negate b[i] first: a[i] - b[i] + 7 == a[i] + (-b[i]) + 7
lea eax, [eax + ebx + 7]
add [result], eax
dec ecx
jnz loop_start
ret
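To see why the element sizes matter, here is a Python emulation of the data segment and of the answer's backward loop; it is an added example, not code from the thread. The .data image is constructed by hand (a as 16-bit words, b as bytes, little-endian), the four movzx/movsx loads are modelled with struct, and wrong_stride() reads a with the question's *4 stride to show which words that actually fetches.

```python
import struct

# Constructed .data image mirroring the question's layout (added example).
A_WORDS = [12, 14, 16]   # a dw 12, 14, 16
B_BYTES = [2, 4, 5]      # b db 2, 4, 5
N = 3                    # n dw 3

def build_data_segment(a_words, b_bytes):
    """Lay out a (16-bit words) followed by b (bytes); return (bytes, symbols)."""
    mem = bytearray()
    sym = {"a": 0}
    mem += struct.pack("<%dH" % len(a_words), *a_words)
    sym["b"] = len(mem)
    mem += struct.pack("<%dB" % len(b_bytes), *b_bytes)
    return bytes(mem), sym

# The extension loads; the return value is what lands in the 64-bit register.
def movzx_word(mem, addr): return struct.unpack_from("<H", mem, addr)[0]
def movzx_byte(mem, addr): return struct.unpack_from("<B", mem, addr)[0]
def movsx_word(mem, addr): return struct.unpack_from("<h", mem, addr)[0]
def movsx_byte(mem, addr): return struct.unpack_from("<b", mem, addr)[0]

def sum_backwards(mem, sym, n):
    """Emulate the answer's loop: rcx runs 3,2,1; the -2/-1 offsets map that to i=2,1,0."""
    result, rcx = 0, n
    while True:
        rax = movzx_word(mem, sym["a"] + rcx * 2 - 2)   # word elements: stride 2
        rbx = movzx_byte(mem, sym["b"] + rcx - 1)       # byte elements: stride 1
        result += rax - rbx + 7                         # (a[i]+3) - (b[i]-4)
        rcx -= 1                                        # dec rcx
        if rcx == 0:                                    # jnz loop_start
            break
    return result

def wrong_stride(mem, sym):
    """What a[rax*4] from the question reads: every other word (i=2 would run past b)."""
    return [movzx_word(mem, sym["a"] + i * 4) for i in range(2)]

def extension_demo():
    """Byte 0xFE zero-extended vs sign-extended into a wider register."""
    raw = b"\xfe"
    return movzx_byte(raw, 0), movsx_byte(raw, 0)

if __name__ == "__main__":
    mem, sym = build_data_segment(A_WORDS, B_BYTES)
    print("sum =", sum_backwards(mem, sym, N))
    print("a[rax*4] reads", wrong_stride(mem, sym))
    print("movzx/movsx of 0xFE:", extension_demo())
```

For the values in the question the sum is (12-2+7) + (14-4+7) + (16-5+7) = 52; the *4 stride returns 12 and 16 instead of 12 and 14, and extension_demo() shows why movzx is the right choice for the unsigned array contents.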

load array with assembly lea [duplicate]

I would like to know what the difference between these instructions is:
MOV AX, [TABLE-ADDR]
and
LEA AX, [TABLE-ADDR]
LEA means Load Effective Address
MOV means Load Value
In short, LEA loads a pointer to the item you're addressing whereas MOV loads the actual value at that address.
The purpose of LEA is to allow one to perform a non-trivial address calculation and store the result [for later usage]
LEA ax, [BP+SI+5] ; Compute address of value
MOV ax, [BP+SI+5] ; Load value at that address
Where there are just constants involved, MOV (through the assembler's constant calculations) can sometimes appear to overlap with the simplest cases of usage of LEA. It's useful if you have a multi-part calculation with multiple base addresses etc.
In NASM syntax:
mov eax, var == lea eax, [var] ; i.e. mov r32, imm32
lea eax, [var+16] == mov eax, var+16
lea eax, [eax*4] == shl eax, 2 ; but without setting flags
In MASM syntax, use OFFSET var to get a mov-immediate instead of a load.
The instruction MOV reg,addr means read a variable stored at address addr into register reg. The instruction LEA reg,addr means read the address (not the variable stored at the address) into register reg.
Another form of the MOV instruction is MOV reg,immdata which means read the immediate data (i.e. constant) immdata into register reg. Note that if the addr in LEA reg,addr is just a constant (i.e. a fixed offset) then that LEA instruction is essentially exactly the same as an equivalent MOV reg,immdata instruction that loads the same constant as immediate data.
None of the previous answers quite got to the bottom of my own confusion, so I'd like to add my own.
What I was missing is that lea operations treat the use of parentheses differently from how mov does.
Think of C. Let's say I have an array of long that I call array. Now the expression array[i] performs a dereference, loading the value from memory at the address array + i * sizeof(long) [1].
On the other hand, consider the expression &array[i]. This still contains the sub-expression array[i], but no dereferencing is performed! The meaning of array[i] has changed. It no longer means to perform a dereference but instead acts as a kind of a specification, telling & what memory address we're looking for. If you like, you could alternatively think of the & as "cancelling out" the dereference.
Because the two use-cases are similar in many ways, they share the syntax array[i], but the existence or absence of a & changes how that syntax is interpreted. Without &, it's a dereference and actually reads from the array. With &, it's not. The value array + i * sizeof(long) is still calculated, but it is not dereferenced.
The situation is very similar with mov and lea. With mov, a dereference occurs that does not happen with lea. This is despite the use of parentheses that occurs in both. For instance, movq (%r8), %r9 and leaq (%r8), %r9. With mov, these parentheses mean "dereference"; with lea, they don't. This is similar to how array[i] only means "dereference" when there is no &.
An example is in order.
Consider the code
movq (%rdi, %rsi, 8), %rbp
This loads the value at the memory location %rdi + %rsi * 8 into the register %rbp. That is: get the value in the register %rdi and the value in the register %rsi. Multiply the latter by 8, and then add it to the former. Find the value at this location and place it into the register %rbp.
This code corresponds to the C line x = array[i];, where array becomes %rdi and i becomes %rsi and x becomes %rbp. The 8 is the length of the data type contained in the array.
Now consider similar code that uses lea:
leaq (%rdi, %rsi, 8), %rbp
Just as the use of movq corresponded to dereferencing, the use of leaq here corresponds to not dereferencing. This line of assembly corresponds to the C line x = &array[i];. Recall that & changes the meaning of array[i] from dereferencing to simply specifying a location. Likewise, the use of leaq changes the meaning of (%rdi, %rsi, 8) from dereferencing to specifying a location.
The semantics of this line of code are as follows: get the value in the register %rdi and the value in the register %rsi. Multiply the latter by 8, and then add it to the former. Place this value into the register %rbp. No load from memory is involved, just arithmetic operations [2].
Note that the only difference between my descriptions of leaq and movq is that movq does a dereference, and leaq doesn't. In fact, to write the leaq description, I basically copy+pasted the description of movq, and then removed "Find the value at this location".
To summarize: movq vs. leaq is tricky because they treat the use of parentheses, as in (%rsi) and (%rdi, %rsi, 8), differently. In movq (and all other instructions except lea), these parentheses denote a genuine dereference, whereas in leaq they do not and are purely convenient syntax.
[1] I've said that when array is an array of long, the expression array[i] loads the value from the address array + i * sizeof(long). This is true, but there's a subtlety that should be addressed. If I write the C code
long x = array[5];
this is not the same as typing
long x = *(array + 5 * sizeof(long));
It seems that it should be based on my previous statements, but it's not.
What's going on is that C pointer addition has a trick to it. Say I have a pointer p pointing to values of type T. The expression p + i does not mean "the position at p plus i bytes". Instead, the expression p + i actually means "the position at p plus i * sizeof(T) bytes".
The convenience of this is that to get "the next value" we just have to write p + 1 instead of p + 1 * sizeof(T).
This means that the C code long x = array[5]; is actually equivalent to
long x = *(array + 5)
because C will automatically multiply the 5 by sizeof(long).
So in the context of this StackOverflow question, how is this all relevant? It means that when I say "the address array + i * sizeof(long)", I do not mean for "array + i * sizeof(long)" to be interpreted as a C expression. I am doing the multiplication by sizeof(long) myself in order to make my answer more explicit, but understand that due to that, this expression should not be read as C, just as normal math that uses C syntax.
[2] Side note: because all lea does is arithmetic operations, its arguments don't actually have to refer to valid addresses. For this reason, it's often used to perform pure arithmetic on values that may not be intended to be dereferenced. For instance, cc with -O2 optimization translates
long f(long x) {
return x * 5;
}
into the following (irrelevant lines removed):
f:
leaq (%rdi, %rdi, 4), %rax # set %rax to %rdi + %rdi * 4
ret
If you only specify a literal, there is no difference. LEA has more abilities, though, and you can read about them here:
http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_6/CH06-1.html#HEADING1-136
It depends on the used assembler, because
mov ax,table_addr
in MASM works as
mov ax,word ptr[table_addr]
So it loads the first bytes from table_addr and NOT the offset to table_addr. You should use instead
mov ax,offset table_addr
or
lea ax,table_addr
which works the same.
lea version also works fine if table_addr is a local variable e.g.
some_procedure proc
local table_addr[64]:word
lea ax,table_addr
As stated in the other answers:
MOV will grab the data at the address inside the brackets and place that data into the destination operand.
LEA will perform the calculation of the address inside the brackets and place that calculated address into the destination operand. This happens without actually going out to the memory and getting the data. The work done by LEA is in the calculating of the "effective address".
Because memory can be addressed in several different ways (see examples below), LEA is sometimes used to add or multiply registers together without using an explicit ADD or MUL instruction (or equivalent).
Since everyone is showing examples in Intel syntax, here are some in AT&T syntax:
MOVL 16(%ebp), %eax /* put long at ebp+16 into eax */
LEAL 16(%ebp), %eax /* add 16 to ebp and store in eax */
MOVQ (%rdx,%rcx,8), %rax /* put qword at rcx*8 + rdx into rax */
LEAQ (%rdx,%rcx,8), %rax /* put value of "rcx*8 + rdx" into rax */
MOVW 5(%bp,%si), %ax /* put word at si + bp + 5 into ax */
LEAW 5(%bp,%si), %ax /* put value of "si + bp + 5" into ax */
MOVQ 16(%rip), %rax /* put qword at rip + 16 into rax */
LEAQ 16(%rip), %rax /* add 16 to instruction pointer and store in rax */
MOVL label(,1), %eax /* put long at label into eax */
LEAL label(,1), %eax /* put the address of the label into eax */
Basically ... "Move into REG ... after computing it..."
it seems to be nice for other purposes as well :)
if you just forget that the value is a pointer
you can use it for code optimizations/minimization ...whatever..
MOV EBX , 1
MOV ECX , 2
;//with 1 instruction you got result of 2 registers in 3rd one ...
LEA EAX , [EBX+ECX+5]
EAX = 8
originally it would be:
MOV EAX, EBX
ADD EAX, ECX
ADD EAX, 5
Let's understand this with an example.
mov eax, [ebx]
and
lea eax, [ebx]
Suppose the value in ebx is 0x400000. Then mov will go to address 0x400000 and copy the 4 bytes of data present there into the eax register, whereas lea will copy the address 0x400000 into eax. So, after the execution of each instruction, the value of eax in each case will be (assuming the memory at 0x400000 contains 30):
eax = 30 (in case of mov)
eax = 0x400000 (in case of lea)
By definition, mov copies the data from rm32 to the destination (mov dest rm32) and lea (load effective address) copies the address to the destination (mov dest rm32).
MOV can do the same thing as LEA [label], but the MOV instruction contains the effective address inside the instruction itself as an immediate constant (calculated in advance by the assembler). LEA uses PC-relative addressing to calculate the effective address during the execution of the instruction.
LEA (Load Effective Address) is a shift-and-add instruction. It was added to 8086 because the hardware is there to decode and calculate addressing modes.
The difference is subtle but important. The MOV instruction is a 'MOVe' effectively a copy of the address that the TABLE-ADDR label stands for. The LEA instruction is a 'Load Effective Address' which is an indirected instruction, which means that TABLE-ADDR points to a memory location at which the address to load is found.
Effectively using LEA is equivalent to using pointers in languages such as C, as such it is a powerful instruction.
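The load-versus-address distinction made throughout these answers, as a runnable Python model added here (not code from any of the answers): a constructed four-element array of 8-byte longs sits at the made-up address MEM_BASE, movq() computes the effective address and then reads memory (faulting if the address is unmapped), while lea() stops after the arithmetic. The same lea() models the x*5 trick from the footnote above and the lea eax,[eax] no-op, and shows that an address that would fault under mov is fine under lea.

```python
import struct

# Constructed memory image: four 8-byte longs that play the role of "array"
# in the answers above. MEM_BASE is an arbitrary made-up load address.
MEM_BASE = 0x400000
MEMORY = struct.pack("<4q", 30, 31, 32, 33)
MASK64 = 0xFFFFFFFFFFFFFFFF

def ea(base=0, index=0, scale=1, disp=0):
    """Effective address disp(base,index,scale); the only thing LEA ever computes."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64

def lea(base=0, index=0, scale=1, disp=0):
    """leaq disp(base,index,scale), %dst -> dst = address. Memory is never touched."""
    return ea(base, index, scale, disp)

def movq(base=0, index=0, scale=1, disp=0):
    """movq disp(base,index,scale), %dst -> dst = *(address). Memory is read."""
    addr = ea(base, index, scale, disp)
    off = addr - MEM_BASE
    if not 0 <= off <= len(MEMORY) - 8:
        raise MemoryError("fault: no mapping at %#x" % addr)   # the CPU would SIGSEGV
    return struct.unpack_from("<q", MEMORY, off)[0]

def array_i(rdi, rsi):
    """x = array[i];   movq (%rdi,%rsi,8), %rbp"""
    return movq(base=rdi, index=rsi, scale=8)

def addr_of_array_i(rdi, rsi):
    """x = &array[i];  leaq (%rdi,%rsi,8), %rbp"""
    return lea(base=rdi, index=rsi, scale=8)

def times_five(rdi):
    """return x * 5;   leaq (%rdi,%rdi,4), %rax  (pure arithmetic, wraps at 64 bits)"""
    return lea(base=rdi, index=rdi, scale=4)

def lea_noop(eax):
    """lea eax, [eax] computes eax + 0 and stores it back: a no-op."""
    return lea(base=eax)

def lea_survives_bad_address(addr):
    """lea never faults on an unmapped address, mov does; return (lea_ok, mov_ok)."""
    lea_ok = lea(base=addr) == addr
    try:
        movq(base=addr)
        mov_ok = True
    except MemoryError:
        mov_ok = False
    return lea_ok, mov_ok

if __name__ == "__main__":
    print("array[2]  =", array_i(MEM_BASE, 2))
    print("&array[2] =", hex(addr_of_array_i(MEM_BASE, 2)))
    print("7 * 5     =", times_five(7))
    print("lea/mov on unmapped 0xdeadbeef:", lea_survives_bad_address(0xDEADBEEF))
```

The wrap-around in ea() is why times_five(-1 as a 64-bit register) yields 0xFFFFFFFFFFFFFFFB rather than an overflow error: lea does modular address arithmetic and sets no flags, which is exactly what makes it usable as a three-operand adder.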

Array value fetching in asm x64

I have a problem with asm code that works when mixed with C, but does not when used in asm code with proper parameters.
;; array - RDI, x- RSI, y- RDX
getValue:
mov r13, rsi
sal r13, $3
mov r14, rdx
sal r14, $2
mov r15, [rdi+r13]
mov rax, [r15+r14]
ret
Technically I want to keep the rdi, rsi and rdx registers untouched and thus I use other ones.
I am using an x64 machine and thus my pointers have 8 bytes. Technically speaking this code is supposed to do:
int getValue(int** array, int x, int y) {
return array[x][y];
}
it somehow works inside my C code, but does not when used in asm in this way:
mov rdi, [rdi] ;; get first pointer - first row
mov r9, $4 ;; we want second element from the row
mov rax, [rdi+r9] ;; get the element (4 bytes vs 8 bytes???)
mov rdi, FMT ;; prepare printf format "%d", 10, 0
mov rsi, rax ;; we want to print the element we just fetched
mov eax, $0 ;; say we have no non-integer argument
call printf ;; always gives 0, no matter what's in the matrix
Can someone see into this and help me? Thanks in advance.
The sal r14, $2 implies the elements are dwords, so the last line before the ret shouldn't load a qword. Besides, x86 has nice scaling addressing modes, so you can do this:
mov rax, [rdi + rsi * 8] ; load pointer to column
mov eax, [rax + rdx * 4] ; note this loads a dword
ret
That implies that you have an array of pointers to columns, which is unusual. You can do that, but was it intended?
This is a standard matrix of integers.
int** array;
sizeof(int*) == 8
sizeof(int) == 4
How I see it is that when I have that array at first, I have a pointer to a space of memory without "blanks" that holds all pointers one by one (index-by-index), so I say "let's go to the element rsi-th of the array" and that's why I shift by rsi-th * 8 bytes. So now I get the same situation, but the pointer should point to a space of integers, so 4-byte items. That's why I shift by 4 bytes there.
Is my thinking wrong?
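The layout described in the question, modelled in Python as an added example (not code from the thread): build_matrix() constructs an image in which a table of 8-byte row pointers is followed by the 4-byte ints of each row, at a made-up base address BASE. get_value() performs the two loads the answer suggests (qword for the pointer, dword for the element), while get_value_qword_bug() keeps the question's final 8-byte load so the effect of the wrong operand size is visible.

```python
import struct

# Constructed memory image for an int** matrix (added example). BASE is an
# arbitrary made-up address; row pointers are 8 bytes, ints are 4 bytes.
BASE = 0x10000

def build_matrix(rows):
    """Pointer table first, then the rows' ints back to back; returns the image bytes."""
    mem = bytearray()
    row_addr = BASE + 8 * len(rows)        # data starts right after the table
    for row in rows:
        mem += struct.pack("<Q", row_addr)  # int* entry, sizeof(int*) == 8
        row_addr += 4 * len(row)            # next row follows, sizeof(int) == 4
    for row in rows:
        mem += struct.pack("<%di" % len(row), *row)
    return bytes(mem)

def load(mem, addr, fmt):
    """Read one value of struct format fmt at virtual address addr."""
    return struct.unpack_from(fmt, mem, addr - BASE)[0]

def get_value(mem, rdi, rsi, rdx):
    """array[x][y] as the answer suggests: qword load of the pointer, dword load of the int."""
    rax = load(mem, rdi + rsi * 8, "<Q")   # mov rax, [rdi + rsi*8]
    return load(mem, rax + rdx * 4, "<i")  # mov eax, [rax + rdx*4]

def get_value_qword_bug(mem, rdi, rsi, rdx):
    """The question's version: the final load is 8 bytes, so it also swallows the next int
    (and for the last int of the image it runs off the end, like reading unmapped memory)."""
    rax = load(mem, rdi + rsi * 8, "<Q")
    return load(mem, rax + rdx * 4, "<q")  # mov rax, [rax + rdx*4]

if __name__ == "__main__":
    m = build_matrix([[1, 2, 3], [4, 5, 6]])
    print("array[1][1] dword load:", get_value(m, BASE, 1, 1))
    print("array[1][1] qword load:", hex(get_value_qword_bug(m, BASE, 1, 1)))
```

With the 2x3 matrix above, the dword load returns 5 for array[1][1]; the qword load returns 0x600000005, i.e. 5 in the low half and the neighbouring 6 in the high half, which is what the question's mov rax, [rdi+r9] hands to printf.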

What's the point of LEA EAX, [EAX]?

LEA EAX, [EAX]
I encountered this instruction in a binary compiled with the Microsoft C compiler. It clearly can't change the value of EAX. Then why is it there?
It is a NOP.
The following are typically used as NOP. They all do the same thing but they result in machine code of different length. Depending on the alignment requirement one of them is chosen:
xchg eax, eax = 90
mov eax, eax = 89 C0
lea eax, [eax + 0x00] = 8D 40 00
From this article:
This trick is used by MSVC++ compiler to emit the NOP instructions of different length (for padding before jump targets). For example, MSVC++ generates the following code if it needs 4-byte and 6-byte padding:
8d6424 00 lea [ebx+00],ebx ; 4-byte padding
8d9b 00000000 lea [esp+00000000],esp ; 6-byte padding
The first line is marked as "npad 4" in assembly listings generated by the compiler, and the second is "npad 6". The registers (ebx, esp) can be chosen from the rarely used ones to avoid false dependencies in the code.
So this is just a kind of NOP, appearing right before targets of jmp instructions in order to align them.
Interestingly, you can identify the compiler from the characteristic nature of such instructions.
LEA EAX, [EAX]
Indeed doesn't change the value of EAX. As far as I understand, it's identical in function to:
MOV EAX, EAX
Did you see it in optimized code, or unoptimized code?

Examining code generated by the Visual Studio C++ compiler, part 1 [duplicate]

Possible Duplicate:
Why is such complex code emitted for dividing a signed integer by a power of two?
Background
I'm just learning x86 asm by examining the binary code generated by the compiler.
Code compiled using the C++ compiler in Visual Studio 2010 beta 2.
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.21003.01 for 80x86
C code (sandbox.c)
int mainCRTStartup()
{
int x=5;int y=1024;
while(x) { x--; y/=2; }
return x+y;
}
Compile it using the Visual Studio Command Prompt
cl /c /O2 /Oy- /MD sandbox.c
link /NODEFAULTLIB /MANIFEST:NO /SUBSYSTEM:CONSOLE sandbox.obj
Disasm sandbox.exe in OllyDgb
The following starts from the entry point.
00401000 >/$ B9 05000000 MOV ECX,5
00401005 |. B8 00040000 MOV EAX,400
0040100A |. 8D9B 00000000 LEA EBX,DWORD PTR DS:[EBX]
00401010 |> 99 /CDQ
00401011 |. 2BC2 |SUB EAX,EDX
00401013 |. D1F8 |SAR EAX,1
00401015 |. 49 |DEC ECX
00401016 |.^75 F8 \JNZ SHORT sandbox.00401010
00401018 \. C3 RETN
Examination
MOV ECX, 5 int x=5;
MOV EAX, 400 int y=1024;
LEA ... // no idea what LEA does here. seems like ebx=ebx. elaborate please.
// in fact, NOPing it does nothing to the original procedure and the values.
CQD // sign extends EAX into EDX:EAX, which here: edx = 0. no idea why.
SUB EAX, EDX // eax=eax-edx, here: eax=eax-0. no idea, pretty redundant.
SAR EAX,1 // okay, y/= 2
DEC ECX // okay, x--, sets the zero flag when reaches 0.
JNZ ... // okay, jump back to CQD if the zero flag is not set.
This part bothers me:
0040100A |. 8D9B 00000000 LEA EBX,DWORD PTR DS:[EBX]
00401010 |> 99 /CDQ
00401011 |. 2BC2 |SUB EAX,EDX
You can nop it all and the values of EAX and ECX will remain the same at the end. So, what's the point of these instructions?
The whole thing
00401010 |> 99 /CDQ
00401011 |. 2BC2 |SUB EAX,EDX
00401013 |. D1F8 |SAR EAX,1
stands for the y /= 2. You see, a standalone SAR would not perform the signed integer division the way the compiler authors intended. C++98 standard recommends that signed integer division rounds the result towards 0, while SAR alone would round towards the negative infinity. (It is permissible to round towards negative infinity, the choice is left to the implementation). In order to implement rounding to 0 for negative operands, the above trick is used. If you use an unsigned type instead of a signed one, then the compiler will generate just a single shift instruction, since the issue with negative division will not take place.
The trick is pretty simple: for negative y sign extension will place a pattern of 11111...1 in EDX, which is actually -1 in 2's complement representation. The following SUB will effectively add 1 to EAX if the original y value was negative. If the original y was positive (or 0), the EDX will hold 0 after the sign extension and EAX will remain unchanged.
In other words, when you write y /= 2 with signed y, the compiler generates the code that does something more like the following
y = (y < 0 ? y + 1 : y) >> 1;
or, better
y = (y + (y < 0)) >> 1;
Note, that C++ standard does not require the result of the division to be rounded towards zero, so the compiler has the right to do just a single shift even for signed types. However, normally compilers follow the recommendation to round towards zero (or offer an option to control the behavior).
P.S. I don't know for sure what the purpose of that LEA instruction is. It is indeed a no-op. However, I suspect that this might be just a placeholder instruction inserted into the code for further patching. If I remember correctly, MS compiler has an option that forces the insertion of placeholder instructions at the beginning and at the end of each function. In the future this instruction can be overwritten by the patcher with a CALL or JMP instruction that will execute the patch code. This specific LEA was chosen just because it produces a no-op placeholder instruction of the correct length. Of course, it could be something completely different.
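The rounding argument above, checked numerically in Python (an added example, not from the answer): the CDQ / SUB EAX,EDX / SAR EAX,1 sequence is emulated on a 32-bit two's-complement register, compared against a bare SAR, and both are compared with the truncating division that C defines for y / 2. disagreements() returns the inputs in a range for which the shift alone gives the wrong answer.

```python
MASK32 = 0xFFFFFFFF

def to_i32(v):
    """Interpret the low 32 bits of v as a signed two's-complement value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def cdq(eax):
    """CDQ: sign-extend EAX into EDX:EAX; returns the EDX half (all ones if EAX < 0)."""
    return MASK32 if eax & 0x80000000 else 0

def sar32(v, n):
    """SAR on a 32-bit register: Python's >> on a negative int is arithmetic already."""
    return (to_i32(v) >> n) & MASK32

def div2_cdq_sub_sar(y):
    """Emulate  cdq ; sub eax, edx ; sar eax, 1  with y in EAX."""
    eax = y & MASK32
    edx = cdq(eax)                 # 0 for y >= 0, 0xFFFFFFFF (-1) for y < 0
    eax = (eax - edx) & MASK32     # subtracting -1 adds 1 to negative y only
    eax = sar32(eax, 1)
    return to_i32(eax)

def div2_sar_only(y):
    """What a lone  sar eax, 1  computes: rounding toward negative infinity."""
    return to_i32(sar32(y & MASK32, 1))

def c_div2(y):
    """y / 2 as C specifies it: the quotient is truncated toward zero."""
    q = abs(y) // 2
    return q if y >= 0 else -q

def disagreements(lo, hi):
    """Inputs in [lo, hi] where SAR alone differs from C's y / 2 (negative odd numbers)."""
    return [y for y in range(lo, hi + 1) if div2_sar_only(y) != c_div2(y)]

def trick_matches_c(lo, hi):
    """True if the compiler's sequence agrees with C's y / 2 for every y in [lo, hi]."""
    return all(div2_cdq_sub_sar(y) == c_div2(y) for y in range(lo, hi + 1))

if __name__ == "__main__":
    for y in (-5, -4, -1, 0, 7):
        print("y=%3d  sar only=%3d  cdq/sub/sar=%3d  C y/2=%3d"
              % (y, div2_sar_only(y), div2_cdq_sub_sar(y), c_div2(y)))
    print("SAR alone is wrong for:", disagreements(-8, 8))
```

For y = -5 the shift alone gives -3 while the full sequence gives -2, matching C; the fix costs nothing for non-negative y because EDX is then zero and the SUB leaves EAX untouched.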
The lea ebx,[ebx] is just a NOP operation. Its purpose is to align the beginning of the loop in memory, which will make it faster. As you can see here, the beginning of the loop starts at address 0x00401010, which is divisible by 16, thanks to this instruction.
The CDQ and SUB EAX,EDX operations make sure that the division will round a negative number towards zero - otherwise SAR would round it down, giving incorrect results for negative numbers.
The reason that the compiler emits this:
LEA EBX,DWORD PTR DS:[EBX]
instead of the semantically equivalent:
NOP
NOP
NOP
NOP
NOP
NOP
..is that it's faster for the processor to execute one 6-byte instruction than six 1-byte instructions. That's all.
This doesn't really answer the question, but is a helpful hint. Instead of mucking around with the OllyDbg.exe thing, you can make Visual Studio generate the asm file for you, which has the added bonus that it can put in the original source code as comments. This isn't a big deal for your current small project, but as your project grows, you may end up spending a fair amount of time figuring out which assembly code matches which source code.
From the command line, you want the /FAs and /Fa options (MSDN).
Here's part of the output for your example code (I compiled debug code, so the .asm is longer, but you can do the same thing for your optimized code):
_wmain PROC ; COMDAT
; 8 : {
push ebp
mov ebp, esp
sub esp, 216 ; 000000d8H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-216]
mov ecx, 54 ; 00000036H
mov eax, -858993460 ; ccccccccH
rep stosd
; 9 : int x=5; int y=1024;
mov DWORD PTR _x$[ebp], 5
mov DWORD PTR _y$[ebp], 1024 ; 00000400H
$LN2#wmain:
; 10 : while(x) { x--; y/=2; }
cmp DWORD PTR _x$[ebp], 0
je SHORT $LN1#wmain
mov eax, DWORD PTR _x$[ebp]
sub eax, 1
mov DWORD PTR _x$[ebp], eax
mov eax, DWORD PTR _y$[ebp]
cdq
sub eax, edx
sar eax, 1
mov DWORD PTR _y$[ebp], eax
jmp SHORT $LN2#wmain
$LN1#wmain:
; 11 : return x+y;
mov eax, DWORD PTR _x$[ebp]
add eax, DWORD PTR _y$[ebp]
; 12 : }
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
_wmain ENDP
Hope that helps!
