Have a question about PTR when used in a CALL instruction when using MASM - masm

I have not seen in MASM's documentation information about PTR register as used in this example:
.486
.Model flat,stdcall
label1 typedef ptr proc
.data
mydata db 20h
.code
main proc
call label1 ptr esi
main endp
end
Where do I find information on the CALL instruction with using PTR register?

PTR is not a register. PTR is a MASM syntax element, shorthand for "pointer" and is a kind of cast operator. So for example mov DWORD PTR [esi], 0 means "treat the address esi points to as DWORD (32 bit), and copy a zero there", so this clears 4 bytes. In contrast, mov BYTE PTR [esi], 0 means to just clear 1 byte.
In your example, label1 is defined to represent the type "pointer to procedure". In the CALL instruction, PTR is used to tell the assembler that the address esi points to has the type "procedure". The CALL intstruction calls the procedure, whose address is stored in ESI.
As far as I know, this cast is useless in this example, because there is no ambiguity to be resolved by the cast: CALL with a 32 bit register operand is always an indirect near call, taking the address to be called from the 32-bit register.

Related

Does function parameter names has a place in memory in C?

I dont think function parameter names are treated like variables and they doesnt get stored in memory. But I dont get how we can use these parameters in functions as variables if they dont have any place in memory. Can anyone explain me whats going on with function parameters and if they have place or not in memory
All variables are either allocated somewhere or optimized away in case the compiler found them unnecessary. Function parameters are variables and are almost certainly stored either in a CPU register or on the stack, if they are used by the program.
The only time when they might not get allocated is when the function is inlined - when the whole function call is optimized away and the function code is instead injected in the caller-side machine code. In such cases the original variables used by the caller might be used instead.
Function parameter names however are not stored anywhere in the final executable, just like any other identifier isn't stored there either. Names of variables, functions etc only exist in the source code, for the benefit of the programmer alone.
Although your title asks about “function parameter names,” it appears your question is about function parameters, which are different.
Commonly, arguments are passed to functions by putting them in processor registers or on the hardware stack. Each computing platform has some specification of which arguments should be passed where. For example, the first few small arguments (such as int values) may be passed in certain processor registers, while more or larger arguments may be put on the stack.
To the called function, these are parameters. The called function uses them from the processor registers or the stack.
I'm writing this answer assuming you know what the stack and CPU registers are. If you don't, I'd suggest you look them up before seeing this answer.
I dont think function parameter names are treated like variables and they doesnt get stored in memory.
At the assembly level, function parameter names don't really exist. But for function parameters, it depends on the assembly generated based on the compiler's level of optimization. Consider this simple function:
int foo(int a, int b)
{
return a + b;
}
Using Compiler Explorer, I checked the generated disassembly of x64 GCC 10.2. On -O0, it looks like this:
foo:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
pop rbp
ret
These two lines:
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
interestingly show that the arguments are passed to edi and esi for a and b respectively, and then moved into the stack, presumably in case the registers need to be used elsewhere in the function. The rest of the function uses the space in the stack as opposed to the registers:
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
(In case you didn't know, eax/rax generally holds the value for functions and edx in this case just serves as a general-purpose register, so these three lines are besaically eax = a; edx = b; eax += edx).
Ok, so that makes sense. The arguments are passed to registers and copied to the stack, where they are used for the rest of the function. What about -O1?
foo:
lea eax, [rdi+rsi]
ret
Now that is a lot shorter. Here, eax gets the value of rdi + rsi and the function ends. All the copying to the stack is completely skipped and the registers are used directly. So yes, in this case, the memory is never used.
EDIT
After writing this answer, I went and checked the generated assembly with the -m32 option and noticed that arguments were always pushed to the stack before the function was called. Assembly generated from -O0 looks like this:
foo:
push ebp
mov ebp, esp
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add eax, edx
pop ebp
ret
Here, since the arguments are passed to the stack before the function is called, they don't have be copied from the registers to the stack (because they're already there). So the function is shorter, and amount of registers used is reduced. However, on higher levels of optimization, the function ends up becoming longer because of this:
foo:
mov eax, DWORD PTR [esp+8]
add eax, DWORD PTR [esp+4]
ret
So with -m32 set, parameters are always placed in memory.

load array with assembly lea [duplicate]

I would like to know what the difference between these instructions is:
MOV AX, [TABLE-ADDR]
and
LEA AX, [TABLE-ADDR]
LEA means Load Effective Address
MOV means Load Value
In short, LEA loads a pointer to the item you're addressing whereas MOV loads the actual value at that address.
The purpose of LEA is to allow one to perform a non-trivial address calculation and store the result [for later usage]
LEA ax, [BP+SI+5] ; Compute address of value
MOV ax, [BP+SI+5] ; Load value at that address
Where there are just constants involved, MOV (through the assembler's constant calculations) can sometimes appear to overlap with the simplest cases of usage of LEA. Its useful if you have a multi-part calculation with multiple base addresses etc.
In NASM syntax:
mov eax, var == lea eax, [var] ; i.e. mov r32, imm32
lea eax, [var+16] == mov eax, var+16
lea eax, [eax*4] == shl eax, 2 ; but without setting flags
In MASM syntax, use OFFSET var to get a mov-immediate instead of a load.
The instruction MOV reg,addr means read a variable stored at address addr into register reg. The instruction LEA reg,addr means read the address (not the variable stored at the address) into register reg.
Another form of the MOV instruction is MOV reg,immdata which means read the immediate data (i.e. constant) immdata into register reg. Note that if the addr in LEA reg,addr is just a constant (i.e. a fixed offset) then that LEA instruction is essentially exactly the same as an equivalent MOV reg,immdata instruction that loads the same constant as immediate data.
None of the previous answers quite got to the bottom of my own confusion, so I'd like to add my own.
What I was missing is that lea operations treat the use of parentheses different than how mov does.
Think of C. Let's say I have an array of long that I call array. Now the expression array[i] performs a dereference, loading the value from memory at the address array + i * sizeof(long) [1].
On the other hand, consider the expression &array[i]. This still contains the sub-expression array[i], but no dereferencing is performed! The meaning of array[i] has changed. It no longer means to perform a deference but instead acts as a kind of a specification, telling & what memory address we're looking for. If you like, you could alternatively think of the & as "cancelling out" the dereference.
Because the two use-cases are similar in many ways, they share the syntax array[i], but the existence or absence of a & changes how that syntax is interpreted. Without &, it's a dereference and actually reads from the array. With &, it's not. The value array + i * sizeof(long) is still calculated, but it is not dereferenced.
The situation is very similar with mov and lea. With mov, a dereference occurs that does not happen with lea. This is despite the use of parentheses that occurs in both. For instance, movq (%r8), %r9 and leaq (%r8), %r9. With mov, these parentheses mean "dereference"; with lea, they don't. This is similar to how array[i] only means "dereference" when there is no &.
An example is in order.
Consider the code
movq (%rdi, %rsi, 8), %rbp
This loads the value at the memory location %rdi + %rsi * 8 into the register %rbp. That is: get the value in the register %rdi and the value in the register %rsi. Multiply the latter by 8, and then add it to the former. Find the value at this location and place it into the register %rbp.
This code corresponds to the C line x = array[i];, where array becomes %rdi and i becomes %rsi and x becomes %rbp. The 8 is the length of the data type contained in the array.
Now consider similar code that uses lea:
leaq (%rdi, %rsi, 8), %rbp
Just as the use of movq corresponded to dereferencing, the use of leaq here corresponds to not dereferencing. This line of assembly corresponds to the C line x = &array[i];. Recall that & changes the meaning of array[i] from dereferencing to simply specifying a location. Likewise, the use of leaq changes the meaning of (%rdi, %rsi, 8) from dereferencing to specifying a location.
The semantics of this line of code are as follows: get the value in the register %rdi and the value in the register %rsi. Multiply the latter by 8, and then add it to the former. Place this value into the register %rbp. No load from memory is involved, just arithmetic operations [2].
Note that the only difference between my descriptions of leaq and movq is that movq does a dereference, and leaq doesn't. In fact, to write the leaq description, I basically copy+pasted the description of movq, and then removed "Find the value at this location".
To summarize: movq vs. leaq is tricky because they treat the use of parentheses, as in (%rsi) and (%rdi, %rsi, 8), differently. In movq (and all other instruction except lea), these parentheses denote a genuine dereference, whereas in leaq they do not and are purely convenient syntax.
[1] I've said that when array is an array of long, the expression array[i] loads the value from the address array + i * sizeof(long). This is true, but there's a subtlety that should be addressed. If I write the C code
long x = array[5];
this is not the same as typing
long x = *(array + 5 * sizeof(long));
It seems that it should be based on my previous statements, but it's not.
What's going on is that C pointer addition has a trick to it. Say I have a pointer p pointing to values of type T. The expression p + i does not mean "the position at p plus i bytes". Instead, the expression p + i actually means "the position at p plus i * sizeof(T) bytes".
The convenience of this is that to get "the next value" we just have to write p + 1 instead of p + 1 * sizeof(T).
This means that the C code long x = array[5]; is actually equivalent to
long x = *(array + 5)
because C will automatically multiply the 5 by sizeof(long).
So in the context of this StackOverflow question, how is this all relevant? It means that when I say "the address array + i * sizeof(long)", I do not mean for "array + i * sizeof(long)" to be interpreted as a C expression. I am doing the multiplication by sizeof(long) myself in order to make my answer more explicit, but understand that due to that, this expression should not be read as C. Just as normal math that uses C syntax.
[2] Side note: because all lea does is arithmetic operations, its arguments don't actually have to refer to valid addresses. For this reason, it's often used to perform pure arithmetic on values that may not be intended to be dereferenced. For instance, cc with -O2 optimization translates
long f(long x) {
return x * 5;
}
into the following (irrelevant lines removed):
f:
leaq (%rdi, %rdi, 4), %rax # set %rax to %rdi + %rdi * 4
ret
If you only specify a literal, there is no difference. LEA has more abilities, though, and you can read about them here:
http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_6/CH06-1.html#HEADING1-136
It depends on the used assembler, because
mov ax,table_addr
in MASM works as
mov ax,word ptr[table_addr]
So it loads the first bytes from table_addr and NOT the offset to table_addr. You should use instead
mov ax,offset table_addr
or
lea ax,table_addr
which works the same.
lea version also works fine if table_addr is a local variable e.g.
some_procedure proc
local table_addr[64]:word
lea ax,table_addr
As stated in the other answers:
MOV will grab the data at the address inside the brackets and place that data into the destination operand.
LEA will perform the calculation of the address inside the brackets and place that calculated address into the destination operand. This happens without actually going out to the memory and getting the data. The work done by LEA is in the calculating of the "effective address".
Because memory can be addressed in several different ways (see examples below), LEA is sometimes used to add or multiply registers together without using an explicit ADD or MUL instruction (or equivalent).
Since everyone is showing examples in Intel syntax, here are some in AT&T syntax:
MOVL 16(%ebp), %eax /* put long at ebp+16 into eax */
LEAL 16(%ebp), %eax /* add 16 to ebp and store in eax */
MOVQ (%rdx,%rcx,8), %rax /* put qword at rcx*8 + rdx into rax */
LEAQ (%rdx,%rcx,8), %rax /* put value of "rcx*8 + rdx" into rax */
MOVW 5(%bp,%si), %ax /* put word at si + bp + 5 into ax */
LEAW 5(%bp,%si), %ax /* put value of "si + bp + 5" into ax */
MOVQ 16(%rip), %rax /* put qword at rip + 16 into rax */
LEAQ 16(%rip), %rax /* add 16 to instruction pointer and store in rax */
MOVL label(,1), %eax /* put long at label into eax */
LEAL label(,1), %eax /* put the address of the label into eax */
Basically ... "Move into REG ... after computing it..."
it seems to be nice for other purposes as well :)
if you just forget that the value is a pointer
you can use it for code optimizations/minimization ...what ever..
MOV EBX , 1
MOV ECX , 2
;//with 1 instruction you got result of 2 registers in 3rd one ...
LEA EAX , [EBX+ECX+5]
EAX = 8
originaly it would be:
MOV EAX, EBX
ADD EAX, ECX
ADD EAX, 5
Lets understand this with a example.
mov eax, [ebx]
and
lea eax, [ebx]
Suppose value in ebx is 0x400000. Then mov will go to address 0x400000 and copy 4 byte of data present their to eax register.Whereas lea will copy the address 0x400000 into eax. So, after the execution of each instruction value of eax in each case will be (assuming at memory 0x400000 contain is 30).
eax = 30 (in case of mov)
eax = 0x400000 (in case of lea)
For definition mov copy the data from rm32 to destination (mov dest rm32) and lea(load effective address) will copy the address to destination (mov dest rm32).
MOV can do same thing as LEA [label], but MOV instruction contain the effective address inside the instruction itself as an immediate constant (calculated in advance by the assembler). LEA uses PC-relative to calculate the effective address during the execution of the instruction.
LEA (Load Effective Address) is a shift-and-add instruction. It was added to 8086 because hardware is there to decode and calculate adressing modes.
The difference is subtle but important. The MOV instruction is a 'MOVe' effectively a copy of the address that the TABLE-ADDR label stands for. The LEA instruction is a 'Load Effective Address' which is an indirected instruction, which means that TABLE-ADDR points to a memory location at which the address to load is found.
Effectively using LEA is equivalent to using pointers in languages such as C, as such it is a powerful instruction.

C Pointer to EFLAGS using NASM

For a task at my school, I need to write a C Program which does a 16 bit addition using an assembly programm. Besides the result the EFLAGS shall also be returned.
Here is my C Program:
int add(int a, int b, unsigned short* flags);//function must be used this way
int main(void){
unsigned short* flags = NULL;
printf("%d\n",add(30000, 36000, flags);// printing just the result
return 0;
}
For the moment the Program just prints out the result and not the flags because I am unable to get them.
Here is the assembly program for which I use NASM:
global add
section .text
add:
push ebp
mov ebp,esp
mov ax,[ebp+8]
mov bx,[ebp+12]
add ax,bx
mov esp,ebp
pop ebp
ret
Now this all works smoothly. But I have no idea how to get the pointer which must be at [ebp+16] pointing to the flag register. The professor said we will have to use the pushfd command.
My problem just lies in the assembly code. I will modify the C Program to give out the flags after I get the solution for the flags.
Normally you'd just use a debugger to look at flags, instead of writing all the code to get them into a C variable for a debug-print. Especially since decent debuggers decode the condition flags symbolically for you, instead of or as well as showing a hex value.
You don't have to know or care which bit in FLAGS is CF and which is ZF. (This information isn't relevant for writing real programs, either. I don't have it memorized, I just know which flags are tested by different conditions like jae or jl. Of course, it's good to understand that FLAGS are just data that you can copy around, save/restore, or even modify if you want)
Your function args and return value are int, which is 32-bit in the System V 32-bit x86 ABI you're using. (links to ABI docs in the x86 tag wiki). Writing a function that only looks at the low 16 bits of its input, and leaves high garbage in the high 16 bits of the output is a bug. The int return value in the prototype tells the compiler that all 32 bits of EAX are part of the return value.
As Michael points out, you seem to be saying that your assignment requires using a 16-bit ADD. That will produce carry, overflow, and other conditions with different inputs than if you looked at the full 32 bits. (BTW, this article explains carry vs. overflow very well.)
Here's what I'd do. Note the 32-bit operand size for the ADD.
global add
section .text
add:
push ebp
mov ebp,esp ; stack frames are optional, you can address things relative to ESP
mov eax, [ebp+8] ; first arg: No need to avoid loading the full 32 bits; the next insn doesn't care about the high garbage.
add ax, [ebp+12] ; low 16 bits of second arg. Operand-size implied by AX
cwde ; sign-extend AX into EAX
mov ecx, [ebp+16] ; the pointer arg
pushf ; the simple straightforward way
pop edx
mov [ecx], dx ; Store the low 16 of what we popped. Writing word [ecx] is optional, because dx implies 16-bit operand-size
; be careful not to do a 32-bit store here, because that would write outside the caller's object.
; mov esp,ebp ; redundant: ESP is still pointing at the place we pushed EBP, since the push is balanced by an equal-size pop
pop ebp
ret
CWDE (the 16->32 form of the 8086 CBW instruction) is not to be confused with CWD (the AX -> DX:AX 8086 instruction). If you're not using AX, then MOVSX / MOVZX are a good way to do this.
The fun way: instead of using the default operand size and doing 32-bit push and pop, we can do a 16-bit pop directly into the destination memory address. That would leave the stack unbalanced, so we could either uncomment the mov esp, ebp again, or use a 16-bit pushf (with an operand-size prefix, which according to the docs makes it only push the low 16 FLAGS, not the 32-bit EFLAGS.)
; What I'd *really* do: maximum efficiency if I had to use the 32-bit ABI with args on the stack, instead of args in registers
global add
section .text
add:
mov eax, [esp+4] ; first arg, first thing above the return address
add ax, [esp+8] ; second arg
cwde ; sign-extend AX into EAX
mov ecx, [esp+12] ; the pointer
pushfw ; push the low 16 of FLAGS
pop word [ecx] ; pop into memory pointed to by unsigned short* flags
ret
Both PUSHFW and POP WORD will assemble with an operand-size prefix. output from objdump -Mintel, which uses slightly different syntax from NASM:
4000c0: 66 9c pushfw
4000c2: 66 8f 01 pop WORD PTR [ecx]
PUSHFW is the same as o16 PUSHF. In NASM, o16 applies the operand-size prefix.
If you only needed the low 8 flags (not including OF), you could use LAHF to load FLAGS into AH and store that.
PUSHFing directly into the destination is not something I'd recommend. Temporarily pointing the stack at some random address is not safe in general. Programs with signal handlers will use the space below the stack asynchronously. This is why you have to reserve stack space before using it with sub esp, 32 or whatever, even if you're not going to make a function call that would overwrite it by pushing more stuff on the stack. The only exception is when you have a red-zone.
You C caller:
You're passing a NULL pointer, so of course your asm function segfaults. Pass the address of a local to give the function somewhere to store to.
int add(int a, int b, unsigned short* flags);
int main(void) {
unsigned short flags;
int result = add(30000, 36000, &flags);
printf("%d %#hx\n", result, flags);
return 0;
}
This is just a simple approach. I didn't test it, but you should get the idea...
Just set ESP to the pointer value, increment it by 2 (even for 32-bit arch) and PUSHF like this:
global add
section .text
add:
push ebp
mov ebp,esp
mov ax,[ebp+8]
mov bx,[ebp+12]
add ax,bx
; --- here comes the mod
mov esp, [ebp+16] ; this will set ESP to the pointers address "unsigned short* flags"
lea esp, [esp+2] ; adjust ESP to address above target
db 66h ; operand size prefix for 16-bit PUSHF (alternatively 'db 0x66', depending on your assembler
pushf ; this will save the lower 16-bits of EFLAGS to WORD PTR [EBP+16] = [ESP+2-2]
; --- here ends the mod
mov esp,ebp
pop ebp
ret
This should work, because PUSHF decrements ESP by 2 and then saves the value to WORD PTR [ESP]. Therefore it had to be increased before using the pointer address. Setting ESP to the appropriate value enables you to denominate the direct target of PUSHF.

Copying to and Displaying an Array

Hello Everyone!
I'm a newbie at NASM and I just started out recently. I currently have a program that reserves an array and is supposed to copy and display the contents of a string from the command line arguments into that array.
Now, I am not sure if I am copying the string correctly as every time I try to display this, I keep getting a segmentation error!
This is my code for copying the array:
example:
%include "asm_io.inc"
section .bss
X: resb 50 ;;This is our array
~some code~
mov eax, dword [ebp+12] ; eax holds address of 1st arg
add eax, 4 ; eax holds address of 2nd arg
mov ebx, dword [eax] ; ebx holds 2nd arg, which is pointer to string
mov ecx, dword 0
;Where our 2nd argument is a string eg "abcdefg" i.e ebx = "abcdefg"
copyarray:
mov al, [ebx] ;; get a character
inc ebx
mov [X + ecx], al
inc ecx
cmp al, 0
jz done
jmp copyarray
My question is whether this is the correct method of copying the array and how can I display the contents of the array after?
Thank you!
The loop looks ok, but clunky. If your program is crashing, use a debugger. See the x86 for links and a quick intro to gdb for asm.
I think you're getting argv[1] loaded correctly. (Note that this is the first command-line arg, though. argv[0] is the command name.) https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames says ebp+12 is the usual spot for the 2nd arg to a 32bit functions that bother to set up stack frames.
Michael Petch commented on Simon's deleted answer that the asm_io library has print_int, print_string, print_char, and print_nl routines, among a few others. So presumably you a pointer to your buffer to one of those functions and call it a day. Or you could call sys_write(2) directly with an int 0x80 instruction, since you don't need to do any string formatting and you already have the length.
Instead of incrementing separately for two arrays, you could use the same index for both, with an indexed addressing mode for the load.
;; main (int argc ([esp+4]), char **argv ([esp+8]))
... some code you didn't show that I guess pushes some more stuff on the stack
mov eax, dword [ebp+12] ; load argv
;; eax + 4 is &argv[1], the address of the 1st cmdline arg (not counting the command name)
mov esi, dword [eax + 4] ; esi holds 2nd arg, which is pointer to string
xor ecx, ecx
copystring:
mov al, [esi + ecx]
mov [X + ecx], al
inc ecx
test al, al
jnz copystring
I changed the comments to say "cmdline arg", to distinguish between those and "function arguments".
When it doesn't cost any extra instructions, use esi for source pointers, edi for dest pointers, for readability.
Check the ABI for which registers you can use without saving/restoring (eax, ecx, and edx at least. That might be all for 32bit x86.). Other registers have to be saved/restored if you want to use them. At least, if you're making functions that follow the usual ABI. In asm you can do what you like, as long as you don't tell a C compiler to call non-standard functions.
Also note the improvement in the end of the loop. A single jnz to loop is more efficient than jz break / jmp.
This should run at one cycle per byte on Intel, because test/jnz macro-fuse into one uop. The load is one uop, and the store micro-fuses into one uop. inc is also one uop. Intel CPUs since Core2 are 4-wide: 4 uops issued per clock.
Your original loop runs at half that speed. Since it's 6 uops, it takes 2 clock cycles to issue an iteration.
Another hacky way to do this would be to get the offset between X and ebx into another register, so one of the effective addresses could use a one-register addressing mode, even if the dest wasn't a static array.
mov [X + ebx + ecx], al. (Where ecx = X - start_of_src_buf). But ideally you'd make the store the one that used a one-register addressing mode, unless the load was a memory operand to an ALU instruction that could micro-fuse it. Where the dest is a static buffer, this address-different hack isn't useful at all.
You can't use rep string instructions (like rep movsb) to implement strcpy for implicit-length strings (C null-terminated, rather than with a separately-stored length). Well you could, but only scanning the source twice: once for find the length, again to memcpy.
To go faster than one byte clock, you'd have to use vector instructions to test for the null byte at any of 16 positions in parallel. Google up an optimized strcpy implementation for example. Probably using pcmpeqb against a vector register of all-zeros.

Using OFFSET operator on an array in x86 Assembly?

I'm currently going through Assembly Language for x86 Processors 6th Edition by Kip R. Irvine. It's quite enjoyable, but something is confusing me.
Early in the book, the following code is shown:
list BYTE 10,20,30,40
ListSize = ($ - list)
This made sense to me. Right after declaring an array, subtract the current location in memory with the starting location of the array to get the number of bytes used by the array.
However, the book later does:
.data
arrayB BYTE 10h,20h,30h
.code
mov esi, OFFSET arrayB
mov al,[esi]
inc esi
mov al,[esi]
inc esi
mov al,[esi]
To my understanding, OFFSET returns the location of the variable with respect to the program's segment. That address is stored in the esi register. Immediates are then used to access the value stored at the address represented in esi. Incrementing moves the address to the next byte.
So what is the difference between using OFFSET on an array and simply calling the array variable? I was previously lead to believe that simply calling the array variable would also give me its address.
.data
Number dd 3
.code
mov eax,Number
mov ebx,offset Number
EAX will read memory at a certain address and store the number 3
EBX will store that certain address.
mov ebx,offset Number
is equivalent in this case to
lea ebx,Number

Resources