Using OFFSET operator on an array in x86 Assembly? - arrays

I'm currently going through Assembly Language for x86 Processors 6th Edition by Kip R. Irvine. It's quite enjoyable, but something is confusing me.
Early in the book, the following code is shown:
list BYTE 10,20,30,40
ListSize = ($ - list)
This made sense to me. Right after declaring an array, subtract the current location in memory with the starting location of the array to get the number of bytes used by the array.
However, the book later does:
.data
arrayB BYTE 10h,20h,30h
.code
mov esi, OFFSET arrayB
mov al,[esi]
inc esi
mov al,[esi]
inc esi
mov al,[esi]
To my understanding, OFFSET returns the location of the variable with respect to the program's segment. That address is stored in the esi register. Immediates are then used to access the value stored at the address represented in esi. Incrementing moves the address to the next byte.
So what is the difference between using OFFSET on an array and simply calling the array variable? I was previously lead to believe that simply calling the array variable would also give me its address.

.data
Number dd 3
.code
mov eax,Number
mov ebx,offset Number
EAX will read memory at a certain address and store the number 3
EBX will store that certain address.
mov ebx,offset Number
is equivalent in this case to
lea ebx,Number

Related

x86 Assembly Language: Filling an array with values generated by a subroutine

Edit: Thank you so much for all your help! I really appreciate it because for some reason I am having some trouble conceptualizing assembly but I'm piecing it together.
1) I am using the debugger to step through line by line. The problem, Unhandled exception at 0x0033366B in Project2.exe: 0xC0000005: Access violation writing location 0x00335FF8 occurs at this line:
mov [arrayfib + ebp*4], edx
Is think maybe this because the return statement from the other loop is not able to be accessed by the main procedure, but not sure - having a hard time understanding what is happening here.
2) I have added additional comments, hopefully making this somewhat clearer. For context: I've linked the model I've used to access the Fibonacci numbers, and my goal is to fill this array with the values, looping from last to first.
.386
.model flat, stdcall
.stack 4096
INCLUDE Irvine32.inc
ExitProcess PROTO, dwExitCode: DWORD
.data
arrayfib DWORD 35 DUP (99h)
COMMENT !
eax = used to calculate fibonacci numbers
edi = also used to calculate fibonacci numbers
ebp = number of fibonacci sequence being calculated
edx = return value of fibonacci number requested
!
.code
main PROC
;stores n'th value to calculate fibonacci sequence to
mov ebp, 30
mov edx, 0
;recursive call to fibonacci sequence procedure
call FibSequence
mov [arrayfib + ebp*4], edx
dec ebp
jnz FibSequence
mov esi, OFFSET arrayfib
mov ecx, LENGTHOF arrayfib
mov ebx, TYPE arrayfib
call DumpMem
INVOKE ExitProcess, 0
main ENDP
;initializes 0 and 1 as first two fibonacci numbers
FibSequence PROC
mov eax, 0
mov edi, 1
;subrracts 2 from fibonacci number to be calculated. if less than 0, jumps to finish,
;else adds two previous fibonacci numbers together
L1:
sub ebp, 2
cmp ebp, 0
js FINISH
add eax, edi
add edi, eax
LOOP L1
;stores odd fibonacci numbers
FINISH:
test eax, 1
jz FINISHEVEN
mov edx, eax
ret
;stores even fibonacci numbers
FINISHEVEN:
mov edx, edi
ret
FibSequence ENDP
END main
Your Fibonacci function destroys EBP, returning with it less than zero.
If your array is at the start of a page, then arrafib + ebp*4] will try to access the previous page and fault. Note the fault address of 0x00335FF8 - the last 3 hex digits are the offset within a 4k virtual page, an 0x...3FF8 = 0x...4000 + 4*-2.
So this is exactly what happened: EBP = -2 when your mov store executed.
(It's normal for function calls to destroy their register args in typical calling conventions, although using EBP for arg-passing is unusual. Normally on Windows you'd pass args in ECX and/or EDX, and return in EAX. Or on the stack, but that sucks for performance.)
(There's a lot of other stuff that doesn't make sense about your Fibonacci function too, e.g. I think you want jmp L1 not loop L1 which is like dec ecx / jnz L1 without setting flags).
In assembly language, every instruction has a specific effect on the architectural state (register values and memory contents). You can look up this effect in an instruction-set reference manual like https://www.felixcloutier.com/x86/index.html.
There is no "magic" that preserves registers for you. (Well, MASM will push/pop for you if you write stuff like proc foo uses EBP, but until you understand what that's doing for you it's better not to have the assembler adding instructions to you code.)
If you want a function to preserve its caller's EBP value, you need to write the function that way. (e.g. mov to copy the value to another register.) See What are callee and caller saved registers? for more about this idea.
maybe this because the return statement from the other loop is not able to be accessed by the main procedure
That doesn't make any sense.
ret is just how we write pop eip. There are no non-local effects, you just have to make sure that ESP is pointing to the address (pushed by call) that you want to jump to. Aka the return address.

Copying to and Displaying an Array

Hello Everyone!
I'm a newbie at NASM and I just started out recently. I currently have a program that reserves an array and is supposed to copy and display the contents of a string from the command line arguments into that array.
Now, I am not sure if I am copying the string correctly as every time I try to display this, I keep getting a segmentation error!
This is my code for copying the array:
example:
%include "asm_io.inc"
section .bss
X: resb 50 ;;This is our array
~some code~
mov eax, dword [ebp+12] ; eax holds address of 1st arg
add eax, 4 ; eax holds address of 2nd arg
mov ebx, dword [eax] ; ebx holds 2nd arg, which is pointer to string
mov ecx, dword 0
;Where our 2nd argument is a string eg "abcdefg" i.e ebx = "abcdefg"
copyarray:
mov al, [ebx] ;; get a character
inc ebx
mov [X + ecx], al
inc ecx
cmp al, 0
jz done
jmp copyarray
My question is whether this is the correct method of copying the array and how can I display the contents of the array after?
Thank you!
The loop looks ok, but clunky. If your program is crashing, use a debugger. See the x86 for links and a quick intro to gdb for asm.
I think you're getting argv[1] loaded correctly. (Note that this is the first command-line arg, though. argv[0] is the command name.) https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames says ebp+12 is the usual spot for the 2nd arg to a 32bit functions that bother to set up stack frames.
Michael Petch commented on Simon's deleted answer that the asm_io library has print_int, print_string, print_char, and print_nl routines, among a few others. So presumably you a pointer to your buffer to one of those functions and call it a day. Or you could call sys_write(2) directly with an int 0x80 instruction, since you don't need to do any string formatting and you already have the length.
Instead of incrementing separately for two arrays, you could use the same index for both, with an indexed addressing mode for the load.
;; main (int argc ([esp+4]), char **argv ([esp+8]))
... some code you didn't show that I guess pushes some more stuff on the stack
mov eax, dword [ebp+12] ; load argv
;; eax + 4 is &argv[1], the address of the 1st cmdline arg (not counting the command name)
mov esi, dword [eax + 4] ; esi holds 2nd arg, which is pointer to string
xor ecx, ecx
copystring:
mov al, [esi + ecx]
mov [X + ecx], al
inc ecx
test al, al
jnz copystring
I changed the comments to say "cmdline arg", to distinguish between those and "function arguments".
When it doesn't cost any extra instructions, use esi for source pointers, edi for dest pointers, for readability.
Check the ABI for which registers you can use without saving/restoring (eax, ecx, and edx at least. That might be all for 32bit x86.). Other registers have to be saved/restored if you want to use them. At least, if you're making functions that follow the usual ABI. In asm you can do what you like, as long as you don't tell a C compiler to call non-standard functions.
Also note the improvement in the end of the loop. A single jnz to loop is more efficient than jz break / jmp.
This should run at one cycle per byte on Intel, because test/jnz macro-fuse into one uop. The load is one uop, and the store micro-fuses into one uop. inc is also one uop. Intel CPUs since Core2 are 4-wide: 4 uops issued per clock.
Your original loop runs at half that speed. Since it's 6 uops, it takes 2 clock cycles to issue an iteration.
Another hacky way to do this would be to get the offset between X and ebx into another register, so one of the effective addresses could use a one-register addressing mode, even if the dest wasn't a static array.
mov [X + ebx + ecx], al. (Where ecx = X - start_of_src_buf). But ideally you'd make the store the one that used a one-register addressing mode, unless the load was a memory operand to an ALU instruction that could micro-fuse it. Where the dest is a static buffer, this address-different hack isn't useful at all.
You can't use rep string instructions (like rep movsb) to implement strcpy for implicit-length strings (C null-terminated, rather than with a separately-stored length). Well you could, but only scanning the source twice: once for find the length, again to memcpy.
To go faster than one byte clock, you'd have to use vector instructions to test for the null byte at any of 16 positions in parallel. Google up an optimized strcpy implementation for example. Probably using pcmpeqb against a vector register of all-zeros.

8086 assembly - how to access array elements within a loop

Ok, to make things as simple as possible, say I have a basic loop that i want to use in order to modify some elements of an array labeled a. In the following sample code I've tried replacing all elements of a with 1, but that doesn't really work.
assume cs:code ,ds:data
data segment
a db 1,2,3,4
i db 0
data ends
code segment
start:
mov ax,data
mov ds,ax
lea si,a
the_loop:
mov cl,i
cmp cl,4
jae the_end
mov ds:si[i],1 ; this is the part that i don't really understand since
inc i ; i'm expecting i=0 and ds:si[i] equiv to ds:si[0] which
loop the_loop ; is apparently not the case here since i actually receives the
; the value 1
the_end:
mov ax,4c00h
int 21h
code ends
end start
I am aware that I could simply do this by modifying the element stored in al after the lodsb instruction, and just store that. But I would like to know if it is possible to do something like what I've tried above.
In x86 assembly you can't use a value stored to a memory to address memory indirectly.
You need to read i into some register that can be used for memory addressing, and use that instead. You may want to check Wikipedia for 8086 memory addressing modes.
So, replace
mov ds:si[i],1
with (segment ds is unnecessary here, as it's the default of si, bx and bx+si too):
xor bx,bx
mov bl,[i]
mov [bx+si],byte 1 ; some other assemblers want byte ptr
There are other problems with your code too. The entire loop can be made easier and fixed this way:
lea si,a
xor cx,cx
mov cl,[i]
#fill_loop:
mov [si], byte 1
inc si
dec cx
jnz #fill_loop
Or, if you want to save 1 byte and use loop instruction.
#fill_loop:
mov [si], byte 1
inc si
loop #fill_loop
Note that in 16-bit mode loop instruction decrements cx and jumps to label if cx is not zero after decrement. However, in 32-bit mode loop decrements ecx and in 64-bit mode (x86-64) it decrements rcx.
I suppose that your code does not even run through the assembler, since
mov ds:si[i],1
is not a valid address mode.
Use something like
mov byte ptr [si],1 ; store value 1 at [SI]
inc si ; point to next array element
instead (used MASM to verify the syntax).
The DS: prefix is unnecessary for [si] since this is the default.
See also The 80x86 Addressing Modes.

Copying Array Content to Another Array in Assembly

I'm looking to copy some elements of an array to another one in Assembly. Both arrays are accessed via pointers which are stored in registers. So, edx would be pointing to one array and eax would point to another. Basically, edx points to an array of character read in from a text file, and I'd like eax to contain only 32 of the characters. Here's what I'm attempting to do:
I386 Assembly using NASM
add edx, 8 ; the first 8 characters of the string are not wanted
mov cl, 32
ip_address:
; move the character currently pointed to by edx to eax (mov [eax], [edx])
inc edx
inc eax
loop ip_address
Again, i'd like this to place the 32 characters after the first eight to be placed in the second array. The problem is that I'm stumped on how to do this.. Any help is very much appreciated.
You can't do direct memory-to-memory moves in x86. You need to use another scratch register:
mov ecx, [edx]
mov [eax], ecx
Or something like that...
Both ia32 and ia64 do contain a memory-to-memory string move instruction that can move bytes, "words", and "doublewords".
movsb
movsw
movsd
The source address is specified in ESI and the destination in EDI.1 By itself, it moves one byte, word, or doubleword. If the rep prefix is used, then ECX will contain a count and the instruction will move an entire string of values.
1. I think these instructions are the reason that the ESI and EDI registers are so named. (Source Index and Destination Index.)
The simple solution is to just do:
mov ebx, [edx]
mov [eax], ebx
Be aware that under many platform's ABIs, ebx is a callee-save register, so you will need to save and restore its value in your function.
The simpler solution is to link against the standard library and call memcpy, which is perfectly acceptable in assembly, and will usually be substantially faster than writing your own loop.

Accessing array members in assembler

I'm using C and ASM{} in one of our classes in order to complete an assembler project where we have to encrypt an input string, transmit and decrypt it.
The key is loaded into an empty C char array (20 chars long) and is then used later on with the XOR statement to encrypt.
The code to load the address of the array is:
lea esi, key
which puts the address of 'key' into esi. The address here is the same as the address of the key array when we examine the register in debug mode, so we know that works.
mov edx, [esi]
we thought this would move the value of esi's first index into edx, as we use "mov [esi], eax" to put the value of the a register into the esi array. The reverse seemed logical.
However, the value which edx actually gets is "875575655" (last run) or similar. Far too large for a char value.
I get the feeling we may be accessing the array wrong,
Any advice would be appreciated.
Note: Usually to increase the array index we're simply using inc esi and then read like we did above, this is how we were planning on reading from the array as well.
With mov edx, [esi] you read DWORD (because edx size is double word). Use mov dl, [esi] to read byte.
875575655 in hexa is 0x34303967, that would be string 'g904'.
To explain: while logically it's a character string, you can load more characters at once. So, instead of making 20 byte loads and xors, you can make 5 such operations on DWORD.

Resources