I'm in the process of learning Assembly language using NASM, and have run into a programming problem that I can't seem to figure out. The goal of the program is to solve this equation:
Picture of Equation
For those unable to see the photo, the equation says that for two arrays of length n, array a and array b, find the sum, for i = 0 to n-1, of ((a[i] + 3) - (b[i] - 4))
I'm only supposed to use three general registers, and I've figured out a code sample I think could possibly work, but I keep running into comma and operand errors with lines 16 and 19. I understand that in order to iterate through the array you need to move a pointer to each index, but since both arrays are of different values (array 1 is dw and array 2 is db) I am unsure how to account for that. I'm still very new to Assembly, and any help or pointers would be appreciated.
Here is my current code:
segment .data
a dw 12, 14, 16 ; array of three values
b db 2, 4, 5 ; array of three values
n dw 3 ; length of both arrays
result dq 0 ; memory to result
segment .text
global main
main:
mov rax, 0
mov rbx, 0
mov rdx, 0
loop_start:
cmp rax, [n]
jge loop_end
add rbx, a[rax*4] ; adding element of a at current index to rbx
add rbx, 3 ; adding 3 to current index value of array a in rbx
add rdx, BYTE b[rax]
sub rdx, 4
sub rbx, [rdx]
add [result], rbx
xor rbx, rbx
xor rdx, rdx
add rax, 1
loop_end:
ret
You are using 16-bit and 8-bit data, but 64-bit registers. Generally speaking, the processor requires the same data size throughout the operands of any single instruction.
cmp rax,[n] has varying data size, which is not allowed: rax is a 64-bit register, and [n] is a 16-bit data item. So, we can change this to cmp ax,[n], and now everything is 16-bit.
add rbx,a[rax*4] is also mixing different size operands (not allowed): rbx is 64 bits and a[] is 16 bits. You can change the register to bx and this will be allowed. Also note that *4 is too much; it should be *2, since dw is 16-bit (2-byte) data, not 32-bit (4-byte). NASM also expects the whole address inside the brackets, as in [a + rax*2]; the a[rax*2] form is MASM-style syntax. Since you're clearing rbx, you don't need an add here; you can simply mov.
add rdx, BYTE b[rax] is also mixing different sizes: rdx is 64 bits wide, whereas b[] is 8 bits wide. Use dl instead of rdx. There is nothing to add to here, so you should use a mov instead of an add. Now that there's a value in dl, and you previously cleared rdx, you can switch to using dx (from dl); it will hold the 16-bit value of b[i].
sub rbx, [rdx] has an erroneous dereference. Here you just want to sub bx,dx.
You are not using the label loop_start, so there is no loop. (Add a backward branch at the end of the loop.)
...but since both arrays are of different values (array 1 is dw and array 2 is db) I am unsure how to account for that
Erik Eidt's answer explains why you "keep running into comma and operand errors". Although you can revert to using the smaller registers (adding operand size prefixes), my answer takes a different approach.
The instruction set has the movzx (move with zero extension) and movsx (move with sign extension) instructions to deal with these varying sizes. See below how to use these.
I've applied a few changes too.
Don't miss an opportunity to simplify your calculation:
((a[i] + 3) - (b[i] - 4)) is equivalent to (a[i] - b[i] + 7)
None of these arrays is empty, so you can just put the loop condition below its body.
You can process the arrays starting at the end if it's convenient. The summation operation doesn't mind!
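As a cross-check of the changes listed above, here is a minimal C sketch. It is not part of the original answer, and the function names are made up for illustration. It models movzx by widening the 16-bit and 8-bit elements before doing any arithmetic, and it compares the straightforward forward loop with the simplified loop that runs from the end and tests its condition below the body.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward form: sum of ((a[i] + 3) - (b[i] - 4)) for i = 0 .. n-1. */
int64_t sum_direct(const uint16_t *a, const uint8_t *b, size_t n)
{
    int64_t result = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t ai = a[i];   /* widen the word, like movzx r64, word [mem] */
        int64_t bi = b[i];   /* widen the byte, like movzx r64, byte [mem] */
        result += (ai + 3) - (bi - 4);
    }
    return result;
}

/* Simplified form: a[i] - b[i] + 7, walking from the last element down.
   Assumes n >= 1, exactly like a loop whose test sits below its body. */
int64_t sum_simplified_backward(const uint16_t *a, const uint8_t *b, size_t n)
{
    int64_t result = 0;
    size_t count = n;        /* takes the values n, n-1, ..., 1 */
    do {
        /* count - 1 is the index, hence the "- 2" and "- 1" byte offsets */
        result += (int64_t)a[count - 1] - (int64_t)b[count - 1] + 7;
    } while (--count != 0);
    return result;
}
```

With the question's data (a = 12, 14, 16 and b = 2, 4, 5) both functions return 52, since the three terms are 17, 17 and 18.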
segment .data
a dw 12, 14, 16 ; array of three values
b db 2, 4, 5 ; array of three values
n dw 3 ; length of both arrays
result dq 0 ; memory to result
segment .text
global main
main:
movzx rcx, word [n]
loop_start:
movzx rax, word [a + rcx * 2 - 2]
movzx rbx, byte [b + rcx - 1]
sub rax, rbx ; a[i] - b[i] (lea cannot subtract a register)
add rax, 7 ; a[i] - b[i] + 7
add [result], rax
dec rcx
jnz loop_start
ret
Please notice that the additional negative offsets - 2 and - 1 exist to compensate for the fact that the loop control takes on the values {3, 2, 1} when {2, 1, 0} would have been perfect. This does not introduce an extra displacement component to the instruction since the mention of the a and b arrays is in fact already the displacement.
Although this is tagged x86-64, you can write the whole thing using 32-bit registers and not require the REX prefixes. Same result.
segment .data
a dw 12, 14, 16 ; array of three values
b db 2, 4, 5 ; array of three values
n dw 3 ; length of both arrays
result dq 0 ; memory to result
segment .text
global main
main:
movzx ecx, word [n]
loop_start:
movzx eax, word [a + rcx * 2 - 2] ; writing ecx zero-extended into rcx, so rcx is a valid index
movzx ebx, byte [b + rcx - 1]
sub eax, ebx ; a[i] - b[i]
add eax, 7 ; a[i] - b[i] + 7
add [result], eax
dec ecx
jnz loop_start
ret
Related
In my .bss section I declare array db 5, 4, 3, 2, 1
I then have a pointer ptr defined as %define ptr dword[ebp-8]
I would like to inc and dec this pointer at will to move from one element in the array to another, and then I would like to have the ability to inc the value in the array that the pointer is pointing to, and have this be updated in the array!
I move through the array with a loop in the form of:
mov ptr, array ; not in loop
mov ebx, ptr
mov al, [ebx]
inc ptr
How can I increment the value and have it saved in the array instead of just in some register, as would happen if I did inc al? Can I do something like inc [ptr]? (This doesn't work, of course.) Is there a better way to approach this entirely?
Thanks
Edit:
I want my array to be something like 10, 8, 6, 5, 2 i.e increment each element by however many
In my .bss section I declare array db 5, 4, 3, 2, 1
This does not make sense.
By default .bss is defined as nobits, i. e. an uninitialized section (although a modern multi-user OS will initialize it with some values [usually zero] to prevent data leaks).
You probably meant to write .data section.
nasm(1) will issue an “ignored” warning about any db in a nobits section.
I then have a pointer ptr defined as %define ptr dword[ebp‑8]
I would like to inc and dec this pointer […]
No, you have a (case-sensitive) single-line macro ptr.
Any (following) occurrence of ptr will be expanded to dword[ebp‑8].
Use nasm ‑E source.asm (preprocess only) to show this.
[…] then I would like to have the ability to inc the value in the array that the pointer is pointing to […]
Your ptr macro says it's pointing to a dword (a 32-bit quantity), but your array consists of db (define byte, 8-bit) elements.
This doesn’t add up.
I want my array to be something like 10, 8, 6, 5, 2 i.e increment each element by however many
Well, x + x = 2 × x, calculating the sum of a value plus the same value is the same as multiplying by two.
Any decent compiler will optimize multiplying by a constant factor 2ⁿ as a shl x, n.
Unless you need certain flags (the resulting flags of shl and add marginally differ), you can do something like
lea ebx, [array] ; load address of start of `array`
xor eax, eax ; eax ≔ 0, index in array starting at 0
mov ecx, 5 ; number of elements
head_of_loop:
shl byte [ebx + 1 * eax], 1 ; base address + size of element * index
add eax, 1 ; eax ≔ eax + 1
loop head_of_loop ; ecx ≔ ecx − 1
; if ecx ≠ 0 then goto head_of_loop
The constant factor in [ebx + 1 * eax] can be 1, 2, 4 or 8.
Note, the loop instruction is utterly slow and was used just for the sake of simplicity.
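For readers who think in C first, the same idea can be sketched as follows. This is only an illustrative model of the assembly above, not the answerer's code, and both function names are hypothetical. The point is the difference between changing the pointer and changing the byte it points at.

```c
#include <stddef.h>
#include <stdint.h>

/* Double every byte in place by walking a pointer across the array. */
void double_in_place(uint8_t *array, size_t count)
{
    uint8_t *p = array;          /* p plays the role of the pointer */
    for (size_t i = 0; i < count; i++) {
        *p = (uint8_t)(*p << 1); /* modify the element itself, like shl byte [mem], 1 */
        p++;                     /* advance by one element (1 byte) */
    }
}

/* Increment only the element that p points at; p itself is unchanged. */
void increment_pointee(uint8_t *p)
{
    ++*p;                        /* like inc byte [ebx] with ebx holding p */
}
```

Doubling 5, 4, 3, 2, 1 gives 10, 8, 6, 4, 2. Calling increment_pointee on the fourth element afterwards gives the 10, 8, 6, 5, 2 mentioned in the question.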
I'm writing a program in masm assembly to count and return the number of times integers appear in an array. I currently have the following code that allows me to populate an array with random integers. What I am struggling with is how to implement a counter that will store each occurrence of an integer at an index in the array. For instance, if the random array was [3,4,3,3,4,5,7,8], I would want my count array to hold [3, 2, 1, 1, 1], as there are three 3's, two 4's, etc.
I have the bounds of the random numbers fixed at 3 and 8, so I know they will be within this range. My current thinking is to compare each number to 3-8 as it is added, and increment my count array respectively. My main lack of understanding is how I can increment specific indices of the array. This code is how I am producing an array of random integers, with an idea of how I can begin to count integer occurrences, but I don't know if I am going in the right direction. Any advice?
push ebp
mov ebp, esp
mov esi, [ebp + 16] ; # holds array to store count of integer occurances
mov edi, [ebp + 12] ; # holds array to be populated with random ints
mov ecx, [ebp + 8] ; value of request in ecx
MakeArray:
mov eax, UPPER ; upper boundary for random num in array
sub eax, LOWER ; lower boundary for random num in array
inc eax
call RandomRange
add eax, LOWER
cmp eax, 3 ; Where I start to compare the random numbers added
je inc_3 ; current thought is it cmp to each num 3-8
mov [edi], eax ; put random number in array
add edi, 4 ; holds address of current element, moves to next element
loop fillArrLoop
inc_3: ; if random num == 3
inc esi ; holds address of count_array, increments count_array[0] to 1?
mov [edi], eax ; put random number in array to be displayed
add edi, 4 ; holds address of current element, moves to next element
loop MakeArray
My current thinking is to compare each number to 3-8 as it is added
No, you're vastly overcomplicating this. You don't want to linear search for a j (index into the counts) such that arr[i] == j, just use j = arr[i].
The standard way to do a histogram is ++counts[ arr[i] ]. In your case, you know the possible values are 3..8, so you can map an array value to a count bucket with arr[i] - 3, so you'll operate on counts[0..5]. A memory-destination add instruction with a scaled-index addressing mode can do this in one x86 instruction, given the element value in a register.
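In C terms, the bucket mapping described above might look like the sketch below. The constants LOWER = 3 and UPPER = 8 are taken from the question's stated range; the function name and the BUCKETS constant are assumptions made for this illustration.

```c
#include <stddef.h>

enum { LOWER = 3, UPPER = 8, BUCKETS = UPPER - LOWER + 1 };

/* counts must have BUCKETS elements; every values[i] must be in LOWER..UPPER. */
void histogram(const int *values, size_t len, int counts[BUCKETS])
{
    for (size_t b = 0; b < BUCKETS; b++)
        counts[b] = 0;                  /* start from zeroed buckets */
    for (size_t i = 0; i < len; i++)
        ++counts[values[i] - LOWER];    /* the value itself selects the bucket */
}
```

For the question's example [3,4,3,3,4,5,7,8] this fills counts with 3, 2, 1, 0, 1, 1. There is one bucket per possible value, so the value 6, which never occurs, gets a count of 0 instead of being skipped.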
If the possible values are not contiguous, you'd normally use a hash table to map values to count buckets. You can think about this simple case as allowing a trivial hash function.
Since you're generating the random numbers to fill arr[i] at the same time as histograming, you can combine those two tasks, and instead of subtracting 3 just don't add it yet.
; inputs: unsigned len, int *values, int *counts
; outputs: values[0..len-1] filled with random numbers, counts[] incremented
; clobbers: EAX, ECX, EDX (not the other registers)
fill_array_and_counts:
push ebp
mov ebp, esp
push edi ; Save/restore the caller's EDI.
;; Irvine32 functions like RandomRange are special and don't clobber EAX, ECX, or EDX except as return values,
;; so we can use EDX and ECX even in a loop that makes a function call.
mov edi, [ebp + 16] ; int *counts ; assumed already zeroed?
mov edx, [ebp + 12] ; int *values ; output pointers
mov ecx, [ebp + 8] ; size_t length
MakeArray: ; do{
mov eax, UPPER - LOWER + 1 ; size of random range, calculated at assemble time
call RandomRange ; eax = 0 .. eax-1
add dword ptr [edi + eax*4], 1 ; ++counts[ randval ]
add eax, LOWER ; map 0..n to LOWER..UPPER
mov [edx], eax ; *values = randval+3
add edx, 4 ; values++
dec ecx
jnz MakeArray ; }while(--ecx);
pop edi ; restore call-preserved regs
pop ebp ; including tearing down the stack frame
ret
If the caller doesn't zero the counts array for you, you should do that yourself, perhaps with rep stosd with EAX=0 as a memset of ECX dword elements, and then reload EDI and ECX from the stack args.
I'm assuming UPPER and LOWER are assemble time constants like UPPER = 8 or LOWER equ 3, since you used all-upper-case names for them, and they're not function args. If that's the case, then there's no need to do the math at runtime, just let the assembler calculate UPPER - LOWER + 1 for you.
I avoided the loop instruction because it's slow, and doesn't do anything you can't do with other simple instructions.
One standard performance trick for histograms with only a few buckets is to have multiple arrays of counts and unroll over them: Methods to vectorise histogram in SIMD?. This hides the latency of store/reload when the same counter needs to be incremented several times in a row. Your random values will generally avoid long runs of the same value, though, so worst-case performance is avoided.
There might be something to gain from AVX2 for large arrays since there are only 6 possible buckets: Micro Optimization of a 4-bucket histogram of a large array or list. (And you could generate random numbers in SIMD vectors with an AVX2 xorshift128+ PRNG if you wanted.)
If your range is fixed (3-8), you have a fixed-length array that can hold your counts:
(index0:Count of 3),(index1:Count of 4)..(index5:Count of 8s)
Once you have an element from the random array, you just take that element and put it through a switch:
cmp dword [element], 3 ; an immediate can only be the second operand
jne compare4
mov ebx, [countsArrayAddress] ; Can replace [countsArrayAddress] with [ebp + 16]
add ebx, 0 ; First index, can comment out this line
mov ecx, [ebx]
add ecx, 1 ; Increment count
mov [ebx], ecx ; Count at the zeroth offset is now incremented
compare4:
cmp dword [element], 4
jne compare5
mov ebx, [countsArrayAddress]
add ebx, 4 ; Second index (1*4)
mov ecx, [ebx]
add ecx, 1
mov [ebx], ecx
...
Is this what you mean? I come from using fasm syntax, but it looks pretty similar. The above block is a bit unoptimized, but I think it shows how to build the counts array. The array has a fixed length and must be allocated, either on the stack (sub esp by the correct amount) or on the heap, i.e. with HeapAlloc/malloc calls. (Edited: I see you're using 32-bit registers.)
Say I have an array defined by:
array DW 1,1,3,0,3,3,4,4,-1
The array is terminated by -1, how would I be able to sort the array in pairs of descending order based on the first number in the pair (if first number is the same then it's sorted by the second number) as such:
4, 4; 3, 3; 3, 0; 1, 1;
array DW 1,1, 3,0, 3,3, 4,4, -1
The first number in each pair of word-sized numbers is the most significant for your task.
Each of these pairs can be seen as a dword, but on x86 (little endian) the first word will be the least significant. That's just the opposite of what you need. What if you temporarily swapped the words? Then you could sort the array as normal dwords.
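1. Swap the two words of every pair.
2. Sort these dwords normally, in descending order. (Beware: the terminator is still a single word.)
3. Swap the two words of every pair back.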
This could be the swap procedure (32-bit):
Swap:
mov ebx, array
jmp First
Next:
rol dword [ebx], 16
add ebx, 4
First:
cmp word [ebx], -1
jne Next
ret
This could be the swap procedure (16-bit):
Swap:
mov bx, array
jmp First
Next:
xchg ax, [bx+2]
mov [bx], ax
add bx, 4
First:
mov ax, [bx]
cmp ax, -1
jne Next
ret
A solution where you do these pre-swap and post-swap operations within the dword sorting algorithm would be just as easy.
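The idea of treating each pair as one 32-bit key can be sketched in C as follows. This is an illustrative model only: it builds the keys in a temporary buffer (first word in the high half) instead of rotating the dwords in place, it relies on the C library's qsort rather than a hand-written sort, and it assumes that all pair values are non-negative. The names sort_pairs_desc and compare_desc are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: larger 32-bit keys first (descending order). */
static int compare_desc(const void *lhs, const void *rhs)
{
    uint32_t x = *(const uint32_t *)lhs;
    uint32_t y = *(const uint32_t *)rhs;
    return (x < y) - (x > y);
}

/* Sorts the pairs of a -1 terminated word array in descending order.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int sort_pairs_desc(int16_t *array)
{
    size_t pairs = 0;
    while (array[2 * pairs] != -1)      /* terminator sits where a pair would start */
        pairs++;
    if (pairs == 0)
        return 0;

    uint32_t *keys = malloc(pairs * sizeof *keys);
    if (keys == NULL)
        return -1;

    for (size_t i = 0; i < pairs; i++)  /* first word becomes the high half */
        keys[i] = ((uint32_t)(uint16_t)array[2 * i] << 16)
                | (uint16_t)array[2 * i + 1];

    qsort(keys, pairs, sizeof *keys, compare_desc);

    for (size_t i = 0; i < pairs; i++) { /* split the keys back into pairs */
        array[2 * i]     = (int16_t)(keys[i] >> 16);
        array[2 * i + 1] = (int16_t)(keys[i] & 0xFFFF);
    }
    free(keys);
    return 0;
}
```

Applied to 1,1, 3,0, 3,3, 4,4, -1 this leaves 4,4, 3,3, 3,0, 1,1, -1 in the array, which matches the order requested in the question.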
I'm stuck on how you're supposed to take the decimal integers from an 8-bit BYTE array and somehow manage to move them into a 32-bit DWORD array within a loop. I know it has something to do with OFFSET and MOVZX, but it's a little confusing to understand. Are there any helpful tips for a newbie to understand it?
EDIT:
For example:
Array1 Byte 2, 4, 6, 8, 10
.code
mov esi, OFFSET Array1
mov ecx, 5
L1:
mov al, [esi]
movzx eax, al
inc esi
Loop L1
Is this the right approach? Or am I doing it entirely wrong?
It's x86 Assembly. (Using Visual Studio)
Your code is almost right. You managed to get the values from the byte array and to convert them to dwords. Now you only have to put them in the dword array (which is not even defined in your program).
Anyway, here it is (FASM syntax):
; data definitions
Array1 db 2, 4, 6, 8, 10
Array2 rd 5 ; reserve 5 dwords for the second array.
; the code
mov esi, Array1
mov edi, Array2
mov ecx, 5
copy_loop:
movzx eax, byte [esi] ; this instruction assumes the numbers are unsigned.
; if the byte array contains signed numbers use
; "movsx"
mov [edi], eax ; store to the dword array
inc esi
add edi, 4 ; <-- notice, the next cell is 4 bytes ahead!
loop copy_loop ; the human-friendly labels will not affect the
; speed of the program.
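The difference between the two widening instructions mentioned in the comments above can be modelled in C. This is a sketch for illustration and the function names are hypothetical. Converting an unsigned byte corresponds to movzx, and converting a signed byte corresponds to movsx.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned bytes to dwords: each conversion zero-extends, like movzx. */
void widen_unsigned(const uint8_t *src, uint32_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];   /* source advances 1 byte, destination 4 bytes */
}

/* Signed bytes to dwords: each conversion sign-extends, like movsx. */
void widen_signed(const int8_t *src, int32_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}
```

The byte 0xFE comes out as 254 from widen_unsigned, while the same bit pattern read as a signed byte (-2) comes out as -2 from widen_signed.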
I have a problem with asm code that works when mixed with C, but does not when used in asm code with proper parameters.
;; array - RDI, x- RSI, y- RDX
getValue:
mov r13, rsi
sal r13, $3
mov r14, rdx
sal r14, $2
mov r15, [rdi+r13]
mov rax, [r15+r14]
ret
Technically I want to keep the rdi, rsi and rdx registers untouched and thus I use other ones.
I am using an x64 machine and thus my pointers have 8 bytes. Technically speaking this code is supposed to do:
int getValue(int** array, int x, int y) {
return array[x][y];
}
it somehow works inside my C code, but does not when used in asm in this way:
mov rdi, [rdi] ;; get first pointer - first row
mov r9, $4 ;; we want second element from the row
mov rax, [rdi+r9] ;; get the element (4 bytes vs 8 bytes???)
mov rdi, FMT ;; prepare printf format "%d", 10, 0
mov rsi, rax ;; we want to print the element we just fetched
mov eax, $0 ;; say we have no non-integer argument
call printf ;; always gives 0, no matter what's in the matrix
Can someone see into this and help me? Thanks in advance.
The sal r14, $2 implies the elements are dwords, so the last line before the ret shouldn't load a qword. Besides, x86 has nice scaling addressing modes, so you can do this:
mov rax, [rdi + rsi * 8] ; load pointer to column
mov eax, [rax + rdx * 4] ; note this loads a dword
ret
That implies that you have an array of pointers to columns, which is unusual. You can do that, but was it intended?
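To make the two element sizes explicit, here is a small C sketch (an illustration with hypothetical function names, assuming non-negative indices). The second function spells out the byte offsets that the two loads use: the first step scales the index by sizeof(int *), which is 8 on x86-64, and the second scales by sizeof(int), which is typically 4.

```c
#include <stddef.h>
#include <string.h>

/* What the C compiler does for you. */
int get_value(int **array, int x, int y)
{
    return array[x][y];
}

/* The same access with the byte offsets written out by hand. */
int get_value_by_offsets(int **array, int x, int y)
{
    int *inner;
    int value;
    /* first load: a pointer-sized element at array + x * sizeof(int *) */
    memcpy(&inner, (const char *)array + (size_t)x * sizeof(int *), sizeof inner);
    /* second load: an int-sized element at inner + y * sizeof(int) */
    memcpy(&value, (const char *)inner + (size_t)y * sizeof(int), sizeof value);
    return value;
}
```

Both functions return the same element; only the width of each load differs, which is why the final load in the assembly has to use a 32-bit destination register.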
This is a standard matrix of integers.
int** array;
sizeof(int*) == 8
sizeof(int) == 4
How I see it is that when I have that array at first, I have a pointer to a space of memory without "blanks" that holds all pointers one by one (index-by-index), so I say "let's go to the element rsi-th of the array" and that's why I shift by rsi-th * 8 bytes. So now I get the same situation, but the pointer should point to a space of integers, so 4-byte items. That's why I shift by 4 bytes there.
Is my thinking wrong?