moving 8bit integers array to 32bit array assembly - arrays

I'm stuck on how you're supposed to take the decimal integers from an 8-bit BYTE array and move them into a 32-bit DWORD array within a loop. I know it has something to do with OFFSET and MOVZX, but it's a little confusing. Are there any helpful tips for a newbie to understand it?
EDIT:
For example:
Array1 Byte 2, 4, 6, 8, 10
.code
mov esi, OFFSET Array1
mov ecx, 5
L1:
mov al, [esi]
movzx eax, al
inc esi
Loop L1
Is this the right approach? Or am I doing it entirely wrong?
It's x86 assembly (using Visual Studio).

Your code is almost right. You managed to get the values from the byte array and convert them to dwords. Now you only have to store them in the dword array (which is not even defined in your program).
Anyway, here it is (FASM syntax):
; data definitions
Array1 db 2, 4, 6, 8, 10
Array2 rd 5 ; reserve 5 dwords for the second array.
; the code
mov esi, Array1
mov edi, Array2
mov ecx, 5
copy_loop:
movzx eax, byte [esi] ; this instruction assumes the numbers are unsigned.
; if the byte array contains signed numbers use
; "movsx"
mov [edi], eax ; store to the dword array
inc esi
add edi, 4 ; <-- notice, the next cell is 4 bytes ahead!
loop copy_loop ; the human-friendly labels will not affect the
; speed of the program.
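For readers who find C easier to follow, the loop above can be modeled roughly as below. This is only a sketch: the function names widen_unsigned and widen_signed are made up for illustration, and the implicit integer conversions stand in for movzx and movsx respectively.

```c
#include <stddef.h>
#include <stdint.h>

/* Models the movzx version: each byte is zero-extended to 32 bits. */
void widen_unsigned(uint32_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];   /* movzx eax, byte [esi] / mov [edi], eax */
}

/* Models the movsx variant: each byte is sign-extended to 32 bits. */
void widen_signed(int32_t *dst, const int8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];   /* movsx eax, byte [esi] / mov [edi], eax */
}
```

The only difference between the two is how a byte such as 0FEh is interpreted: 254 when zero-extended, -2 when sign-extended.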

Related

Assembly: Occurrence of Integers in Array

I'm writing a program in MASM assembly to count and return the number of times each integer appears in an array. I currently have the following code, which lets me populate an array with random integers. What I am struggling with is how to implement a counter that stores the number of occurrences of each integer at an index in a count array. For instance, if the random array was [3,4,3,3,4,5,7,8], I would want my count array to hold [3, 2, 1, 1, 1], as there are three 3's, two 4's, etc.
I have the bounds of the random numbers fixed at 3/8 so I know they will be within this range. My current thinking is to compare each number to 3-8 as it is added, and increment my count array respectively. My main lack of understanding is how I can increment specific indices of the array. This code is how I am producing an array of random integers, with an idea of how I can begin to count integer occurrence, but I don't know if I am going in the right direction. Any advice?
push ebp
mov ebp, esp
mov esi, [ebp + 16] ; # holds array to store count of integer occurrences
mov edi, [ebp + 12] ; # holds array to be populated with random ints
mov ecx, [ebp + 8] ; value of request in ecx
MakeArray:
mov eax, UPPER ; upper boundary for random num in array
sub eax, LOWER ; lower boundary for random num in array
inc eax
call RandomRange
add eax, LOWER
cmp eax, 3 ; Where I start to compare the random numbers added
je inc_3 ; current thought is it cmp to each num 3-8
mov [edi], eax ; put random number in array
add edi, 4 ; holds address of current element, moves to next element
loop fillArrLoop
inc_3: ; if random num == 3
inc esi ; holds address of count_array, increments count_array[0] to 1?
mov [edi], eax ; put random number in array to be displayed
add edi, 4 ; holds address of current element, moves to next element
loop MakeArray
"My current thinking is to compare each number to 3-8 as it is added"
No, you're vastly overcomplicating this. You don't want to linear search for a j (index into the counts) such that arr[i] == j, just use j = arr[i].
The standard way to do a histogram is ++counts[ arr[i] ]. In your case, you know the possible values are 3..8, so you can map an array value to a count bucket with arr[i] - 3, so you'll operate on counts[0..5]. A memory-destination add instruction with a scaled-index addressing mode can do this in one x86 instruction, given the element value in a register.
If the possible values are not contiguous, you'd normally use a hash table to map values to count buckets. You can think about this simple case as allowing a trivial hash function.
Since you're generating the random numbers to fill arr[i] at the same time as histograming, you can combine those two tasks, and instead of subtracting 3 just don't add it yet.
; inputs: unsigned len, int *values, int *counts
; outputs: values[0..len-1] filled with random numbers, counts[] incremented
; clobbers: EAX, ECX, EDX (not the other registers)
fill_array_and_counts:
push ebp
mov ebp, esp
push edi ; Save/restore the caller's EDI (matches the pop edi at the end).
;; Irvine32 functions like RandomRange are special and don't clobber EAX, ECX, or EDX except as return values,
;; so we can use EDX and ECX even in a loop that makes a function call.
mov edi, [ebp + 16] ; int *counts ; assumed already zeroed?
mov edx, [ebp + 12] ; int *values ; output pointers
mov ecx, [ebp + 8] ; size_t length
MakeArray: ; do{
mov eax, UPPER - LOWER + 1 ; size of random range, calculated at assemble time
call RandomRange ; eax = 0 .. eax-1
add dword ptr [edi + eax*4], 1 ; ++counts[ randval ]
add eax, LOWER ; map 0..n to LOWER..UPPER
mov [edx], eax ; *values = randval + LOWER
add edx, 4 ; values++
dec ecx
jnz MakeArray ; }while(--ecx);
pop edi ; restore call-preserved regs
pop ebp ; including tearing down the stack frame
ret
If the caller doesn't zero the counts array for you, you should do that yourself, perhaps with rep stosd with EAX=0 as a memset of ECX dword elements, and then reload EDI and ECX from the stack args.
I'm assuming UPPER and LOWER are assemble time constants like UPPER = 8 or LOWER equ 3, since you used all-upper-case names for them, and they're not function args. If that's the case, then there's no need to do the math at runtime, just let the assembler calculate UPPER - LOWER + 1 for you.
I avoided the loop instruction because it's slow, and doesn't do anything you can't do with other simple instructions.
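As a rough C model of the routine above (a sketch, not a translation of the Irvine32 calls): rand() is used here only as a stand-in for RandomRange, the enum constants mirror the assumed UPPER = 8 and LOWER = 3, and the counts array is zeroed inside the function rather than by the caller.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { LOWER = 3, UPPER = 8, NBUCKETS = UPPER - LOWER + 1 };

/* Fills values[0..len-1] with numbers in LOWER..UPPER and counts them.
   counts must have room for NBUCKETS ints; it is zeroed here. */
void fill_array_and_counts(size_t len, int *values, int *counts)
{
    memset(counts, 0, NBUCKETS * sizeof counts[0]);
    for (size_t i = 0; i < len; i++) {
        int r = rand() % NBUCKETS;  /* stand-in for RandomRange: 0..NBUCKETS-1 */
        counts[r]++;                /* add dword ptr [edi + eax*4], 1 */
        values[i] = r + LOWER;      /* map 0..5 to 3..8 */
    }
}
```

The point of the model is the single counts[r]++ line: the random value itself is the bucket index, so no chain of comparisons is needed.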
One standard performance trick for histograms with only a few buckets is to have multiple arrays of counts and unroll over them: Methods to vectorise histogram in SIMD?. This hides the latency of store/reload when the same counter needs to be incremented several times in a row. Your random values will generally avoid long runs of the same value, though, so worst-case performance is avoided.
There might be something to gain from AVX2 for large arrays since there are only 6 possible buckets: Micro Optimization of a 4-bucket histogram of a large array or list. (And you could generate random numbers in SIMD vectors with an AVX2 xorshift128+ PRNG if you wanted.)
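The "multiple arrays of counts" idea can be sketched in C as follows. This is an illustrative scalar version only (no SIMD), with a hypothetical function name and the 3..8 range hard-coded as an assumption; whether it is actually faster depends on the CPU and the data.

```c
#include <stddef.h>

enum { LOWER = 3, NBUCKETS = 6 };

/* Histogram using two partial count arrays, so that two consecutive
   equal values do not increment the same memory location back to back. */
void histogram_unrolled(int counts[NBUCKETS], const int *values, size_t len)
{
    int c0[NBUCKETS] = {0}, c1[NBUCKETS] = {0};
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        c0[values[i] - LOWER]++;
        c1[values[i + 1] - LOWER]++;
    }
    if (i < len)                       /* odd length: one element left over */
        c0[values[i] - LOWER]++;
    for (int b = 0; b < NBUCKETS; b++) /* combine the partial counts */
        counts[b] = c0[b] + c1[b];
}
```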
If your range is fixed (3-8), you have a fixed-length array that can hold your counts:
(index0:Count of 3),(index1:Count of 4)..(index5:Count of 8s)
Once you have an element from the random array, you just take that element and put it through a switch:
cmp dword ptr [element], 3 ; the immediate must be the second operand
jne compare4
mov ebx, [countsArrayAddress] ; Can replace [countsArrayAddress] with [ebp + 16]
add ebx, 0 ; First index, can comment out this line
mov ecx, [ebx]
add ecx, 1 ; Increment count
mov [ebx], ecx ; Count at the zeroth offset is now incremented
compare4:
cmp dword ptr [element], 4
jne compare5
mov ebx, [countsArrayAddress]
add ebx, 4 ; Second index (1*4)
mov ecx, [ebx]
add ecx, 1
mov [ebx], ecx
...
Is this what you mean? I come from using FASM syntax, but it looks pretty similar. The above block is a bit unoptimized, but I think it shows how to build the counts array. The array has a fixed length and must be allocated either on the stack (sub esp by the correct amount) or on the heap, e.g. with HeapAlloc/malloc calls. (Edited: I see you're using 32-bit registers.)

ASSEMBLY - output an array with 32 bit register vs 16 bit

I was working on some homework to print out an array while it's sorting some integers. I have the code working fine, but decided to try using EAX instead of AL in my code and ran into errors. I can't figure out why that is. Is it possible to use EAX here at all?
; This program sorts an array of signed integers, using
; the Bubble sort algorithm. It invokes a procedure to
; print the elements of the array before, the bubble sort,
; once during each iteration of the loop, and once at the end.
INCLUDE Irvine32.inc
.data
myArray BYTE 5, 1, 4, 2, 8
;myArray DWORD 5, 1, 4, 2, 8
currentArray BYTE 'This is the value of array: ' ,0
startArray BYTE 'Starting array. ' ,0
finalArray BYTE 'Final array. ' ,0
space BYTE ' ',0 ; BYTE
.code
main PROC
MOV EAX,0 ; clearing registers, moving 0 into each, and initialize
MOV EBX,0 ; clearing registers, moving 0 into each, and initialize
MOV ECX,0 ; clearing registers, moving 0 into each, and initialize
MOV EDX,0 ; clearing registers, moving 0 into each, and initialize
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET startArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
MOV ECX, lengthOf myArray ; load ECX with # of elements of array
DEC ECX ; decrement count by 1
L1:
PUSH ECX ; save outer loop count
MOV ESI, OFFSET myArray ; point to first value
L2:
MOV AL,[ESI] ; get array value
CMP [ESI+1], AL ; compare a pair of values
JGE L3 ; if [esi] <= [edi], don't exch
XCHG AL, [ESI+1] ; exchange the pair
MOV [ESI], AL
CALL printArray ; call printArray function
CALL crlf
L3:
INC ESI ; increment esi to the next value
LOOP L2 ; inner loop
POP ECX ; retrieve outer loop count
LOOP L1 ; else repeat outer loop
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET finalArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
CALL printArray
L4 : ret
exit
main ENDP
printArray PROC uses ESI ECX
;myArray loop
MOV ESI, OFFSET myArray ; address of myArray
MOV ECX, LENGTHOF myArray ; loop counter (5 values within array)
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET currentArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
L5 :
MOV AL, [ESI] ; add an integer into eax from array
CALL writeInt
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET space
CALL writeString
POP EDX ; restores the original edx register value
ADD ESI, TYPE myArray ; point to next integer
LOOP L5 ; repeat until ECX = 0
CALL crlf
RET
printArray ENDP
END main
END printArray
; output:
;Starting array. This is the value of array: +1 +5 +4 +2 +8
;This is the value of array: +1 +4 +5 +2 +8
;This is the value of array: +1 +4 +2 +5 +8
;This is the value of array: +1 +2 +4 +5 +8
;Final array. This is the value of array: +1 +2 +4 +5 +8
As you can see the output sorts the array just fine from least to greatest. I was trying to see if I could move AL into EAX, but that gave me a bunch of errors. Is there a work around for this so I can use a 32 bit register and get the same output?
Using EAX is definitely possible; in fact, you already are. You wrote, "I was trying to see if I could move AL into EAX, but that gave me a bunch of errors." Think about what that means. EAX is the extended AX register, and AL is the low byte of AX. Take a look at this diagram: [image of EAX register]
As you can see, moving AL into EAX with, say, the MOVZX instruction would simply leave the value in AL where it already is and set the upper 24 bits of EAX to zero. You'd be moving AL into AL and zeroing the rest of EAX. You could actually move everything into EAX and run the program just the same, because AL is simply the low 8 bits of that same register.
Also, why are you pushing and popping EDX so much? The only reason to push/pop things from the runtime stack is to recover them later, but you never need the old value, so you can just let whatever is in EDX at the time die.
If you still want to do an 8-bit store, you need to use an 8-bit register. (AL is an 8-bit register. IDK why you mention 16 in the title).
x86 has widening loads (movzx and movsx), but integer stores from a register operand always take a register the same width as the memory operand. i.e. the way to store the low byte of EAX is with mov [esi], al.
In printArray, you should use movzx eax, byte ptr [esi] to zero-extend into EAX. (Or movsx to sign-extend, if you want to treat your numbers as int8_t instead of uint8_t.) This avoids needing the upper 24 bits of EAX to be zeroed.
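A small C model may make the difference between the three kinds of load concrete. The function names are invented for this sketch; each one returns what EAX would hold afterwards, which is the value a routine that prints all of EAX would see.

```c
#include <stdint.h>

/* mov al, [esi]: only the low 8 bits of EAX change, the rest is stale. */
uint32_t load_into_al(uint32_t old_eax, uint8_t byte)
{
    return (old_eax & 0xFFFFFF00u) | byte;
}

/* movzx eax, byte ptr [esi]: upper 24 bits become zero. */
uint32_t load_movzx(uint8_t byte)
{
    return byte;
}

/* movsx eax, byte ptr [esi]: upper 24 bits copy the byte's sign bit.
   (x ^ 0x80) - 0x80 is a portable way to sign-extend 8 bits in C. */
int32_t load_movsx(uint8_t byte)
{
    return (byte ^ 0x80) - 0x80;
}
```

With stale upper bits, load_into_al(0x12345600, 5) gives 0x12345605 rather than 5, which is why the original code only worked while the rest of EAX happened to be zero.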
BTW, your code has a lot of unnecessary instructions. e.g.
MOV EAX,0 ; clearing registers, moving 0 into each, and initialize
totally pointless. You don't need to "init" or "declare" a register before using it for the first time, if your first usage is write-only. What you do with EDX is amusing:
MOV EDX,0 ; clearing registers, moving 0 into each, and initialize
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET startArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
"Caller-saved" registers only have to be saved if you actually want the old value. I prefer the terms "call-preserved" and "call-clobbered". If writeString destroys its input register, then EDX holds an unknown value after the function returns, but that's fine. You didn't need the value anyway. (Actually I think Irvine32 functions at most destroy EAX.)
In this case, the previous instruction only zeroed the register (inefficiently). That whole block could be:
MOV EDX, OFFSET startArray ; load EDX with address of variable
CALL writeString ; print string
xor edx,edx ; edx = 0
Actually you should omit the xor-zeroing too, because you don't need it to be zeroed. You're not using it as counter in a loop or anything, all the other uses are write-only.
Also note that XCHG with memory has an implicit lock prefix, so it does the read-modify-write atomically (making it much slower than separate mov instructions to load and store).
You could load a pair of bytes using movzx eax, word ptr [esi] and use a branch to decide whether to rol ax, 8 to swap them or not. But store-forwarding stalls from byte stores forwarding to word loads aren't great either.
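The word-load-and-rotate idea can be modeled in C like this. It is a sketch under the assumption of little-endian byte order (as on x86), with hypothetical helper names; the elements are compared as signed bytes, as the JGE in the original loop does.

```c
#include <stdint.h>

/* Interpret a byte as a signed 8-bit value. */
static int as_signed(uint8_t b)
{
    return (b ^ 0x80) - 0x80;
}

/* Puts p[0] and p[1] in ascending signed order using one 16-bit load,
   an optional rotate by 8, and one 16-bit store. */
void sort_pair(uint8_t *p)
{
    uint16_t w = (uint16_t)(p[0] | (p[1] << 8)); /* word load: p[0] is the low byte */
    if (as_signed((uint8_t)(w >> 8)) < as_signed((uint8_t)(w & 0xFF)))
        w = (uint16_t)((w << 8) | (w >> 8));     /* rol ax, 8 swaps the two bytes */
    p[0] = (uint8_t)(w & 0xFF);                  /* word store back */
    p[1] = (uint8_t)(w >> 8);
}
```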
Anyway, this is getting way off topic from the title question, and this isn't codereview.SE.

Arrays in MASM Assembly (very confused beginner)

I have a pretty basic question:
How do you populate arrays in assembly? In high-level programming languages you can use a for-loop to set a value at each index, but I'm not sure how to accomplish the same thing in assembly. I know this is wrong, but this is what I have:
ExitProcess PROTO
.data
warray WORD 1,2,3,4
darray DWORD ?
.code
main PROC
mov edi, OFFSET warray
mov esi, OFFSET darray
mov ecx, LENGTHOF warray
L1:
mov ax, [edi] ;i want to move a number from warray to ax
movzx esi,ax ;i want to move that number into darray...
add edi, TYPE warray ;this points to the next number?
loop L1
call ExitProcess
main ENDP
END
Each time the loop runs, ax will be overwritten with the value of the array's index, right? Instead how do I populate darray with the array elements from warray? Any help would be very much appreciated...I'm pretty confused.
There is more than one way to populate an array, and your code is almost working. One way is to use a counter in the indirect address so you don't have to modify the destination and source array pointers on each iteration:
ExitProcess PROTO
.data
warray WORD 1,2,3,4
darray DWORD 4 dup (?) ; 4 elements
.code
main PROC
mov edi, OFFSET warray
mov esi, OFFSET darray
xor ecx, ecx ; clear counter
L1:
movzx eax, WORD PTR [edi + ecx * 2] ; get number from warray, zero-extended (movzx needs a register destination)
mov [esi + ecx * 4], eax ; store the dword in darray
inc ecx ; increment counter
cmp ecx, LENGTHOF warray
jne L1
call ExitProcess
main ENDP
END
Of course, this code could be modified to fill the array backwards to possibly save a couple of bytes, as you probably meant to do in your original code. Here is another way that has a more compact loop:
ExitProcess PROTO
.data
warray WORD 1,2,3,4
darray DWORD 4 dup (?) ; 4 elements
.code
main PROC
mov edi, OFFSET warray
mov esi, OFFSET darray
mov ecx, LENGTHOF warray - 1 ; start from end of array
L1:
movzx eax, WORD PTR [edi + ecx * 2] ; get number from warray, zero-extended
mov [esi + ecx * 4], eax ; store the dword in darray
loop L1
; get and set element zero separately because loop terminates on ecx = 0:
movzx eax, WORD PTR [edi]
mov [esi], eax
call ExitProcess
main ENDP
END
You should also note that when working with arrays of the same type, you can do a simple copy very efficiently using a repeat prefix with instructions like MOVSD:
ExitProcess PROTO
.data
array1 DWORD 1,2,3,4
array2 DWORD 4 dup (?)
.code
main PROC
mov esi, OFFSET array1 ; source pointer in esi
mov edi, OFFSET array2 ; destination in edi
mov ecx, LENGTHOF array1 ; number of dwords to copy
cld ; clear direction flag so that pointers are increasing
rep movsd ; copy ecx dwords
call ExitProcess
main ENDP
END
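In C terms, the distinction drawn above looks roughly like this. The function names are illustrative only: a same-type copy is a single block copy, much like rep movsd, whereas copying between arrays of different element widths needs a per-element conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same element type on both sides: one block copy, like cld / rep movsd. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    memcpy(dst, src, count * sizeof *src);
}

/* Different element widths: each element is widened individually. */
void widen_words(uint32_t *dst, const uint16_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];   /* movzx eax, word ptr [...] / mov [...], eax */
}
```

Using a block copy for the WORD-to-DWORD case would pack two source elements into each destination element, which is why the earlier examples use a loop.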
You're probably not "supposed to know" this, but anyway, there was (way back when) an instruction and an instruction prefix made to do exactly this.
Take a look here at this Microsoft page: HERE (click on it)
On that page scroll down until you find this phrase...
"...These instructions are remnants of the x86's CISC heritage and in recent processors are actually slower than the equivalent instructions written out the long way...."
What you do is...
Put the size of the array in Ecx
Point Edi at the start of the array
Use the appropriate string instruction to populate it
The syntax (Masm/Tasm/etc.) will probably look something like this...
Mov Ecx, The_Length_Of_The_Array ;Figure this out somehow
Lea Edi, The_Target_You_Want_To_Fill ;Define this somewhere
Now, if you want to copy from one place to another, do this...
Lea Esi, The_Source_You_Want_To_Copy ;Whatever, define it
Cld ;This is the direction flag, make it inc
Rep Movsb ;Movsb means move byte for byte
Now, if you want to stuff the same value in each byte in the array do this...
Mov AL, The_Value_You_Want_To_Stuff ;Define this to your liking
Cld ;This is the direction flag, make it inc
Rep Stosb ;Stosb means store AL into each byte
Again, these instructions are, for reasons others will elucidate, not cool anymore and if you use them you will get cooties or something.
There are also string instructions for comparison, "Scanning", "Loading", and so on. They were once quite useful (and still are, but the "modern" gang today won't admit it) particularly with the Rep prefix added to them.
If this helps, but you need more detail, feel free to ask.
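For comparison, here is roughly what those string instructions correspond to in C's standard library. The wrapper names are invented for this sketch, and the mapping is approximate (the C functions do not expose the register side effects the instructions have).

```c
#include <stddef.h>
#include <string.h>

/* rep stosb: store the same byte value into count bytes. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t count)
{
    memset(dst, value, count);
}

/* repne scasb: scan for the first byte equal to value.
   Returns its index, or count if the value is not present. */
size_t scan_for_byte(const unsigned char *buf, unsigned char value, size_t count)
{
    const unsigned char *hit = memchr(buf, value, count);
    return hit ? (size_t)(hit - buf) : count;
}

/* repe cmpsb: compare two buffers byte by byte; returns 1 if equal. */
int buffers_equal(const unsigned char *a, const unsigned char *b, size_t count)
{
    return memcmp(a, b, count) == 0;
}
```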

Printing Out a String Stored in an Array of DWORDS

I'm writing a program in Assembly that will bubble sort an array of strings. A zero-length string terminates the array. I approached this by declaring a DWORD array in which the strings, which are made of bytes, are stored. My main problem is not the bubble sort itself, but that the strings stored in the array aren't being output completely.
To hopefully make it clear, here is my code:
.586
.MODEL FLAT
INCLUDE io.h ; header file for input/output
space equ 0
cr equ 0dh
.STACK 4096
.DATA
myStrings byte "Delts",0
byte "Abs",0
byte "Biceps",0
byte 0
labelStrOut byte "Output is: ", 0
stringOut dword 11 dup (?)
stringNum dword 0
stringArray dword 20 dup (?)
.CODE
_MainProc PROC
mov edi, offset myStrings
mov esi, offset stringArray
popltLp:
cmp BYTE PTR [edi], 0
jz popltDone
mov ebx, [edi]
mov DWORD PTR [esi], ebx
add esi, 4
inc stringNum
xor ecx, ecx
not ecx
xor al, al
repne scasb
jmp popltLp
popltDone:
xor edx, edx
lea esi, stringArray
mov ebx, DWORD PTR [esi]
mov stringOut, ebx
output labelStrOut, stringOut
add esi, 4
mov ebx, DWORD PTR [esi]
mov stringOut, ebx
output labelStrOut, stringOut
add esi, 4
mov ebx, DWORD PTR [esi]
mov stringOut, ebx
output labelStrOut, stringOut
outptDone:
mov eax, 0 ; exit with return code 0
ret
_MainProc ENDP
END ; end of source code
As can be seen, no Bubble Sorting is being done yet...
The lines below 'popltDone' are just me messing around to see whether the strings carried over to the array correctly. However, when printed on the screen, only 4 characters of each string show up! The entire string is not being printed, which is currently driving me crazy. Can someone please tell me what I am doing wrong?
Thanks to anybody taking the time reading this.
The problem is, you aren't using string pointers correctly. Specifically, here's the code I'm referring to:
mov ebx, [edi]
mov DWORD PTR [esi], ebx
If you were to translate this into English, it would be something like this:
Move the 4 byte value pointed to by edi into ebx.
Move the value in ebx into the memory address pointed to by esi.
This is perfectly legal and may actually be what you want in some cases, but I'm guessing this isn't one of them. The reason you only see the first 4 characters when you output your array of strings is that you copied the characters of the string themselves into your array. A DWORD is 4 bytes, so you get the first 4 characters. Here's what I would write:
mov DWORD PTR [esi], edi
Which translates into:
Move the pointer value edi into the memory address pointed to by esi.
Now you have not an array of strings, but an array of string pointers. If you were to write your code in C and then disassemble the result, this is most likely what you would see. Rewrite your comparison and output functions to work with a pointer to a string instead of the literal characters in the string and you'll fix your problem.
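A minimal C sketch of the same idea, assuming the layout from the question (strings stored back to back, each NUL-terminated, with an empty string marking the end). The function name collect_strings and its max parameter are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Walks a block of back-to-back NUL-terminated strings ended by an empty
   string and stores a pointer to each one. Returns the number stored. */
size_t collect_strings(const char *packed, const char **out, size_t max)
{
    size_t n = 0;
    while (*packed != '\0' && n < max) {
        out[n++] = packed;             /* mov DWORD PTR [esi], edi */
        packed += strlen(packed) + 1;  /* like repne scasb: step past the terminator */
    }
    return n;
}
```

Sorting then only has to swap 4-byte pointers in out[], and printing follows each pointer to the full string, however long it is.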

Assembly Homework Assignment

Question:
Write a procedure that performs simple encryption by rotating each plaintext byte a varying
number of positions in different directions. For example, in the following array that represents
the encryption key, a negative value indicates a rotation to the left and a positive value indicates
a rotation to the right. The integer in each position indicates the magnitude of the rotation:
key BYTE -2, 4, 1, 0, -3, 5, 2, -4, -4, 6
Code:
Include Irvine32.inc
.data
msg BYTE "Hello", 0
key BYTE -2, 4, 1, 0, -3, 5, 2, -4, -4, 6
.code
main proc
mov ecx, LENGTHOF key ;Loop Counter
mov edx, OFFSET msg ;EDX Holds msg and will Display it
mov esi, OFFSET key ;Point to first array element
mov ebx, 0 ;CMP number
top:
cmp [esi], ebx ;if esi < ebx
jl ShiftLeft ;jump to shift left
cmp [esi], ebx ;if esi > ebx
jg ShiftRight ;jump to shift right
cmp [esi], ebx
je NoShift
ShiftLeft:
mov cl, [esi]
SHL edx, cl
add esi, TYPE key
loop top
ShiftRight:
mov cl, [esi]
SHR edx, cl
add esi, TYPE key
loop top
NoShift:
add esi, TYPE key
loop top
call WriteString
invoke ExitProcess,0
main endp
end main
So I am having a few issues.
1. The cmp statements seem to be going in reverse. The first cmp should be comparing -2 and 0. Since -2 < 0, it should take the jump and go to ShiftLeft. However, it is doing the opposite and going to ShiftRight.
2. Am I incrementing to the next array index properly with the add esi, TYPE key line?
3. Am I understanding the question? I need to rotate my msg, "Hello", either left if the number in the array is negative, by that number, and to the right if the number is positive in the array.
Any help is great and thanks in advance.
In your comparison
cmp [esi], ebx
this does not compare [esi] with ebx. It compares ebx with [esi] which is why you think the branch decision is incorrect. As a point of technique, you don't need to test all three conditions, since the last one must be true.
cmp [esi], ebx
je NoShift
jl ShiftRight ;else fall thru to ShiftLeft
You are correctly moving the pointer to the next array position with
add esi, TYPE key
but you have forgotten to increment the message pointer edx.
I presume that you would encode the character by rotating the byte, not shifting it, and you mistakenly rotate the pointer instead of its target
mov cl, [esi]
rol BYTE PTR [edx], cl
There is another problem: you are using ecx for the loop control, but you are overwriting cl (the least significant 8 bits of ecx) with the bit shift counter. I'll leave you to figure out how to get around that.
I didn't run your code, so my analysis may be wrong.
The cmp [esi], ebx does compare [esi] with ebx, of course (it is just [esi] - ebx with the result discarded). The mistake here is that you are comparing DWORDs instead of bytes. As a rule of thumb, always specify the memory operand size: cmp BYTE PTR [esi], bl. This way the assembler can tell you if you are doing something wrong.
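To see why the DWORD comparison sends the first key byte the wrong way, here is a small C model (function names are hypothetical). It builds the value a 32-bit little-endian load would see from the first four key bytes, and the value a signed byte load would see.

```c
#include <stdint.h>

/* What cmp [esi], ebx reads when EBX makes the operand 32 bits wide:
   four consecutive bytes combined in little-endian order.
   (The final cast assumes the usual two's complement wrap-around.) */
int32_t read_as_dword(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* What cmp BYTE PTR [esi], bl compares: a single signed byte. */
int read_as_byte(const uint8_t *p)
{
    return (p[0] ^ 0x80) - 0x80;
}
```

For the key bytes -2, 4, 1, 0 (0FEh, 04h, 01h, 00h), the dword is 000104FEh, a positive number, so jl is not taken and jg is. The byte alone is -2, as intended.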
You made other mistakes, as pointed out by @WeatherVan.
I'd like to add that you don't need any jumps, thanks to the way two's complement works and the symmetry of rotations. Remember: jumps are expensive, arithmetic is not.
If you take the low 3 bits of the key bytes (since you are rotating 8-bit data and 2^3 = 8), you can always rotate to the right. A rotation of, say, 2 to the left is a rotation of 6 to the right, and 6 is the two's complement of 2 in 3 bits.
I also think that you need to repeat the key on the message, i.e. if the message is longer than the key you need to reload it from the start.
Here is a sample program in NASM that you can use to better understand my advice:
BITS 32
GLOBAL _main
SECTION .data
msg db "Hello ",0
key db -2, 4, 1, 0, -3, 5, 2, -4, -4, 6
SECTION .text
_main:
mov esi, msg
.load_key:
mov ebx, 10 ;LENGTHOF key
mov edi, key
.loop:
mov cl, BYTE [edi] ;Key
lodsb ;Char of message
test al, al ;End of string?
jz .end
and cl, 07h ;We are working with 8 bit numbers, 3bits of operand are enough
ror al, cl ;Rotate
mov BYTE [esi-01h], al
inc edi ;Next key byte
dec ebx ;Key left
jnz .loop
jmp .load_key
.end:
ret
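The same approach can be modeled in C as a cross-check. This is a sketch with made-up names (ror8, encrypt); it assumes, as the NASM version does, that the message is NUL-terminated and that the key repeats when the message is longer than the key.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate an 8-bit value right by n positions (n is reduced modulo 8). */
static uint8_t ror8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v >> n) | (v << ((8 - n) & 7)));
}

/* Encrypts msg in place. Negative key values rotate left, positive right;
   masking with 7 turns a left rotation into the equivalent right one. */
void encrypt(unsigned char *msg, const int8_t *key, size_t keylen)
{
    size_t k = 0;
    for (; *msg != '\0'; msg++) {
        unsigned n = (unsigned)key[k] & 7;  /* and cl, 07h */
        *msg = ror8(*msg, n);               /* ror al, cl */
        if (++k == keylen)                  /* key exhausted: start over */
            k = 0;
    }
}
```

For example, 'H' (48h) with key -2 becomes 21h: rotating left by 2 and rotating right by 6 give the same byte.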

Resources