Insert values into array and display, nasm - arrays

First of all, this is a homework assignment.
I have a loop to get values of two digits individually, and joining them by doing a multiplication of the first digit by 10 and adding with the second digit to get an integer.
I'm doing all this and saving in my AL register, and now I want to insert that integer into an array and then scan that array and display those numbers.
How can I insert into vector and read from vector?
My array:
section .bss
array resb 200
My digit convert:
sub byte[digit_une], 30h
sub byte[digit_two], 30h
mov al, byte[digit_one]
mov dl, 10 ;dl = 10
mul dl ;al = ax = 10 * digit_one
add al, byte[digit_two] ;al = al + digit_two = digit_one * 10 + digit_two

"arrays", "vectors", etc... all that is higher level concept. The machine has memory, which is addressable by single byte, and what kind of logic you implement with your code, that's up to you. But you should be able to think about it on both levels, as single bytes in memory, each having it's own address, and fully understand your code logic, how it will arrange usage of those bytes to form "array of something".
With your definition of .bss sector you define one symbol/label array, which is equal to the address into memory where the .bss segment starts. Then you reserve 200 bytes of space, so anything else you will add after (like another label) will start at address .bss+200.
Let's say (for example) after loading your binary into memory and jumping to entry point, the .bss is at address 0x1000.
Then
mov dword [array],0x12345678
will store 4 bytes into memory at addresses 0x1000 .. 0x1003, with particular bytes having values 78 56 34 12 (little-endian break down of that dword value).
If you will do mov dword [array+199],0x12345678, you will write value 0x78 into the last officially reserved byte by that resb 200, and remaining 3 bytes will overwrite the memory at addresses .bss+200, .bss+201 and .bss+202 (probably damaging some other data, if you will put something there, or crashing your application, if it will cross the memory page boundary, and you are at the end of available memory for your process).
As you want to store N byte values into array, the simplest logic is to store first value at address array+0, second at array+1, etc... (for dword values the most logical way is array+0, array+4, array+8, ....).
i.e. mov [array+0],al can be used to store first value. But that's not very practical, if you are reading the input in some kind of loop. Let's say you want to read at most 200 values from user, or value 99 will end sooner, then you can use indexing by register, like:
xor esi,esi ; rsi = index = 0
mov ecx,200 ; rcx = 200 (max inputs)
input_loop:
; do input into AL = 0..99 integer (preserve RSI and RCX!)
...
cmp al,99
je input_loop_terminate
mov [array+rsi], al ; store the new value into array
inc rsi ; ++index
dec rcx ; --counter
jnz input_loop ; loop until counter is zero
input_loop_terminate:
; here RSI contains number of inputted values
; and memory from address array contains byte values (w/o the 99)
I.e. for user input 32, 72, 13, 0, 16, 99 the memory at address 0x1000 will have 5 bytes modified, containing (in hexa) now: 20 48 0D 00 10 ?? ?? ?? ....
If you are somewhat skilled asm programmer, you will not only index by register, but also avoid the hardcoded array label, so you would probably do an subroutine which takes as argument target address (of array), and maximum count:
; function to read user input, rsi = array address, rcx = max count
; does modify many other registers
; returns amount of inputted values in rax
take_some_byte_values_from_user:
jrcxz .error_zero_max_count ; validate count argument
lea rdi,[rsi+rcx] ; rdi = address of first byte beyond buffer
neg rcx ; rcx = -count (!)
; ^ small trick to make counter work also as index
; the index values will be: -200, -199, -198, ...
; and that's perfect for that "address of byte beyond buffer"
.input_loop:
; do input into AL = 0..99 integer (preserve RSI, RDI and RCX!)
...
cmp al,99
je .input_loop_terminate
mov [rdi+rcx], al ; store the new value into array
inc rcx ; ++counter (and index)
jnz .input_loop ; loop until counter is zero
.input_loop_terminate:
; calculate inputted size into RAX
lea rax,[rdi+rcx] ; address beyond last written value
sub rax,rsi ; rax = count of inputted values
ret
.error_zero_max_count:
xor eax,eax ; rax = 0, zero values were read
ret
Then you can call that subroutine from main code like this:
...
mov rsi,array ; rsi = address of reserved memory for data
mov ecx,200 ; rcx = max values count
call take_some_byte_values_from_user
; keep RAX (array.length = "0..200" value) somewhere
test al,al ; as 200 was max, testing only 8 bits is OK
jz no_input_from_user ; zero values were entered
...
For word/dword/qword element arrays the x86 has scaling factor in memory operand, so you can use index value going by +1, and address value like:
mov [array+4*rsi],eax ; store dword value into "array[rsi]"
For other sized elements it's usually more efficient to have pointer instead of index, and move to next element by doing add <pointer_reg>, <size_of_element> like add rdi,96, to avoid multiplication of index value for each access.
etc... reading values back is working in the same way, but reversed operands.
btw, these example don't as much "insert" values into array, as "overwrite" it. The computer memory already exists there and has some values (.bss gets zeroed by libc or OS IIRC? Otherwise some garbage may be there), so it's just overwriting old junk values with the values from user. There's still 200 bytes of memory "reserved" by resb, and your code must keep track of real size (count of inputted values) to know, where the user input ends, and where garbage data starts (or you may eventually write the 99 value into array too, and use that as "terminator" value, then you need only address of array to scan it content, and stop when value 99 is found).
EDIT:
And just in case you are still wondering why I am sometimes using square brackets and sometimes not, this Q+A looks detailed enough and YASM syntax is same as NASM in brackets usage: Basic use of immediates (square brackets) in x86 Assembly and yasm

Related

Increment a value at address of an arrray in NASM assembly

In my .bss section I declare array db 5, 4, 3, 2, 1
I then have a pointer ptr defined as %define ptr dword[ebp-8]
I would like to inc and dec this pointer at will to move from one element in the array to another, and then I would like to have to ability to inc the value in the array that the pointer is pointing to, and have this be updated in the array!
I move through the array with a loop in the form of:
mov ptr, array ; not in loop
mov ebx, ptr
mov al, [ebx]
inc ptr
How can I increment the value and then have it saved in the array instead of just some register as if I did inc al , can I do something like inc [ptr] (This doesnt work ofcourse). Is there a better way to approach this entirely?
Thanks
Edit:
I want my array to be something like 10, 8, 6, 5, 2 i.e increment each element by however many
In my .bss section I declare array db 5, 4, 3, 2, 1
This does not make sense.
By default .bss is defined as nobits, i. e. an uninitialized section (although a modern multi-user OS will initialize it with some values [usually zero] to prevent data leaks).
You probably meant to write .data section.
nasm(1) will issue an “ignored” warning about any db in a nobits section.
I then have a pointer ptr defined as %define ptr dword[ebp‑8]
I would like to inc and dec this pointer […]
No, you have a (case-sensitive) single-line macro ptr.
Any (following) occurrence of ptr will be expanded to dword[ebp‑8].
Use nasm ‑E source.asm (preprocess only) to show this.
[…] then I would like to have to ability to inc the value in the array that the pointer is pointing to […]
Your ptr macro says it’s pointing to a dword – a 32‑bit quantity – but your array consists of db – data Byte, 8‑bit quantity – elements.
This doesn’t add up.
I want my array to be something like 10, 8, 6, 5, 2 i.e increment each element by however many
Well, x + x = 2 × x, calculating the sum of a value plus the same value is the same as multiplying by two.
Any decent compiler will optimize multiplying by a constant factor 2ⁿ as a shl x, n.
Unless you need certain flags (the resulting flags of shl and add marginally differ), you can do something like
lea ebx, [array] ; load address of start of `array`
xor eax, eax ; eax ≔ 0, index in array starting at 0
mov ecx, 5 ; number of elements
head_of_loop:
shl byte [ebx + 1 * eax], 1 ; base address + size of element * index
add eax, 1 ; eax ≔ eax + 1
loop head_of_loop ; ecx ≔ ecx − 1
; if ecx ≠ 0 then goto head_of_loop
The constant factor in [ebx + 1 * eax] can be 1, 2, 4 or 8.
Note, the loop instruction is utterly slow and was used just for the sake of simplicity.

Assembly: Occurrence of Integers in Array

I'm writing a program in masm assembly to count and return the number of times integers appear in an array. I currently have the following code that allows me to populate an array with random integers. What I am struggling with is how to implement a counter that will store each occurrence of an integer at an index in the array. for instance, if the random array was [3,4,3,3,4,5,7,8], I would want to my count array to hold [3, 2, 1, 1, 1], as there are (three 3's, two 4's, etc).
I have the bounds of the random numbers fixed at 3/8 so I know they will be within this range. My current thinking is to compare each number to 3-8 as it is added, and increment my count array respectively. My main lack of understanding is how I can increment specific indices of the array. This code is how I am producing an array of random integers, with an idea of how I can begin to count integer occurrence, but I don't know if I am going in the right direction. Any advice?
push ebp
mov ebp, esp
mov esi, [ebp + 16] ; # holds array to store count of integer occurances
mov edi, [ebp + 12] ; # holds array to be populated with random ints
mov ecx, [ebp + 8] ; value of request in ecx
MakeArray:
mov eax, UPPER ; upper boundary for random num in array
sub eax, LOWER ; lower boundary for random num in array
inc eax
call RandomRange
add eax, LOWER
cmp eax, 3 ; Where I start to compare the random numbers added
je inc_3 ; current thought is it cmp to each num 3-8
mov [edi], eax ; put random number in array
add edi, 4 ; holds address of current element, moves to next element
loop fillArrLoop
inc_3: ; if random num == 3
inc esi ; holds address of count_array, increments count_array[0] to 1?
mov [edi], eax ; put random number in array to be displayed
add edi, 4 ; holds address of current element, moves to next element
loop MakeArray
My current thinking is to compare each number to 3-8 as it is added
No, you're vastly overcomplicating this. You don't want to linear search for a j (index into the counts) such that arr[i] == j, just use j = arr[i].
The standard way to do a histogram is ++counts[ arr[i] ]. In your case, you know the possible values are 3..8, so you can map an array value to a count bucket with arr[i] - 3, so you'll operate on counts[0..5]. A memory-destination add instruction with a scaled-index addressing mode can do this in one x86 instruction, given the element value in a register.
If the possible values are not contiguous, you'd normally use a hash table to map values to count buckets. You can think about this simple case as allowing a trivial hash function.
Since you're generating the random numbers to fill arr[i] at the same time as histograming, you can combine those two tasks, and instead of subtracting 3 just don't add it yet.
; inputs: unsigned len, int *values, int *counts
; outputs: values[0..len-1] filled with random numbers, counts[] incremented
; clobbers: EAX, ECX, EDX (not the other registers)
fill_array_and_counts:
push ebp
mov ebp, esp
push esi ; Save/restore the caller's ESI.
;; Irvine32 functions like RandomRange are special and don't clobber EAX, ECX, or EDX except as return values,
;; so we can use EDX and ECX even in a loop that makes a function call.
mov edi, [ebp + 16] ; int *counts ; assumed already zeroed?
mov edx, [ebp + 12] ; int *values ; output pointers
mov ecx, [ebp + 8] ; size_t length
MakeArray: ; do{
mov eax, UPPER - LOWER + 1 ; size of random range, calculated at assemble time
call RandomRange ; eax = 0 .. eax-1
add dword ptr [edi + eax*4], 1 ; ++counts[ randval ]
add eax, LOWER ; map 0..n to LOWER..UPPER
mov [edx], eax ; *values = randval+3
add edx, 4 ; values++
dec ecx
jnz MakeArray ; }while(--ecx);
pop edi ; restore call-preserved regs
pop ebp ; including tearing down the stack frame
ret
If the caller doesn't zero the counts array for you, you should do that yourself, perhaps with rep stosd with EAX=0 as a memset of ECX dword elements, and then reload EDI and ECX from the stack args.
I'm assuming UPPER and LOWER are assemble time constants like UPPER = 8 or LOWER equ 3, since you used all-upper-case names for them, and they're not function args. If that's the case, then there's no need to do the math at runtime, just let the assembler calculate UPPER - LOWER + 1 for you.
I avoided the loop instruction because it's slow, and doesn't do anything you can't do with other simple instructions.
One standard performance trick for histograms with only a few buckets is to have multiple arrays of counts and unroll over them: Methods to vectorise histogram in SIMD?. This hides the latency of store/reload when the same counter needs to be incremented several times in a row. Your random values will generally avoid long runs of the same value, though, so worst-case performance is avoided.
There might be something to gain from AVX2 for large arrays since there are only 6 possible buckets: Micro Optimization of a 4-bucket histogram of a large array or list. (And you could generate random numbers in SIMD vectors with an AVX2 xorshift128+ PRNG if you wanted.)
If your range is fixed (3-8), you have a fixed-length array that can hold your counts:
(index0:Count of 3),(index1:Count of 4)..(index5:Count of 8s)
Once you have an element from the random array, you just take that element and put it through a switch:
cmp 3, [element]
jne compare4
mov ebx, [countsArrayAddress] ; Can replace [countsArrayAddress] with [ebp + 16]
add ebx, 0 ; First index, can comment out this line
mov ecx, [ebx]
add ecx, 1 ; Increment count
mov [ebx], ecx ; Count at the zeroth offset is now incremented
compare4:
cmp 4, [element]
jne compare5
mov ebx, [countsArrayAddress]
add ebx, 4 ; Second index (1*4)
mov ecx, [ebx]
add ecx, 1
mov [ebx], ecx
...
Is this what you mean? I come from using fasm syntax but it looks pretty similar. The above block is a bit unoptimized, but think this shows how to build the counts array. The array has a fix length, which must be allocated, either on the stack (sub rsp the correct amount) or on the heap, i.e with heapalloc/malloc calls. (Edited, see you're using 32-bit registers)

Adding 2D arrays in Assembly (x86)

I have to add two 3*3 arrays of words and store the result in another array. Here is my code:
.data
a1 WORD 1,2,3
WORD 4,2,3
WORD 1,4,3
a2 WORD 4, 3, 8
WORD 5, 6, 8
WORD 4, 8, 9
a3 WORD DUP 9(0)
.code
main PROC
mov eax,0;
mov ebx,0;
mov ecx,0;
mov edx,0;
mov edi,0;
mov esi,0;
mov edi,offset a1
mov esi,offset a2
mov ebx, offset a3
mov ecx,LENGTHOF a2
LOOP:
mov eax,[esi]
add eax,[edi]
mov [ebx], eax
inc ebx
inc esi
inc edi
call DumpRegs
loop LOOP
exit
main ENDP
END main
But this sums all elements of a2 and a1. How do I add them row by row and column by column? I want to display the result of sum of each row in another one dimensional array(Same for columns).
The
a1 WORD 1,2,3
WORD 4,2,3
WORD 1,4,3
will compile as bytes (in hexa):
01 00 02 00 03 00 04 00 02 00 03 00 01 00 04 00 03 00
Memory is addressable by bytes, so if you will find each element above, and count it's displacement from the first one (first one is displaced by 0 bytes, ie. it's address is a1+0), you should see a pattern, how to calculate the displacement of particular [y][x] element (x is column number 0-2, y is row number 0-2... if you decide so, it's up to you, what is column/row, but usually people tend to consider consecutive elements in memory to be "a row").
Pay attention to the basic types byte size, you are mixing it everywhere in every way, reread some lesson/tutorial about how qword/dword/word/byte differ and how you need to adjust your instructions to work with correct memory size, and how to calculate the address correctly (and what is the size of eax and how to use smaller parts of it).
If you have trouble to figure it on your own:
displacement = (y * 3 + x) * 2 => *2 because element is word, each occupies two bytes. y * 3 because single row is 3 elements long.
In ASM instructions that may be achieved for example...
If [x,y] is [eax,ebx], this calculation can be done as lea esi,[ebx+ebx*2] ; esi = y*3 | lea esi,[esi+eax] ; esi = y*3+x | mov ax,[a1+esi*2] ; loads [x,y] element from a1.
Now if you know how to calculate address of particular element, you can do either loop doing all the calculation ahead of each element load, or just do the math in head how the addresses differ and write the address calculation for first element (start of row/column) and then mov + 2x add with hardcoded offsets for next two elements (making loop for 3 elements is sort of more trouble than writing the unrolled code without loop), and repeat this for all three columns/rows and store the results.
BTW, that call DumpRegs ... is not producing what you expected? And it's a bit tedious way to debug the code, may be worth to spend a moment to get some debugger working.
Couldn't help my self, but to write it, as it's such funny short piece of code, but you will regret it later, if you will just copy it, and not dissect it to atoms and understand fully how it works):
column_sums: DW 0, 0, 0
row_sums: DW 0, 0, 0
...
; columns sums
lea esi,[a3] ; already summed elements of a1 + a2
lea edi,[column_sums]
mov ecx,3 ; three columns to sum
sum_column:
mov ax,[esi] ; first element of column
add ax,[esi+6] ; 1 line under first
add ax,[esi+12] ; 2 lines under
mov [edi],ax ; store result
add esi,2 ; next column, first element
add edi,2 ; next result
dec ecx
jnz sum_column
; rows sums
lea esi,[a3] ; already summed elements of a1 + a2
lea edi,[row_sums]
mov ecx,3 ; three rows to sum
sum_row:
mov ax,[esi] ; first element of row
add ax,[esi+2] ; +1 column
add ax,[esi+4] ; +2 column
mov [edi],ax ; store result
add esi,6 ; next row, first element
add edi,2 ; next result
dec ecx
jnz sum_row
...
(didn't debug it, so bugs are possible, plus this expect a3 to contain correct element sums, which your original code will not produce, so you have to fix it first ... this code does contain lot of hints, how to fix each problem of original)
Now I feel guilty of taking the fun of writing this from you... nevermind, I'm sure you can find few more tasks to practice this. The question is, whether you got the principle of it. If not, ask which part is confusing and how you currently understand it.
no no no no this top answer so terrible...
first we have a big memory access issue
change ur array access to be: "memtype ptr [memAddr + (index*memSize)]"
(): must be in a register of dword size im pretty sure, i know for a fact if its in a register it must be dword size, idk if u can do an expression like the way ive written it using the *...
memtype = byte, word (everything is a dword by default)
index = pos in array
memSize: byte = 1, word = 2, dword = 4
IF YOU DO NOT DO THIS, ALL MEMORY ACCESS WILL BE OF TYPE DWORD, AND YOU MIGHT ACCESS OUT OF BOUNDS AND MOST DEFINETELY YOU WILL NOT GET THE CORRECT VALUES BECAUSE IT IS MIXING MEMORY OF DIFFERENT THINGS( dword = word + word, so when u only want the word u have to do a word ptr, otheriwse it will give u the word+word and who knows what that value will be)
your type size is word, and your also trying to put it in a dword register, u can do the word size register of eax(ax) instead, or u can do movzx to place it in eax if you want to use the whole register
next accessing the array in different formats
i mean this part should be fairly obvious if you have done any basic coding, i think ur top error is the main issue
its just a normal array indexs: 0->?
so then you just access the addr [row * colSize + col]
and the way u progress your loop should be fairly self explanatory

File size in assembly

I have a following code written in TASM assembly for reading from a file and printing out the file content using a buffer.
Buffer declaration:
buffer db 100 dup (?), '$' ;regarding to comment, buffer is db 101 dup (?), '$'
EDIT
The structure of my program is:
Task 1 is asking me for a file name (string) which I want to read.
After I input file name, the procedure task1 opens the file.
mov ah, 3dh
xor al, al
lea dx, fname
int 21h ;open file
jc openError
mov bx, ax
Not sure, if opening the file is correct, because I have seen similar ways of opening the file but I do not have a handler here, or?
Here is the reading part task2:
task2 proc
pam 10,13 ;pam is macro for printing out
read:
mov ah, 3fh
lea dx, buffer
mov cx, 100
int 21h
jc readError ;read error , jump
mov si, ax
mov buffer[si], '$'
mov ah, 09h
int 21h ;print out
cmp si, 100
je read
jmp stop ;end
openError:
pam error1
jmp stop
readError:
pam error2
stop: ret
task2 endp
My question is, how can I get file length using this code? I have read that there are some ways of getting file size but they all look very complicated and I was thinking that when I read file, I should be able to calculate file size by storing number of characters I read in a register but I am not so sure about it and if it is possible, then I have no idea how to do that in tasm. Also in data segment, what variable do I need for storing file size? Maybe a code snippet would help me understand the process with some helpful comments how does it work. Thanks for help.
UPDATE regarding to the answer:
So I tried to convert hexa to decimal, it kinda works but I must have some bug in there because it works for small file, lets say I tried 1kB file and it worked, I got size in Bytes printed out on screen but when I tried bigger file like 128kB, decimal numbers were not correct - printed size was wrong, file is big exactly 130,862 bytes and my conversion gave me -- MENU653261 = Enter file name.
... code from the answer ...
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
;HEX TO DECIMAL = my code starts here
mov cx,0
mov bx,10
loop1: mov dx,0
div bx
add dl,30h
push dx
inc cx
cmp ax,9
jg loop1
add al,30h
mov [si],al
loop2: pop ax
inc si
mov [si],al
loop loop2
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
Here is a screen how it looks when the decimal value gets printed out. It is mixed with the next line. I tried to move it to the next line but did not help.
screenshot
A simple code to display hexa-formatted length of DOS file (file name is hardcoded in source, edit it to existing file):
.model small
.stack 100h
.data
fname DB "somefile.ext", 0
buffer DB 100 dup (?), '$'
.code
start:
; set up "ds" to point to data segment
mov ax,#data
mov ds,ax
; open file first, to get "file handle"
mov ax,3D00h ; ah = 3Dh (open file), al = 0 (read only mode)
lea dx,[fname] ; ds:dx = pointer to zero terminated file name string
int 21h ; call DOS service
jc fileError
; ax = file handle (16b number)
; now set the DOS internal "file pointer" to the end of opened file
mov bx,ax ; store "file handle" into bx
mov ax,4202h ; ah = 42h, al = 2 (END + cx:dx offset)
xor cx,cx ; cx = 0
xor dx,dx ; dx = 0 (cx:dx = +0 offset)
int 21h ; will set the file pointer to end of file, returns dx:ax
jc fileError ; something went wrong, just exit
; here dx:ax contains length of file (32b number)
; close the file, as we will not need it any more
mov cx,ax ; store lower word of length into cx for the moment
mov ah,3Eh ; ah = 3E (close file), bx is still file handle
int 21h ; close the file
; ignoring any error during closing, so not testing CF here
; BTW, int 21h modifies only the registers specified in documentation
; that's why keeping length in dx:cx registers is enough, avoiding memory/stack
; display dx:cx file length in hexa formatting to screen
; (note: yes, I used dx:cx for storage, not cx:dx as offset for 42h service)
; (note2: hexa formatting, because it's much easier to implement than decimal)
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
; exit to DOS with exit code 0 (OK)
mov ax,4C00h
int 21h
fileError:
mov ax,4C01h ; exit with code 1 (error happened)
int 21h
AxTo04Hex: ; subroutine to convert ax into four ASCII hexadecimal digits
; input: ax = 16b value to convert, ds:di = buffer to write characters into
; modifies: di += 4 (points beyond the converted four chars)
push cx ; save original cx to preserve it's value
mov cx,4
AxTo04Hex_singleDigitLoop:
rol ax,4 ; rotate whole ax content by 4 bits "up" (ABCD -> BCDA)
push ax
and al,0Fh ; keep only lowest nibble (4 bits) value (0-15)
add al,'0' ; convert it to ASCII: '0' to '9' and 6 following chars
cmp al,'9' ; if result is '0' to '9', just store it, otherwise fix
jbe AxTo04Hex_notLetter
add al,'A'-(10+'0') ; fix value 10+'0' into 10+'A'-10 (10-15 => 'A' to 'F')
AxTo04Hex_notLetter:
mov [di],al ; write ASCII hexa digit (0-F) to buffer
inc di
pop ax ; restore other bits of ax back for next loop
dec cx ; repeat for all four nibbles
jnz AxTo04Hex_singleDigitLoop
pop cx ; restore original cx value back
ret ; ax is actually back to it's input value here :)
end start
I tried to comment the code extensively, and to use "more straightforward" implementation of this stuff, avoiding some less common instructions, and keep the logic simple, so actually you should be able to comprehend how it works fully.
Again I strongly advise you to use debugger and go instruction by instruction slowly over it, watching how CPU state is changing, and how it correlates with my comments (note I'm trying to comment not what the instruction exactly does, as that can be found in instruction reference guide, but I'm trying to comment my human intention, why I wrote it there - in case of some mistake this gives you idea what should have been the correct output of the wrong code, and how to fix it. If comments just say what the instruction does, then you can't tell how it should be fixed).
Now if you would implement 32b_number_to_decimal_ascii formatting function, you can replace the last part of this example to get length in decimal, but that's too tricky for me to write from head, without proper debugging and testing.
Probably the simplest way which is reasonably to implement by somebody new to asm is to have table with 32b divisors for each 32b decimal digit and then do nested loop for each digits (probably skipping storage of leading zeroes, or just incrementing the pointer before printing to skip over them, that's even less complex logic of code).
Something like (pseudo code similar to C, hopefully showing the idea):
divisors dd 1000000000, 100000000, 10000000, ... 10, 1
for (i = 0; i < divisors.length; ++i) {
buffer[i] = '0';
while (divisors[i] <= number) {
number -= divisors[i];
++digit[i];
}
}
digit[i] = '$';
// then printing as
ptr_to_print = buffer;
// eat leading zeroes
while ( '0' == ptr_to_print[0] ) ++ptr_to_print;
// but keep at least one zero, if the number itself was zero
if ('$' == ptr_to_print[0] ) --ptr_to_print;
print_it // dx = ptr_to_print, ah = 9, int 21h
And if you wonder, how do you subtract 32 bit numbers in 16 bit assembly, that's actually not that difficult (as 32b division):
; dx:ax = 32b number
; ds:si = pointer to memory to other 32b number (mov si,offset divisors)
sub ax,[si] ; subtract lower word, CF works as "borrow" flag
sbb dx,[si+2] ; subtract high word, using the "borrow" of SUB
; optionally: jc overflow
; you can do that "while (divisors[i] <= number)" above
; by subtracting first, and when overflow -> exit while plus
; add the divisor back (add + adc) (to restore "number")
Points to question update:
You don't convert hex to decimal (hex string is stored in buffer, you don't load anything from there). You convert value in ax to decimal. The ax contains low word of file length from previous hex conversion call. So for files of length up to 65535 (0xFFFF = maximum 16b unsigned integer) it may work. For longer files it will not, as upper word is in dx, which you just destroy by mov dx,0.
If you would actually keep dx as is, you would divide file length by 10, but for file with 655360+ length it would crash on divide error (overflow of quotient). As I wrote in my answer above, doing 32b / 16b division on 8086 is not trivial, and I'm not even sure what is the efficient way. I gave you hint about using table of 32b divisors, and doing the division by subtraction, but you went for DIV instead. That would need some sophisticated split of the original 32b value into smaller parts up to a point where you can use div bx=10 to extract particular digits. Like doing filelength/1e5 first, then calculate 32b remainder (0..99999) value, which can be actually divided by 10 even in 16b (99999/10 = 9999 (fits 16b), remainder 9).
Looks like you didn't understand why 128k file length needs 32 bits to store, and what are the effective ranges of various types of variables. 216 = 65536 (= 64ki) ... that how big your integers can get, before you run into problems. 128ki is two times over that => 16 bit is problem.
Funny thing... as you wrote "converting from hex to decimal", at first I though: what, you convert that hexa string into decimal string??? But actually that sounds doable with 16b math, to go through whole hexa number first picking up only 100 values (extracted from particular k*16n value), then in next iteration doing 101 counting, etc...
But that division by subtracting 32bit numbers from my previous answer should be much easier to do, and especially to comprehend, how it works.
You write the decimal string at address si, but I don't see how you set si, so it's probably pointing into your MENU string by accident, and you overwrite that memory (again using debugger, checking ds:si values to see what address is used, and using memory view to watch the memory content written would give you hint, what is the problem).
Basically you wasted many hours by not following my advices (learning debugging and understanding what I meant by 32b - 32b loop doing division), trying to copy some finished code from Internet. At least it looks like you can somewhat better connect it to your own code, but you are still missing obvious problems, like not setting si to point to destination for decimal string.
Maybe try to first to print all numbers from the file, and keep the size in hexa (at least try to figure out, why conversion to hexa is easy, and to decimal not). So you will have most of the task done, then you can play with the hardest part (32b to decimal in 16b asm).
BTW, just a day ago or so somebody had problem with doing addition/subtraction over 64b numbers in 16b assembly, so this answer may give you further hints, why doing those conversion by sub/add loops is not that bad idea, it's quite "simple" code if you get the idea how it works: https://stackoverflow.com/a/42645266/4271923

Using OFFSET operator on an array in x86 Assembly?

I'm currently going through Assembly Language for x86 Processors 6th Edition by Kip R. Irvine. It's quite enjoyable, but something is confusing me.
Early in the book, the following code is shown:
list BYTE 10,20,30,40
ListSize = ($ - list)
This made sense to me. Right after declaring an array, subtract the current location in memory with the starting location of the array to get the number of bytes used by the array.
However, the book later does:
.data
arrayB BYTE 10h,20h,30h
.code
mov esi, OFFSET arrayB
mov al,[esi]
inc esi
mov al,[esi]
inc esi
mov al,[esi]
To my understanding, OFFSET returns the location of the variable with respect to the program's segment. That address is stored in the esi register. Immediates are then used to access the value stored at the address represented in esi. Incrementing moves the address to the next byte.
So what is the difference between using OFFSET on an array and simply calling the array variable? I was previously lead to believe that simply calling the array variable would also give me its address.
.data
Number dd 3
.code
mov eax,Number
mov ebx,offset Number
EAX will read memory at a certain address and store the number 3
EBX will store that certain address.
mov ebx,offset Number
is equivalent in this case to
lea ebx,Number

Resources