8086 Data Segment MASM

Given this data segment:
.data
vara dw 0AB0h
varb db 'C'
varc db 'DEF',0
vard db 65
vare db '90','$'
How do you find the offset into the Data Segment of the variable vard?
How do you find how many bytes will have been written to the standard output device after all these instructions have been executed:
mov dx, offset varc
mov ah,9

How do you find the offset into the Data Segment of the variable vard?
Just count all the data that precedes the vard variable.
You have 1 word (2 bytes), 1 char, 3 chars and 1 byte (the 0 terminator), which adds up to
7
How do you find how many bytes will have been written to the standard output device after all these instructions have been executed:
Again, count all the data up to, but not including, the next $ sign.
You have 3 chars, 1 byte (the 0), 1 byte (vard) and 2 chars, which adds up to
7

(1) The offset into the Data Segment of vard is: offset vard. Its value is resolved at link time and cannot be known before then.
(2) Assuming you are performing (though I see no int 21h in your question):
mov dx, offset varc
mov ah,9
int 21h
you will output seven bytes. All fields involved are bytes, so there is no alignment padding between them, and the assembler/linker will not reorder variables, even when they stand alone (i.e. are not embedded in a structure).
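Both counts can be checked with a small Python model (Python is used only as a modeling language here). It assumes the data segment starts at offset 0 and that the variables are laid out in declaration order with no padding, as stated above. It builds the bytes of the .data block, records each variable's offset, and then mimics DOS function 09h (int 21h with AH=9), which writes bytes from DS:DX up to, but not including, the first '$'. The names DECLS, build_segment and dos_print_string are hypothetical helpers for this sketch.

```python
# Model of the MASM .data block from the question, assuming it starts at offset 0.
# Each entry: (name, bytes emitted by the directive).
DECLS = [
    ("vara", (0x0AB0).to_bytes(2, "little")),  # dw 0AB0h -> B0 0A
    ("varb", b"C"),                            # db 'C'
    ("varc", b"DEF\x00"),                      # db 'DEF',0
    ("vard", bytes([65])),                     # db 65
    ("vare", b"90$"),                          # db '90','$'
]

def build_segment(decls):
    """Return (segment bytes, {name: offset}) with no padding between variables."""
    data = bytearray()
    offsets = {}
    for name, raw in decls:
        offsets[name] = len(data)  # offset = number of bytes emitted before it
        data += raw
    return bytes(data), offsets

def dos_print_string(memory, start):
    """Mimic int 21h / AH=9: output bytes from start up to the first '$'."""
    end = memory.index(b"$", start)
    return memory[start:end]

segment, offsets = build_segment(DECLS)
written = dos_print_string(segment, offsets["varc"])

print(offsets["vard"])   # 7
print(len(written))      # 7
print(written)           # b'DEF\x00A90'
```

Note that the 0 byte after 'DEF' is also sent to the output, which is why it counts toward the seven bytes.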

Related

Incrementing a pointer by 1 in an array of words: loading a word from half way between two words?

Consider the code below. If incrementing SI by 2 gives me the 2nd element of the array, what exactly would incrementing SI by 1 give me?
.data
var dw 1,2,3,4
.code
LEA SI,VAR
MOV AX,[SI]
INC SI
MOV AX,[SI]
The statement var dw 1,2,3,4 tells the assembler to statically define eight bytes of memory at the beginning of the .data segment. The layout of the data bytes will be
|01|00|02|00|03|00|04|00|
and the first MOV AX,[SI] will load AL with 01 and AH with 00.
When you increment SI only by 1, the next MOV AX,[SI] will load AL with 00 and AH with 02, so AX = 0200h (512), which is neither of the array elements.
If you want to keep loading AX with the whole 16bit words, increment SI by 2 (ADD SI,2).
You could also replace both MOV AX,[SI] and ADD SI,2 with the single instruction LODSW, which does the same thing and occupies only one byte instead of five. In that case make sure the Direction flag is cleared (using the CLD instruction at the beginning of your program).
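The byte layout and the effect of an unaligned read can be modeled in a few lines of Python. This is a sketch: mov_ax is a hypothetical helper that only mimics what MOV AX,[SI] loads (AL from [SI], AH from [SI+1]) on a little-endian x86.

```python
import struct

# var dw 1,2,3,4 -> eight bytes, each word stored little-endian
memory = struct.pack("<4H", 1, 2, 3, 4)
print(memory.hex(" "))  # 01 00 02 00 03 00 04 00

def mov_ax(mem, si):
    """Mimic MOV AX,[SI]: AL = mem[si], AH = mem[si+1]."""
    al, ah = mem[si], mem[si + 1]
    return ah << 8 | al

si = 0
ax_first = mov_ax(memory, si)      # AL=01, AH=00 -> 0001h
si += 1                            # INC SI
ax_unaligned = mov_ax(memory, si)  # AL=00, AH=02 -> 0200h (512)
si += 1                            # SI = 2, as if ADD SI,2 had been used from the start
ax_second = mov_ax(memory, si)     # AL=02, AH=00 -> 0002h
```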

Insert values into array and display, nasm

First of all, this is a homework assignment.
I have a loop to get values of two digits individually, and joining them by doing a multiplication of the first digit by 10 and adding with the second digit to get an integer.
I'm doing all this and saving in my AL register, and now I want to insert that integer into an array and then scan that array and display those numbers.
How can I insert into vector and read from vector?
My array:
section .bss
array resb 200
My digit convert:
sub byte[digit_one], 30h
sub byte[digit_two], 30h
mov al, byte[digit_one]
mov dl, 10 ;dl = 10
mul dl ;al = ax = 10 * digit_one
add al, byte[digit_two] ;al = al + digit_two = digit_one * 10 + digit_two
"Arrays", "vectors", etc. are all higher-level concepts. The machine has memory, which is addressable by single bytes, and what kind of logic you implement with your code is up to you. But you should be able to think about it on both levels: as single bytes in memory, each having its own address, and as your code logic, fully understanding how it arranges the use of those bytes to form an "array of something".
With your definition of the .bss section you define one symbol/label, array, which is equal to the memory address where the .bss section starts. Then you reserve 200 bytes of space, so anything else you add after it (like another label) will start at address .bss+200.
Let's say, for example, that after loading your binary into memory and jumping to the entry point, .bss is at address 0x1000.
Then
mov dword [array],0x12345678
will store 4 bytes into memory at addresses 0x1000..0x1003, with the individual bytes having the values 78 56 34 12 (the little-endian breakdown of that dword value).
If you do mov dword [array+199],0x12345678, you will write the value 0x78 into the last byte officially reserved by that resb 200, and the remaining 3 bytes will overwrite the memory at addresses .bss+200, .bss+201 and .bss+202 (probably damaging some other data if you put something there, or crashing your application if the write crosses a memory page boundary at the end of the memory available to your process).
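Both stores can be modeled with Python's struct module. This is a sketch under an explicit assumption: memory is a hypothetical 203-byte region standing for the 200 reserved bytes plus 3 bytes that belong to whatever follows array; offsets are relative to array (0x1000 in the example above).

```python
import struct

BSS_SIZE = 200
# Hypothetical: the 200 bytes reserved by resb 200, plus 3 bytes that follow them.
memory = bytearray(BSS_SIZE + 3)

def store_dword(mem, offset, value):
    """Mimic mov dword [array+offset], value (little-endian store)."""
    struct.pack_into("<I", mem, offset, value)

store_dword(memory, 0, 0x12345678)
print(memory[0:4].hex(" "))      # 78 56 34 12

store_dword(memory, 199, 0x12345678)
print(memory[199:200].hex())     # 78 -> last reserved byte
print(memory[200:203].hex(" "))  # 56 34 12 -> written beyond the reservation
```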
As you want to store N byte values into the array, the simplest logic is to store the first value at address array+0, the second at array+1, etc. (for dword values the most logical layout is array+0, array+4, array+8, ...).
E.g. mov [array+0],al can be used to store the first value. But that's not very practical if you are reading the input in some kind of loop. Let's say you want to read at most 200 values from the user, with the value 99 ending the input sooner; then you can use indexing by register, like:
xor esi,esi ; rsi = index = 0
mov ecx,200 ; rcx = 200 (max inputs)
input_loop:
; do input into AL = 0..99 integer (preserve RSI and RCX!)
...
cmp al,99
je input_loop_terminate
mov [array+rsi], al ; store the new value into array
inc rsi ; ++index
dec rcx ; --counter
jnz input_loop ; loop until counter is zero
input_loop_terminate:
; here RSI contains number of inputted values
; and memory from address array contains byte values (w/o the 99)
E.g. for user input 32, 72, 13, 0, 16, 99 the memory at address 0x1000 will have 5 bytes modified, now containing (in hex): 20 48 0D 00 10 ?? ?? ?? ....
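The input loop above can be modeled in Python to check that example. This is a sketch: take_values is a hypothetical name, the list inputs stands in for the elided input step, and the 200-byte array is zero-filled like .bss, so the "??" bytes show up as 00 here.

```python
def take_values(inputs, array, max_count=200, terminator=99):
    """Store values into array[index] until the terminator or max_count is reached.
    Returns the number of values stored (RSI in the asm loop)."""
    index = 0                    # xor esi,esi
    for value in inputs:
        if index == max_count:   # counter in RCX reached zero
            break
        if value == terminator:  # cmp al,99 / je input_loop_terminate
            break
        array[index] = value     # mov [array+rsi], al
        index += 1               # inc rsi
    return index

array = bytearray(200)           # resb 200 (zero-filled .bss)
count = take_values([32, 72, 13, 0, 16, 99], array)
print(count)                     # 5
print(array[:8].hex(" "))        # 20 48 0d 00 10 00 00 00
```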
If you are a somewhat skilled asm programmer, you will not only index by register but also avoid the hardcoded array label, so you would probably write a subroutine that takes the target address (of the array) and the maximum count as arguments:
; function to read user input, rsi = array address, rcx = max count
; does modify many other registers
; returns amount of inputted values in rax
take_some_byte_values_from_user:
jrcxz .error_zero_max_count ; validate count argument
lea rdi,[rsi+rcx] ; rdi = address of first byte beyond buffer
neg rcx ; rcx = -count (!)
; ^ small trick to make counter work also as index
; the index values will be: -200, -199, -198, ...
; and that's perfect for that "address of byte beyond buffer"
.input_loop:
; do input into AL = 0..99 integer (preserve RSI, RDI and RCX!)
...
cmp al,99
je .input_loop_terminate
mov [rdi+rcx], al ; store the new value into array
inc rcx ; ++counter (and index)
jnz .input_loop ; loop until counter is zero
.input_loop_terminate:
; calculate inputted size into RAX
lea rax,[rdi+rcx] ; address beyond last written value
sub rax,rsi ; rax = count of inputted values
ret
.error_zero_max_count:
xor eax,eax ; rax = 0, zero values were read
ret
Then you can call that subroutine from main code like this:
...
mov rsi,array ; rsi = address of reserved memory for data
mov ecx,200 ; rcx = max values count
call take_some_byte_values_from_user
; keep RAX (array.length = "0..200" value) somewhere
test al,al ; as 200 was max, testing only 8 bits is OK
jz no_input_from_user ; zero values were entered
...
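The negative-counter trick in take_some_byte_values_from_user is easy to get wrong, so here is a Python model of its address arithmetic. This is a sketch: memory is a hypothetical bytearray standing in for RAM, the register names are reused as plain variables, and inputs stands in for the elided input step (it must contain the terminator or enough values).

```python
def take_some_byte_values(memory, rsi, rcx, inputs, terminator=99):
    """Model of take_some_byte_values_from_user: rsi = buffer address,
    rcx = max count. Returns the count of stored values (RAX)."""
    if rcx == 0:                 # jrcxz .error_zero_max_count
        return 0
    rdi = rsi + rcx              # lea rdi,[rsi+rcx]: first byte beyond buffer
    rcx = -rcx                   # neg rcx: counter doubles as negative index
    values = iter(inputs)
    while True:
        al = next(values)        # stands in for the elided input code
        if al == terminator:     # cmp al,99 / je .input_loop_terminate
            break
        memory[rdi + rcx] = al   # mov [rdi+rcx], al
        rcx += 1                 # inc rcx
        if rcx == 0:             # jnz .input_loop falls through at zero
            break
    return (rdi + rcx) - rsi     # lea rax,[rdi+rcx] / sub rax,rsi
```

For example, with the buffer at address 4 and a maximum of 5, the first value lands at rdi+rcx = 9-5 = 4, i.e. exactly at the buffer start.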
For word/dword/qword element arrays, x86 memory operands have a scaling factor, so the index value can still go by +1, with an address like:
mov [array+4*rsi],eax ; store dword value into "array[rsi]"
For other element sizes it's usually more efficient to keep a pointer instead of an index and move to the next element with add <pointer_reg>, <size_of_element> (like add rdi,96), to avoid multiplying the index value for each access.
Reading values back works the same way, just with the operands reversed.
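The two addressing styles can be compared in a short Python model. This is a sketch: byte offsets into a bytearray stand in for addresses, ELEM = 4 plays the role of the scale factor in [array+4*rsi], and the stored values 10, 20, 30, 40 are arbitrary examples.

```python
import struct

ELEM = 4                     # dword elements -> scale factor 4
array = bytearray(4 * ELEM)  # room for four dwords

# Store with a scaled index: mov [array+4*rsi], eax
for rsi, eax in enumerate([10, 20, 30, 40]):
    struct.pack_into("<I", array, ELEM * rsi, eax)

# Read back with a scaled index: mov eax, [array+4*rsi]
by_index = [struct.unpack_from("<I", array, ELEM * rsi)[0] for rsi in range(4)]

# Read back with a pointer advanced by the element size: add rdi, 4
by_pointer = []
rdi = 0
while rdi < len(array):
    by_pointer.append(struct.unpack_from("<I", array, rdi)[0])
    rdi += ELEM
```

Both loops visit the same addresses; the pointer version simply replaces the per-access multiplication with one addition per element.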
By the way, these examples don't so much "insert" values into the array as "overwrite" it. The computer memory already exists there and holds some values (on Linux the .bss is zero-filled when the program is loaded; other environments may leave garbage there), so the code just overwrites the old values with the values from the user. There are still 200 bytes of memory "reserved" by resb, and your code must keep track of the real size (the count of inputted values) to know where the user input ends and where the garbage data starts (or you may eventually write the 99 value into the array too and use it as a "terminator" value; then you need only the address of the array to scan its content, stopping when the value 99 is found).
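The terminator variant can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the values fit in the buffer together with the terminator; since the terminator occupies one byte, at most 199 values fit in the 200 reserved bytes.

```python
TERMINATOR = 99

def store_with_terminator(array, values):
    """Write the values followed by the terminator (assumes they fit)."""
    for i, v in enumerate(values):
        array[i] = v
    array[len(values)] = TERMINATOR

def scan_until_terminator(array):
    """Read values starting at array[0] until the terminator is found."""
    out = []
    i = 0
    while array[i] != TERMINATOR:
        out.append(array[i])
        i += 1
    return out

array = bytearray(200)
store_with_terminator(array, [32, 72, 13, 0, 16])
print(scan_until_terminator(array))  # [32, 72, 13, 0, 16]
```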
EDIT:
And just in case you are still wondering why I sometimes use square brackets and sometimes not: this Q&A looks detailed enough, and YASM syntax is the same as NASM's in bracket usage: Basic use of immediates (square brackets) in x86 Assembly and yasm

File size in assembly

I have the following code written in TASM assembly for reading from a file and printing the file content using a buffer.
Buffer declaration:
buffer db 100 dup (?), '$' ;regarding to comment, buffer is db 101 dup (?), '$'
EDIT
The structure of my program is:
Task 1 is asking me for a file name (string) which I want to read.
After I input file name, the procedure task1 opens the file.
mov ah, 3dh
xor al, al
lea dx, fname
int 21h ;open file
jc openError
mov bx, ax
I'm not sure whether opening the file is correct, because I have seen similar ways of opening a file, but I don't have a handle here, do I?
Here is the reading part task2:
task2 proc
pam 10,13 ;pam is macro for printing out
read:
mov ah, 3fh
lea dx, buffer
mov cx, 100
int 21h
jc readError ;read error , jump
mov si, ax
mov buffer[si], '$'
mov ah, 09h
int 21h ;print out
cmp si, 100
je read
jmp stop ;end
openError:
pam error1
jmp stop
readError:
pam error2
stop: ret
task2 endp
My question is: how can I get the file length using this code? I have read that there are some ways of getting the file size, but they all look very complicated. I was thinking that when I read the file, I should be able to calculate the file size by counting the number of characters I read in a register, but I am not sure about it, and if it is possible, I have no idea how to do that in TASM. Also, what variable do I need in the data segment for storing the file size? Maybe a code snippet with some helpful comments on how it works would help me understand the process. Thanks for the help.
UPDATE regarding the answer:
So I tried to convert hex to decimal. It kind of works, but I must have some bug in there: it works for a small file (I tried a 1kB file and it worked, I got the size in bytes printed on screen), but when I tried a bigger file, like 128kB, the decimal digits were not correct. The printed size was wrong: the file is exactly 130,862 bytes, and my conversion gave me -- MENU653261 = Enter file name.
... code from the answer ...
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
;HEX TO DECIMAL = my code starts here
mov cx,0
mov bx,10
loop1: mov dx,0
div bx
add dl,30h
push dx
inc cx
cmp ax,9
jg loop1
add al,30h
mov [si],al
loop2: pop ax
inc si
mov [si],al
loop loop2
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
Here is a screenshot of how it looks when the decimal value gets printed. It is mixed with the next line; I tried to move it to the next line, but that did not help.
Simple code to display the hex-formatted length of a DOS file (the file name is hardcoded in the source; edit it to name an existing file):
.model small
.stack 100h
.data
fname DB "somefile.ext", 0
buffer DB 100 dup (?), '$'
.code
start:
; set up "ds" to point to data segment
mov ax,@data
mov ds,ax
; open file first, to get "file handle"
mov ax,3D00h ; ah = 3Dh (open file), al = 0 (read only mode)
lea dx,[fname] ; ds:dx = pointer to zero terminated file name string
int 21h ; call DOS service
jc fileError
; ax = file handle (16b number)
; now set the DOS internal "file pointer" to the end of opened file
mov bx,ax ; store "file handle" into bx
mov ax,4202h ; ah = 42h, al = 2 (END + cx:dx offset)
xor cx,cx ; cx = 0
xor dx,dx ; dx = 0 (cx:dx = +0 offset)
int 21h ; will set the file pointer to end of file, returns dx:ax
jc fileError ; something went wrong, just exit
; here dx:ax contains length of file (32b number)
; close the file, as we will not need it any more
mov cx,ax ; store lower word of length into cx for the moment
mov ah,3Eh ; ah = 3E (close file), bx is still file handle
int 21h ; close the file
; ignoring any error during closing, so not testing CF here
; BTW, int 21h modifies only the registers specified in documentation
; that's why keeping length in dx:cx registers is enough, avoiding memory/stack
; display dx:cx file length in hexa formatting to screen
; (note: yes, I used dx:cx for storage, not cx:dx as offset for 42h service)
; (note2: hexa formatting, because it's much easier to implement than decimal)
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
; exit to DOS with exit code 0 (OK)
mov ax,4C00h
int 21h
fileError:
mov ax,4C01h ; exit with code 1 (error happened)
int 21h
AxTo04Hex: ; subroutine to convert ax into four ASCII hexadecimal digits
; input: ax = 16b value to convert, ds:di = buffer to write characters into
; modifies: di += 4 (points beyond the converted four chars)
push cx ; save original cx to preserve its value
mov cx,4
AxTo04Hex_singleDigitLoop:
rol ax,4 ; rotate whole ax content by 4 bits "up" (ABCD -> BCDA)
push ax
and al,0Fh ; keep only lowest nibble (4 bits) value (0-15)
add al,'0' ; convert it to ASCII: '0' to '9' and 6 following chars
cmp al,'9' ; if result is '0' to '9', just store it, otherwise fix
jbe AxTo04Hex_notLetter
add al,'A'-(10+'0') ; fix value 10+'0' into 10+'A'-10 (10-15 => 'A' to 'F')
AxTo04Hex_notLetter:
mov [di],al ; write ASCII hexa digit (0-F) to buffer
inc di
pop ax ; restore other bits of ax back for next loop
dec cx ; repeat for all four nibbles
jnz AxTo04Hex_singleDigitLoop
pop cx ; restore original cx value back
ret ; ax is actually back to its input value here :)
end start
I tried to comment the code extensively and to use a "more straightforward" implementation, avoiding some less common instructions and keeping the logic simple, so you should be able to fully comprehend how it works.
Again, I strongly advise you to use a debugger and go slowly over it instruction by instruction, watching how the CPU state changes and how it correlates with my comments. (Note that I'm not trying to comment what the instruction does exactly, as that can be found in the instruction reference guide, but my human intention, i.e. why I wrote it there. In case of some mistake, this gives you an idea of what the correct output of the wrong code should have been, and how to fix it. If comments just say what the instruction does, you can't tell how it should be fixed.)
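One way to see that the hex formatting is easy is to model AxTo04Hex in Python: each nibble converts to one character independently of the others. This is a sketch; ax_to_04hex and format_length are hypothetical names, and the rotation is emulated with shifts and a 16-bit mask. The example uses the 130,862-byte file size mentioned in the question update.

```python
def ax_to_04hex(ax):
    """Model of AxTo04Hex: four ASCII hex digits of a 16-bit value, high nibble first."""
    out = []
    for _ in range(4):
        ax = ((ax << 4) | (ax >> 12)) & 0xFFFF  # rol ax,4
        al = (ax & 0x0F) + ord("0")             # and al,0Fh / add al,'0'
        if al > ord("9"):                       # cmp al,'9' / jbe notLetter
            al += ord("A") - (10 + ord("0"))    # 10..15 -> 'A'..'F'
        out.append(chr(al))
    return "".join(out)

def format_length(dx, cx):
    """Build the "0x" + upper word + lower word string that the program prints."""
    return "0x" + ax_to_04hex(dx) + ax_to_04hex(cx)

length = 130862
print(format_length(length >> 16, length & 0xFFFF))  # 0x0001FF2E
```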
Now, if you implemented a 32b_number_to_decimal_ascii formatting function, you could replace the last part of this example to get the length in decimal, but that's too tricky for me to write from my head without proper debugging and testing.
Probably the simplest way that is reasonable to implement for somebody new to asm is to have a table of 32b divisors, one for each decimal digit of a 32b number, and then do a nested loop for each digit (possibly skipping storage of leading zeroes, or just incrementing the pointer before printing to skip over them, which is even simpler logic).
Something like (pseudo code similar to C, hopefully showing the idea):
divisors dd 1000000000, 100000000, 10000000, ... 10, 1
for (i = 0; i < divisors.length; ++i) {
    buffer[i] = '0';
    while (divisors[i] <= number) {
        number -= divisors[i];
        ++buffer[i];
    }
}
buffer[i] = '$';
// then printing as
ptr_to_print = buffer;
// eat leading zeroes
while ( '0' == ptr_to_print[0] ) ++ptr_to_print;
// but keep at least one zero, if the number itself was zero
if ('$' == ptr_to_print[0] ) --ptr_to_print;
print_it // dx = ptr_to_print, ah = 9, int 21h
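The divisor-table pseudocode translates almost line for line into Python, which makes it easy to test before writing the assembly. This is a sketch: to_decimal_by_subtraction is a hypothetical name, the divisor list is built as the powers of ten from 10^9 down to 1, and the returned string is what int 21h/AH=9 would print (the '$' itself is not included).

```python
DIVISORS = [10**k for k in range(9, -1, -1)]  # 10^9 down to 1

def to_decimal_by_subtraction(number):
    """Decimal string of a 32-bit unsigned number using only comparison and subtraction."""
    assert 0 <= number <= 0xFFFFFFFF
    buffer = []
    for divisor in DIVISORS:
        digit = ord("0")              # buffer[i] = '0'
        while divisor <= number:      # while (divisors[i] <= number)
            number -= divisor
            digit += 1                # ++buffer[i]
        buffer.append(chr(digit))
    buffer.append("$")                # DOS string terminator
    ptr = 0
    while buffer[ptr] == "0":         # eat leading zeroes
        ptr += 1
    if buffer[ptr] == "$":            # keep one zero if the number was zero
        ptr -= 1
    return "".join(buffer[ptr:-1])    # text printed before the '$'
```

For example, to_decimal_by_subtraction(130862) returns "130862", the file size from the question update.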
And if you wonder how to subtract 32-bit numbers in 16-bit assembly, that's actually not as difficult as 32b division:
; dx:ax = 32b number
; ds:si = pointer to memory to other 32b number (mov si,offset divisors)
sub ax,[si] ; subtract lower word, CF works as "borrow" flag
sbb dx,[si+2] ; subtract high word, using the "borrow" of SUB
; optionally: jc overflow
; you can do that "while (divisors[i] <= number)" above
; by subtracting first, and when overflow -> exit while plus
; add the divisor back (add + adc) (to restore "number")
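The sub/sbb pair and the add/adc restore can be modeled on 16-bit halves in Python to see how the borrow travels from the low word to the high word. This is a sketch: sub_sbb and add_adc are hypothetical names, and CF is returned explicitly because Python has no flags register.

```python
MASK16 = 0xFFFF

def sub_sbb(dx, ax, hi, lo):
    """dx:ax -= hi:lo in two 16-bit steps. Returns (dx, ax, cf);
    cf=1 means hi:lo was larger than dx:ax (borrow out of the high word)."""
    result_lo = ax - lo                  # sub ax,[si]
    cf = 1 if result_lo < 0 else 0       # borrow from the low word
    ax = result_lo & MASK16
    result_hi = dx - hi - cf             # sbb dx,[si+2]
    cf = 1 if result_hi < 0 else 0
    dx = result_hi & MASK16
    return dx, ax, cf

def add_adc(dx, ax, hi, lo):
    """Undo a failed subtraction: add ax,lo / adc dx,hi (final carry dropped)."""
    total_lo = ax + lo
    carry = total_lo >> 16
    return (dx + hi + carry) & MASK16, total_lo & MASK16
```

With dx:ax = 130862 (0001h:FF2Eh), subtracting 100000 (0001h:86A0h) gives 0000h:788Eh = 30862 with CF = 0, while subtracting 1000000 (000Fh:4240h) sets CF, and add_adc then restores 0001h:FF2Eh.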
Points regarding the question update:
You don't convert hex to decimal (the hex string is stored in buffer, and you don't load anything from there). You convert the value in ax to decimal. ax contains the low word of the file length from the previous hex conversion call, so for files of length up to 65535 (0xFFFF, the maximum 16b unsigned integer) it may work. For longer files it will not, as the upper word is in dx, which you destroy by mov dx,0.
If you kept dx as is, you would divide the file length by 10, but for a file of length 655360 or more it would crash with a divide error (the quotient overflows 16 bits). As I wrote in my answer above, doing 32b / 16b division on 8086 is not trivial, and I'm not even sure what the efficient way is. I gave you a hint about using a table of 32b divisors and doing the division by subtraction, but you went for DIV instead. That would need some sophisticated split of the original 32b value into smaller parts, up to a point where you can use div bx=10 to extract particular digits. For example, doing filelength/1e5 first, then calculating the 32b remainder (0..99999), which can actually be divided by 10 even in 16b (99999/10 = 9999, which fits in 16b, remainder 9).
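To make the overflow concrete, here is a Python model of the 8086 DIV r/m16 rule: DX:AX is divided by a 16-bit divisor, and if the quotient does not fit in 16 bits the CPU raises a divide error. The second function shows a different technique from the split suggested above (named plainly: chained or "long" division, not the author's method): divide the high word first, then divide remainder:low word. Because the remainder is always smaller than the divisor, the second DIV cannot overflow. This is a sketch; DivideError, div16, div32_by_16 and to_decimal are hypothetical names, and the exception stands in for the CPU fault.

```python
class DivideError(Exception):
    """Stands in for the 8086 divide-error interrupt (INT 0)."""

def div16(dx, ax, divisor):
    """Model DIV r/m16: (DX:AX) / divisor -> (AX=quotient, DX=remainder)."""
    if divisor == 0:
        raise DivideError
    quotient, remainder = divmod((dx << 16) | ax, divisor)
    if quotient > 0xFFFF:            # quotient does not fit in AX
        raise DivideError
    return quotient, remainder

def div32_by_16(value, divisor):
    """Divide a 32-bit value by a 16-bit divisor using two DIVs that cannot overflow."""
    hi, lo = value >> 16, value & 0xFFFF
    q_hi, r = div16(0, hi, divisor)  # DX=0, AX=high word
    q_lo, r = div16(r, lo, divisor)  # DX=remainder (< divisor), AX=low word
    return (q_hi << 16) | q_lo, r

def to_decimal(value):
    """Decimal digits of a 32-bit value via repeated division by 10."""
    digits = []
    while True:
        value, r = div32_by_16(value, 10)
        digits.append(chr(ord("0") + r))
        if value == 0:
            return "".join(reversed(digits))
```

Here div16(9, 0xFFFF, 10) (655359 / 10) still succeeds, while div16(10, 0, 10) (655360 / 10) raises DivideError, matching the 655360 limit above.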
It looks like you didn't understand why a 128k file length needs 32 bits to store, and what the effective ranges of the various types of variables are. 2^16 = 65536 (= 64 Ki) is how big your integers can get before you run into problems. 128 Ki is twice that, so 16 bits are a problem.
Funny thing... as you wrote "converting from hex to decimal", at first I thought: what, you convert that hex string into a decimal string??? But actually that sounds doable with 16b math: go through the whole hex number first, picking up only the 10^0 values (extracted from each particular k*16^n value), then in the next iteration do the 10^1 counting, etc...
But the division by subtracting 32-bit numbers from my previous answer should be much easier to do, and especially easier to understand.
You write the decimal string at address si, but I don't see where you set si, so it's probably pointing into your MENU string by accident, and you overwrite that memory. (Again, using a debugger, checking the ds:si values to see which address is used, and using a memory view to watch the memory content being written would give you a hint about what the problem is.)
Basically you wasted many hours by not following my advice (learning debugging and understanding what I meant by the 32b - 32b loop doing division), trying to copy some finished code from the Internet instead. At least it looks like you can connect it to your own code somewhat better, but you are still missing obvious problems, like not setting si to point to the destination for the decimal string.
Maybe first try to print all the numbers from the file and keep the size in hex (and at least try to figure out why conversion to hex is easy and conversion to decimal is not). That way you will have most of the task done, and then you can play with the hardest part (32b to decimal in 16b asm).
BTW, just a day or so ago somebody had a problem doing addition/subtraction of 64b numbers in 16b assembly, so this answer may give you further hints on why doing those conversions with sub/add loops is not that bad an idea; it's quite "simple" code once you get the idea of how it works: https://stackoverflow.com/a/42645266/4271923

Using OFFSET operator on an array in x86 Assembly?

I'm currently going through Assembly Language for x86 Processors 6th Edition by Kip R. Irvine. It's quite enjoyable, but something is confusing me.
Early in the book, the following code is shown:
list BYTE 10,20,30,40
ListSize = ($ - list)
This made sense to me. Right after declaring an array, subtract the starting location of the array from the current location in memory to get the number of bytes used by the array.
However, the book later does:
.data
arrayB BYTE 10h,20h,30h
.code
mov esi, OFFSET arrayB
mov al,[esi]
inc esi
mov al,[esi]
inc esi
mov al,[esi]
To my understanding, OFFSET returns the location of the variable with respect to the program's segment. That address is stored in the esi register. The square brackets ([esi]) are then used to access the value stored at the address held in esi. Incrementing moves the address to the next byte.
So what is the difference between using OFFSET on an array and simply using the array variable's name? I was previously led to believe that simply using the array variable would also give me its address.
.data
Number dd 3
.code
mov eax,Number
mov ebx,offset Number
mov eax,Number reads the memory at a certain address and stores the number 3 in EAX.
mov ebx,offset Number stores that address itself in EBX.
mov ebx,offset Number
is equivalent in this case to
lea ebx,Number
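The difference between a label's address and its contents can be modeled in Python. This is a sketch under stated assumptions: the book's separate snippets (list, arrayB, Number) are combined into one hypothetical data segment starting at offset 0, define is a hypothetical helper, and the symbols dictionary plays the role of the offsets the assembler/linker assigns.

```python
import struct

# Hypothetical data segment: list BYTE 10,20,30,40 / arrayB BYTE 10h,20h,30h / Number DWORD 3
segment = bytearray()
symbols = {}

def define(name, raw):
    symbols[name] = len(segment)  # the label's OFFSET = bytes emitted before it
    segment.extend(raw)

define("list", bytes([10, 20, 30, 40]))
list_size = len(segment) - symbols["list"]        # ListSize = ($ - list)
define("arrayB", bytes([0x10, 0x20, 0x30]))
define("Number", struct.pack("<I", 3))

ebx = symbols["Number"]                           # mov ebx, offset Number (the address)
eax = struct.unpack_from("<I", segment, ebx)[0]   # mov eax, Number (the contents)

esi = symbols["arrayB"]                           # mov esi, OFFSET arrayB
reads = []
for _ in range(3):
    reads.append(segment[esi])                    # mov al,[esi]
    esi += 1                                      # inc esi
```

In this layout ebx ends up as 7 (the position of Number in the segment) while eax is 3 (the value stored there), and the three reads through esi return 10h, 20h and 30h.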

How to find largest number in an array using NASM

I was writing a program in NASM (x86 assembly) in which the user is asked to enter three 32-bit hex numbers (8 digits), which are then stored in an array, and the program shows the largest of them. The program works fine, i.e. it shows the largest of the three numbers. But the problem is that it shows only a 16-bit (4-digit) number as output. For example, if I give the three numbers 11111111h, 22222222h and 10000000h, the output is only 2222. This is the code.
section .data
msg db "Enter the number : ",10d,13d
msglen equ $-msg
show db "The greatest number is : ",10d,13d
showlen equ $-show
%macro display 2
mov eax,4
mov ebx,1
mov ecx,%1
mov edx,%2
int 80h
%endmacro
%macro input 2
mov eax,3
mov ebx,0
mov ecx,%1
mov edx,%2
int 80h
%endmacro
section .bss
large resd 12
num resd 3
section .text
global _start
_start:
mov esi,num
mov edi,3
; Now taking input
nxt_num:
display msg,msglen
input esi,12
add esi,12
dec edi
jnz nxt_num
mov esi,num
mov edi,3
add: mov eax,[esi]
jmp check
next: add esi,12
mov ebx,[esi]
CMP ebx,eax
jg add
check: dec edi
jnz next
mov [large],eax
display show,showlen
display large,12
;exit
mov eax,1
mov ebx,0
int 80h
I even tried changing the reserved size of the array from double word to quad word, but the result remains the same.
Also, when I execute the same code in NASM x86_64 assembly, with only the registers and the system calls changed (i.e. eax to rax, ebx to rcx, int 80h to syscall, etc.), the output is 32 bits (8 digits). Why is that?
I need help. Thank you. :)
In your little program, you're trying to move a qword into a 32-bit register, which can hold just 4 bytes (a dword). Based on your response to Gunner, I guess you're misunderstanding this concept.
Each byte is represented by 8 bits.
a word is 2 bytes (16 bits)
a dword is 4 bytes (32 bits), which is the size of a general-purpose register on 32-bit x86.
So whenever you take a byte, its binary equivalent is always 8 bits in size.
So the binary equivalent of FF in hex is 11111111.
In your program, try to print your number as a string instead of printing it through a register. You can do that simply by using a pointer to the memory address where your number is stored, or by printing the input using printf.
P.S.: the string should be in ASCII, so to display 11111111 it should be in memory as the following bytes: 31 31 31 31 31 31 31 31.
The output 2222 is correct for a 32-bit register. Each digit is stored as an 8-bit ASCII character, so 4 digits = 8 * 4 = 32 bits, the most a 32-bit register can hold. This is why the full number is printed if you change to 64-bit registers. You will need to change the displayed number into a string to display the full number.
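A small Python model shows why the 32-bit version prints only four characters. This is a sketch under assumptions: the typed input is taken to be "22222222" followed by Enter, the 12-byte buffers mirror num and large from the question, and NUL bytes are stripped to show only the visible characters that write(1, large, 12) would display.

```python
# Assumed bytes that read() stores for the input "22222222" followed by Enter
typed = b"22222222\n"
num = bytearray(12)
num[:len(typed)] = typed

# mov eax,[esi] on 32-bit x86: only the first 4 ASCII bytes fit in EAX
eax = int.from_bytes(num[0:4], "little")          # 0x32323232
large = bytearray(12)
large[0:4] = eax.to_bytes(4, "little")            # mov [large],eax
printed_32 = bytes(large).rstrip(b"\x00")         # visible output: 2222

# mov rax,[rsi] on x86-64: 8 ASCII bytes fit in RAX
rax = int.from_bytes(num[0:8], "little")
large64 = bytearray(12)
large64[0:8] = rax.to_bytes(8, "little")
printed_64 = bytes(large64).rstrip(b"\x00")       # visible output: 22222222

# Converting the text to a number instead: 8 hex digits -> one 32-bit value
value = int(typed.strip(), 16)
```

The register never held the number 22222222h at all, only the ASCII codes 32h of the characters, which is why the answers suggest working with the string or converting it to a real number first.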
