File size in assembly - file

I have the following code, written in TASM assembly, that reads from a file and prints out the file content using a buffer.
Buffer declaration:
buffer db 100 dup (?), '$' ;regarding to comment, buffer is db 101 dup (?), '$'
EDIT
The structure of my program is:
Task 1 is asking me for a file name (string) which I want to read.
After I input file name, the procedure task1 opens the file.
mov ah, 3dh
xor al, al
lea dx, fname
int 21h ;open file
jc openError
mov bx, ax
I am not sure whether opening the file this way is correct: I have seen similar ways of opening a file, but I do not seem to have a file handle here, or do I?
Here is the reading part task2:
task2 proc
pam 10,13 ;pam is macro for printing out
read:
mov ah, 3fh
lea dx, buffer
mov cx, 100
int 21h
jc readError ;read error , jump
mov si, ax
mov buffer[si], '$'
mov ah, 09h
int 21h ;print out
cmp si, 100
je read
jmp stop ;end
openError:
pam error1
jmp stop
readError:
pam error2
stop: ret
task2 endp
My question is: how can I get the file length using this code? I have read that there are some ways of getting the file size, but they all look very complicated. I was thinking that while reading the file I should be able to calculate its size by storing the number of characters read in a register, but I am not sure whether that is possible, and if it is, I have no idea how to do it in TASM. Also, what variable do I need in the data segment to store the file size? A code snippet with some helpful comments on how it works would help me understand the process. Thanks for your help.
UPDATE regarding the answer:
So I tried to convert hex to decimal. It kind of works, but I must have a bug in there: it works for a small file (I tried a 1 kB file and the size in bytes was printed on screen), but for a bigger file of about 128 kB the printed decimal size was wrong. The file is exactly 130,862 bytes and my conversion gave me -- MENU653261 = Enter file name.
... code from the answer ...
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
;HEX TO DECIMAL = my code starts here
mov cx,0
mov bx,10
loop1: mov dx,0
div bx
add dl,30h
push dx
inc cx
cmp ax,9
jg loop1
add al,30h
mov [si],al
loop2: pop ax
inc si
mov [si],al
loop loop2
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
Here is how it looks when the decimal value gets printed out. It is mixed with the next line. I tried to move it to the next line, but that did not help.
[screenshot]

A simple program to display the hex-formatted length of a DOS file (the file name is hardcoded in the source; edit it to an existing file):
.model small
.stack 100h
.data
fname DB "somefile.ext", 0
buffer DB 100 dup (?), '$'
.code
start:
; set up "ds" to point to data segment
mov ax,@data
mov ds,ax
; open file first, to get "file handle"
mov ax,3D00h ; ah = 3Dh (open file), al = 0 (read only mode)
lea dx,[fname] ; ds:dx = pointer to zero terminated file name string
int 21h ; call DOS service
jc fileError
; ax = file handle (16b number)
; now set the DOS internal "file pointer" to the end of opened file
mov bx,ax ; store "file handle" into bx
mov ax,4202h ; ah = 42h, al = 2 (END + cx:dx offset)
xor cx,cx ; cx = 0
xor dx,dx ; dx = 0 (cx:dx = +0 offset)
int 21h ; will set the file pointer to end of file, returns dx:ax
jc fileError ; something went wrong, just exit
; here dx:ax contains length of file (32b number)
; close the file, as we will not need it any more
mov cx,ax ; store lower word of length into cx for the moment
mov ah,3Eh ; ah = 3E (close file), bx is still file handle
int 21h ; close the file
; ignoring any error during closing, so not testing CF here
; BTW, int 21h modifies only the registers specified in documentation
; that's why keeping length in dx:cx registers is enough, avoiding memory/stack
; display dx:cx file length in hexa formatting to screen
; (note: yes, I used dx:cx for storage, not cx:dx as offset for 42h service)
; (note2: hexa formatting, because it's much easier to implement than decimal)
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
; exit to DOS with exit code 0 (OK)
mov ax,4C00h
int 21h
fileError:
mov ax,4C01h ; exit with code 1 (error happened)
int 21h
AxTo04Hex: ; subroutine to convert ax into four ASCII hexadecimal digits
; input: ax = 16b value to convert, ds:di = buffer to write characters into
; modifies: di += 4 (points beyond the converted four chars)
push cx ; save original cx to preserve its value
mov cx,4
AxTo04Hex_singleDigitLoop:
rol ax,4 ; rotate whole ax content by 4 bits "up" (ABCD -> BCDA)
push ax
and al,0Fh ; keep only lowest nibble (4 bits) value (0-15)
add al,'0' ; convert it to ASCII: '0' to '9' and 6 following chars
cmp al,'9' ; if result is '0' to '9', just store it, otherwise fix
jbe AxTo04Hex_notLetter
add al,'A'-(10+'0') ; fix value 10+'0' into 10+'A'-10 (10-15 => 'A' to 'F')
AxTo04Hex_notLetter:
mov [di],al ; write ASCII hexa digit (0-F) to buffer
inc di
pop ax ; restore other bits of ax back for next loop
dec cx ; repeat for all four nibbles
jnz AxTo04Hex_singleDigitLoop
pop cx ; restore original cx value back
ret ; ax is actually back to its input value here :)
end start
I tried to comment the code extensively and to use a more straightforward implementation, avoiding some less common instructions and keeping the logic simple, so you should be able to fully comprehend how it works.
Again, I strongly advise you to use a debugger and step through it slowly, instruction by instruction, watching how the CPU state changes and how that correlates with my comments. Note that I try to comment not what each instruction does (that can be found in an instruction reference guide) but my human intention, why I wrote it there. In case of a mistake, this gives you an idea of what the correct output of the wrong code should have been, and how to fix it. If the comments just say what the instruction does, you can't tell how it should be fixed.
Now if you implement a 32b_number_to_decimal_ascii formatting function, you can replace the last part of this example to get the length in decimal, but that's too tricky for me to write off the top of my head, without proper debugging and testing.
Probably the simplest way that is reasonable to implement for somebody new to asm is to have a table of 32b divisors, one for each decimal digit of a 32b number, and then do a nested loop over the digits (probably skipping storage of leading zeroes, or just incrementing the pointer before printing to skip over them, which is even less complex code logic).
Something like (pseudo code similar to C, hopefully showing the idea):
divisors dd 1000000000, 100000000, 10000000, ... 10, 1
for (i = 0; i < divisors.length; ++i) {
    buffer[i] = '0';
    while (divisors[i] <= number) {
        number -= divisors[i];
        ++buffer[i];
    }
}
buffer[i] = '$';
// then printing as
ptr_to_print = buffer;
// eat leading zeroes
while ( '0' == ptr_to_print[0] ) ++ptr_to_print;
// but keep at least one zero, if the number itself was zero
if ('$' == ptr_to_print[0] ) --ptr_to_print;
print_it // dx = ptr_to_print, ah = 9, int 21h
And if you wonder how to subtract 32-bit numbers in 16-bit assembly, that's actually not that difficult (unlike 32b division):
; dx:ax = 32b number
; ds:si = pointer to memory to other 32b number (mov si,offset divisors)
sub ax,[si] ; subtract lower word, CF works as "borrow" flag
sbb dx,[si+2] ; subtract high word, using the "borrow" of SUB
; optionally: jc overflow
; you can do that "while (divisors[i] <= number)" above
; by subtracting first, and when overflow -> exit while plus
; add the divisor back (add + adc) (to restore "number")
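A minimal assembly sketch of that table-driven idea, assuming a divisors table that spells out all ten powers of ten and a caller-supplied buffer of at least 11 bytes. The names U32ToDec, divisors and decimal are made up for this illustration and are not part of the listing above, and the routine has not been run through TASM here, so treat it as a starting point to step through in a debugger:

```asm
; Assumed data (hypothetical names, low word of each dd is stored first):
;   divisors dd 1000000000, 100000000, 10000000, 1000000, 100000
;            dd 10000, 1000, 100, 10, 1
;   decimal  db 10 dup (?), '$'

; U32ToDec: convert dx:ax (unsigned 32b) into 10 ASCII digits + '$'
; input:    dx:ax = value, ds:di = buffer of at least 11 bytes
; output:   ds:dx = pointer to first significant digit (ready for ah=9)
; modifies: ax, bx, cx, dx, si, di
U32ToDec:
        push    di                  ; remember start of the buffer
        mov     si,offset divisors
        mov     cx,10               ; ten divisors = ten decimal digits
U32ToDec_nextDigit:
        mov     bl,'0'              ; digit counter starts at ASCII zero
U32ToDec_subtract:
        sub     ax,[si]             ; 32b subtract: low word first
        sbb     dx,[si+2]           ; then high word with borrow
        jc      U32ToDec_restore    ; borrow out => subtracted once too often
        inc     bl                  ; divisor fitted, count it
        jmp     U32ToDec_subtract
U32ToDec_restore:
        add     ax,[si]             ; undo the last subtraction
        adc     dx,[si+2]
        mov     [di],bl             ; store finished digit
        inc     di
        add     si,4                ; next (10x smaller) divisor
        dec     cx
        jnz     U32ToDec_nextDigit
        mov     byte ptr [di],'$'   ; DOS string terminator
        pop     dx                  ; dx = start of buffer
        mov     cx,9                ; skip at most 9 leading zeroes
U32ToDec_skipZero:
        mov     bx,dx
        cmp     byte ptr [bx],'0'
        jne     U32ToDec_done
        inc     dx                  ; step over a leading zero
        dec     cx
        jnz     U32ToDec_skipZero
U32ToDec_done:
        ret
```

With the file length still in dx:cx as in the example above, mov ax,cx / lea di,[decimal] / call U32ToDec / mov ah,9 / int 21h should print it, because the routine returns dx already pointing past the leading zeroes (and keeps the last digit, so zero prints as "0").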
Points to question update:
You don't convert hex to decimal (hex string is stored in buffer, you don't load anything from there). You convert value in ax to decimal. The ax contains low word of file length from previous hex conversion call. So for files of length up to 65535 (0xFFFF = maximum 16b unsigned integer) it may work. For longer files it will not, as upper word is in dx, which you just destroy by mov dx,0.
If you would actually keep dx as is, you would divide file length by 10, but for file with 655360+ length it would crash on divide error (overflow of quotient). As I wrote in my answer above, doing 32b / 16b division on 8086 is not trivial, and I'm not even sure what is the efficient way. I gave you hint about using table of 32b divisors, and doing the division by subtraction, but you went for DIV instead. That would need some sophisticated split of the original 32b value into smaller parts up to a point where you can use div bx=10 to extract particular digits. Like doing filelength/1e5 first, then calculate 32b remainder (0..99999) value, which can be actually divided by 10 even in 16b (99999/10 = 9999 (fits 16b), remainder 9).
Looks like you didn't understand why a 128k file length needs 32 bits to store, and what the effective ranges of various types of variables are. 2^16 = 65536 (= 64ki) ... that's how big your integers can get before you run into problems. 128ki is two times over that => 16 bits is a problem.
Funny thing... as you wrote "converting from hex to decimal", at first I thought: what, you convert that hexa string into a decimal string??? But actually that sounds doable with 16b math: go through the whole hexa number, first picking up only the 10^0 values (extracted from each particular k*16^n value), then in the next iteration doing the 10^1 counting, etc...
But that division by subtracting 32bit numbers from my previous answer should be much easier to do, and especially to comprehend, how it works.
You write the decimal string at address si, but I don't see where you set si, so it's probably pointing into your MENU string by accident, and you overwrite that memory (again, using a debugger to check which address ds:si holds, and using the memory view to watch the content being written, would give you a hint about what the problem is).
Basically you wasted many hours by not following my advice (learning debugging and understanding what I meant by a 32b - 32b loop doing division), trying to copy some finished code from the Internet instead. At least it looks like you can connect it to your own code somewhat better, but you are still missing obvious problems, like not setting si to point to the destination for the decimal string.
Maybe try first to print all numbers from the file and keep the size in hexa (at least try to figure out why conversion to hexa is easy and conversion to decimal is not). Then you will have most of the task done, and you can play with the hardest part (32b to decimal in 16b asm).
BTW, just a day ago or so somebody had problem with doing addition/subtraction over 64b numbers in 16b assembly, so this answer may give you further hints, why doing those conversion by sub/add loops is not that bad idea, it's quite "simple" code if you get the idea how it works: https://stackoverflow.com/a/42645266/4271923
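On the DIV route mentioned in the points above: one common way to avoid the divide overflow is to divide the high word first and feed its remainder into the division of the low word, so that each quotient fits into 16 bits. A hedged sketch follows; the routine names U32Div10 and PrintU32Dec are hypothetical and the code has not been assembled or tested here:

```asm
; U32Div10: divide dx:ax by 10 without risking a divide overflow
; input:    dx:ax = unsigned 32b dividend
; output:   dx:ax = 32b quotient, bx = remainder (0..9)
; modifies: cx
U32Div10:
        mov     bx,10
        mov     cx,ax           ; save low word
        mov     ax,dx           ; ax = high word
        xor     dx,dx           ; dx:ax = 0:high
        div     bx              ; ax = high / 10, dx = high mod 10 (< 10)
        xchg    ax,cx           ; cx = quotient high word, ax = low word
        div     bx              ; dx:ax = (high mod 10):low, quotient fits 16b
        mov     bx,dx           ; bx = final remainder = lowest decimal digit
        mov     dx,cx           ; dx:ax = 32b quotient
        ret

; PrintU32Dec: print dx:ax as unsigned decimal using DOS function 02h
; modifies: ax, bx, cx, dx, si
PrintU32Dec:
        xor     si,si           ; si = number of digits collected
PrintU32Dec_collect:
        call    U32Div10
        add     bl,'0'          ; remainder -> ASCII digit
        push    bx              ; digits come out lowest first, so stack them
        inc     si
        mov     bx,ax
        or      bx,dx           ; is the 32b quotient zero yet?
        jnz     PrintU32Dec_collect
PrintU32Dec_emit:
        pop     dx              ; dl = next digit, highest first
        mov     ah,2
        int     21h
        dec     si
        jnz     PrintU32Dec_emit
        ret
```

The second div cannot overflow because its high word is a remainder below 10, which keeps the dividend below 10*65536. Compared with the subtraction table, this needs no data table, but the digits arrive in reverse order and have to be buffered (here on the stack).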

Related

Print from 1 to < user input in emu8086

I want to get a number (i.e 5) from the user and then print starting from 1 to < input (i.e 1 2 3 4)
But my code does not stop at "4"; instead the loop runs until "d".
I know that loop runs CX times,
and since MOVZX does not work on the 8086, I first moved AL to CL and then zeroed CH.
As someone mentioned, the problem is that when I move AL to CX I am not moving the value 4; I am moving 34h (the ASCII value of 4), so my loop runs 34h times.
Now how do I convert my user input value to decimal and move that to CX. Is there any way to take user input that will be stored in AL as decimal value?
org 100h
MOV AH, 1 ; Get user input
INT 21H
DEC AL ; Dec AL to satisfy the condition that it will print till < input
MOV BL,31H ; Initialize BL so that the output starts printing from 1
MOV CL,Al ; set counter register CX
MOV CH,00
Print:
MOV AH, 2 ; for output printing
MOV DL,0DH ; for output printing
INT 21H ; for output printing
MOV DL,0AH ; for output printing
INT 21H ; for output printing
MOV AH,2
MOV DL,BL ; print what is in BL
INT 21H
INC BL ; then increment BL
LOOP Print ; supposed to run the loop on Print what is the value in CL times
hlt
MOV AH, 1 ; Get user input
INT 21H
If you input 5 then the AL register will hold the number 35h which is the ASCII code of that key. You clearly want what that key represents which is 5. You need to subtract 30h (48).
mov ah, 01h ; DOS.GetKey
int 21h
sub al, '0'
dec al
mov cl, al
mov ch, 0
The rest of the program is fine for printing starting from 1 to < input.
Now how do I convert my user input value to decimal and move that to CX.
You've fallen into the trap of forgetting that loop conditions other than }while(--cx) are possible, using instructions other than loop.
loop is just a peephole optimization for dec cx / jnz (without affecting FLAGS). Only use it when that's actually the most efficient way to loop. (Or just never use it at all, because you need to understand conditional branches anyway, so omitting loop is one fewer instruction to learn about / remember. Also, on most modern x86 CPUs, loop is much slower than dec/jnz. It's good if tuning for real 8086, or optimizing for code-size over speed, though, but it is only ever necessary as an optimization.)
The easiest and most logically clear way to write this loop is:
MOV AH, 1 ; read a char from stdin into AL
INT 21H
mov cl, al ; ending character
mov bl, '1' ; b = current character, starting with '1'
.top: ; do {
; ... print CR/LF with AH=2 (not shown)
mov dl, bl
int 21h ; With AH=2 from printing CR/LF
inc bl ; b++
cmp bl, cl
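jb .top ; }while(b < end_char);
Notice that I increment after printing, so the comparison sees the next character to be printed: jb gives }while(b < end_char), which stops before the input digit as the question asks. Use jbe for }while(b <= end_char) if the input digit itself should be printed too.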
On a real 8086, where loop is efficient, this does have more instructions and more code bytes inside the loop, and thus could be slower (if we consider a case where loop overhead matters, not with 3x slow int 21h system calls inside the loop).
But that trades off against smaller total code size (from the trivial loop setup). So it's a tradeoff between static code size vs. dynamic instruction count (and amount of code bytes that need to be fetched, which was the real issue on 8086).

Problems with printing the array. Assembly Language

I've got some problems with printing an array on the screen.
The user first has to enter the elements (numbers) of the array from the keyboard, but when I try to print it, it prints different symbols (letters), as many as the numbers entered by the user, and then it loops.
MASS is our array.
SUM is some message, don't mind it.
OUT_ARRAY PROC NEAR;==============================OUT
OUT_AR:
MOV AH,02H
MOV DL,MASS[SI]
ADD DL,30H
INT 21H
INC SI
LOOP OUT_AR
XOR SI,SI
MOV AH,9
LEA DX,SUM
INT 21H
XOR DX,DX
XOR BX,BX
CYCLE:
XOR AX,AX
ADD DL,MASS[SI]
INC SI
INC BX
CMP SI,5
LOOP CYCLE
RET
OUT_ARRAY ENDP
p.s.
-i'm using emu 8086.
-If you already have some sample procedures ,which prints arrays,i'd like to have a look on them. and i'll be grateful to you.
Thanks!
OUT_AR:
MOV AH,02H
MOV DL,MASS[SI] ; Where is SI initialized? You should clear si before you enter the loop
ADD DL,30H
INT 21H
INC SI
LOOP OUT_AR ; CX is set the by the caller? If not it is also not initialized
The question you need to ask here is: Which values are actually printed? You are looping through the array and adding 0x30 to each value.
What does this mean? Obviously you know that the ASCII code for a '0'-'9' is 0x30 - 0x39 so you try to print each digit individually. But what values are stored in MASS?
If you have only values between 0 and 9 there, then your algorithm works. If instead the array holds arbitrary values in the range 0-255, then for a value such as 0x60, adding 0x30 gives 0x90 and you print an accented 'E' (depending on your locale settings).
So what you need to do is, you must convert the number in the array to a decimal string, and only then can you print it properly.
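A small sketch of such a conversion for one byte-sized element, assuming the value arrives in AL and is printed with DOS function 02h like the rest of the question's code. PrintByteDec is a made-up name, and the routine has not been tried in emu8086 here:

```asm
; PrintByteDec: print al (0..255) as unsigned decimal text
; modifies: ax, bx, cx, dx
PrintByteDec:
        xor     ah,ah           ; ax = value to convert
        mov     bl,10
        xor     cx,cx           ; cx = number of digits collected
PrintByteDec_split:
        div     bl              ; al = ax / 10, ah = ax mod 10
        mov     dl,ah
        add     dl,'0'          ; remainder -> ASCII digit
        push    dx              ; lowest digit comes first, so stack it
        inc     cx
        xor     ah,ah           ; ax = quotient for the next round
        or      al,al
        jnz     PrintByteDec_split
PrintByteDec_emit:
        pop     dx              ; dl = next digit, highest first
        mov     ah,02h
        int     21h
        loop    PrintByteDec_emit
        ret
```

Inside OUT_AR it would take the place of the ADD DL,30H / INT 21H pair: load the element with MOV AL,MASS[SI] and CALL PrintByteDec, saving CX around the call (PUSH CX / POP CX) because the routine uses CX for its own digit count.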

Looping and processing string byte-by-byte in MASM assembly

I am using MASM assembly and I am trying to write a loop that processes the string str1 byte-by-byte, changing each lowercase letter into the corresponding capital letter using bit operations. If the letter is already capital, leave it alone. Nothing seems to happen to my string str1 when I execute my code and I'm having difficulty figuring out why, maybe I shouldn't be processing my array as such, but nonetheless, here's the code:
.386
.MODEL FLAT
str1 dword "aBcD", cr, Lf, 0
....
.code
_start:
output str1
sub esi, esi ; sum = 0
lea ebx, str1
top: mov al, [ebx + esi] ; attempting to move each character value from
; str1 into the register al for comparison and
; possible conversion to uppercase
add esi, 5
cmp al, 0
je zero
sub al, 20h ; convert lowercase to corresponding uppercase
loop top
zero: output zeromsg ; for TESTING of al purposes only
done: output str1value
output str1
Nothing changes, and on top of the conversion not taking place, the string is printing in reverse order. Why? It prints out as "DcBa". Any insight would be appreciated! Thanks in advance.
You must load the character, process it, and store it back. You don't store it.
Something like:
mov [esi+ebx], al
is missing.
Why do you sub 0x20 from the char? And why do you add 5 to esi?
Update
Before you start coding, you should think about what the required steps are.
1. Load the character.
2. If the character is 0, the string is done.
3. If the character is lowercase, convert it to uppercase.
4. Store the character.
5. Advance to the next character and go back to step 1.
That's it. Now when you look at your code example, you can easily see what is missing and where you go wrong.
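The five steps could be sketched like this in MASM, assuming str1 is declared as a zero-terminated byte string (for example str1 byte "aBcD", 0) rather than as dword. The labels are only illustrative and the fragment has not been assembled here. Declaring the text as dword is a likely reason the characters appear reversed, since each dword is stored low byte first:

```asm
; assumes: str1 byte "aBcD", 0   (hypothetical byte declaration)
        lea     ebx, str1       ; ebx -> first character
top:    mov     al, [ebx]       ; 1. load the character
        cmp     al, 0           ; 2. zero terminator => string is done
        je      done
        cmp     al, 'a'         ; 3. only 'a'..'z' need converting
        jb      next
        cmp     al, 'z'
        ja      next
        and     al, 0DFh        ;    clear bit 5: 'a' (61h) -> 'A' (41h)
        mov     [ebx], al       ; 4. store the character back
next:   inc     ebx             ; 5. advance to the next character
        jmp     top
done:
```

Clearing bit 5 with and is the bit-operation equivalent of subtracting 20h, but it is only valid for lowercase letters, which is why the range check comes first.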
May help you a bit
.writeLoop2
mov eax,[ebx] ;mov eax start of data block [ebx]
cmp al,&61 ;61hex is "a"
jb dontsub20 ;if its less don't bother because it's a CAPITAL letter
sub al,&20 ;else take off 20 hex
.dontsub20
call "osasci" ;print to screen command. Output the character in eax
inc ebx ;move ebx forward to next character
inc ecx ;ecx is the rolling count
cmp ecx,edx ;when ecx=edx we are at the end of the data block
jb writeLoop2 ;otherwise loop, there are more characters to print

Reading File Error.. Microsoft Assembly

I am working on a pretty big program in assembly,
and I have a bit of a problem with this specific piece of code:
ToArray proc _FH:word ; _FH File Handler ;non-void function returns -1 if error
LOCALS
push AX BX CX
MOV BX, _FH
MOV CX, 400
MOV DX, offset FileBuffer
MOV AH, 3FH
INT 21H
JC ErrorReading
call puts, offset Read_Success
JMP DONE
ErrorReading:
call puts, offset Read_Error
MOV DX,-1
DONE:
pop CX BX AX
ret
ToArray endp
I have { 1 2 5 6 } in the opened file, but after calling INT 21H it just fills the array with 80241 80241..
Why is this happening?
from having 1 3 5 6 I have 8241 8243 8245...
That looks like correct data to me. The decimal numbers 8241 8243 8245, when viewed as hexadecimal, would be 0x2031 0x2033 0x2035. 0x20 is the ASCII code for the space character, 0x31 is the ASCII code for '1', and so on. So you're looking at the string "1 3 5 ". It's just that you picked a representation of the data that makes this hard to see. Unless the file is using Unicode or some other multi-byte character encoding, you're better off viewing the characters as bytes rather than words.

8086 assembly - how to access array elements within a loop

Ok, to make things as simple as possible, say I have a basic loop that i want to use in order to modify some elements of an array labeled a. In the following sample code I've tried replacing all elements of a with 1, but that doesn't really work.
assume cs:code ,ds:data
data segment
a db 1,2,3,4
i db 0
data ends
code segment
start:
mov ax,data
mov ds,ax
lea si,a
the_loop:
mov cl,i
cmp cl,4
jae the_end
mov ds:si[i],1 ; this is the part that i don't really understand since
inc i ; i'm expecting i=0 and ds:si[i] equiv to ds:si[0] which
loop the_loop ; is apparently not the case here since i actually receives the
; the value 1
the_end:
mov ax,4c00h
int 21h
code ends
end start
I am aware that I could simply do this by modifying the element stored in al after the lodsb instruction, and just store that. But I would like to know if it is possible to do something like what I've tried above.
In x86 assembly you can't use a value stored in memory to address memory indirectly.
You need to read i into some register that can be used for memory addressing, and use that instead. You may want to check Wikipedia for 8086 memory addressing modes.
So, replace
mov ds:si[i],1
with (segment ds is unnecessary here, as it's the default of si, bx and bx+si too):
xor bx,bx
mov bl,[i]
mov [bx+si],byte 1 ; some other assemblers want byte ptr
There are other problems with your code too. The entire loop can be made easier and fixed this way:
lea si,a
mov cx,4 ; number of elements in a (loading i, which is 0, would make the loop run 65536 times)
@fill_loop:
mov [si], byte 1
inc si
dec cx
jnz @fill_loop
Or, if you want to save 1 byte and use loop instruction.
@fill_loop:
mov [si], byte 1
inc si
loop @fill_loop
Note that in 16-bit mode loop instruction decrements cx and jumps to label if cx is not zero after decrement. However, in 32-bit mode loop decrements ecx and in 64-bit mode (x86-64) it decrements rcx.
I suppose that your code does not even run through the assembler, since
mov ds:si[i],1
is not a valid address mode.
Use something like
mov byte ptr [si],1 ; store value 1 at [SI]
inc si ; point to next array element
instead (used MASM to verify the syntax).
The DS: prefix is unnecessary for [si] since this is the default.
See also The 80x86 Addressing Modes.

Resources