I've been tasked with creating a very basic text editor as a project. I'm currently trying to take an argument from the command line, open the file, read it, etc. The issue is that either I don't understand where in memory the contents of the text file are being put, or I do understand but for some reason they are not being put there.
This is my code so far:
org 100h
jmp start
inHandle dw ?
charAmount dw ?
buff db 100 dup (?)
start :
xor bx, bx
mov bl, [80h] ; length of string from command line
cmp bl, 126 ; check length
ja exit ; if above the length, exit
mov [bx+81h], 0 ; add 0 to the end of the string
;*** open file***
mov ah, 3Dh ; open existing file
mov al, 0 ; read only
mov dx, 82h ; offset of string
int 21h
mov inHandle, ax ; save the handle
jc err ; carry flag set, jump to error block
;**Note i wont include the err code block, it just displays an icon on a video window to tell me it went wrong;
jmp continue
;***I know this continue is probably redundant since it will go here on its own
continue: nop
mov ah, 42h ; Seek end of file
mov bx, inHandle ; Bx takes the handle
mov al, 2 ; end of file plus offset
mov cx, 0 ; Upper order of bytes to move
mov dx, 0 ; Lower order of bytes to move
int 21h
mov charAmount, ax ; store the length in charAmount (My file has 13 for example, so this returned 13 after seeking the end of file)
;*******READ FILE******
mov ah, 3Fh ; Read file
mov bx, inHandle ; Takes the handle
xor cx, cx
mov cx, charAmount ; Counter set to the length
mov dx, offset buff ; set to buffer I defined
int 21h
exit: nop ; (was used for the error code I didnt include)
end start
So I'm confused about the read file. I'm a bit unsure what passing something called buffer to dx does. Is it an offset to give? I'm reading in documentation it says DS:DX is the pointer to read buffer. After I run my code, DS is 0700, and DX is 0112. So I look in the memory at 0700:0112 but I don't see the string from my file. It's just all 0's.
Did I do something wrong? Am I forgetting something? Or am I not understanding at all where in memory this should be and I'm just looking at the wrong address. This is very frustrating and I'd appreciate the help. Thanks. I'm doing this in emu8086 by the way.
I have the following code written in TASM assembly for reading from a file and printing out the file content using a buffer.
Buffer declaration:
buffer db 100 dup (?), '$' ;regarding to comment, buffer is db 101 dup (?), '$'
The structure of my program is:
Task 1 is asking me for a file name (string) which I want to read.
After I input file name, the procedure task1 opens the file.
mov ah, 3dh
xor al, al
lea dx, fname
int 21h ;open file
jc openError
mov bx, ax
I'm not sure whether opening the file this way is correct. I have seen similar ways of opening a file, but I don't keep a file handle anywhere here. Should I?
Here is the reading part task2:
task2 proc
pam 10,13 ;pam is macro for printing out
mov ah, 3fh
lea dx, buffer
mov cx, 100
int 21h
jc readError ;read error , jump
mov si, ax
mov buffer[si], '$'
mov ah, 09h
int 21h ;print out
cmp si, 100
je read
jmp stop ;end
pam error1
jmp stop
pam error2
stop: ret
task2 endp
My question is: how can I get the file length using this code? I have read that there are some ways of getting the file size, but they all look very complicated. I was thinking that when I read the file, I should be able to calculate its size by keeping the number of characters read in a register, but I'm not sure whether that is possible, and if it is, I have no idea how to do it in TASM. Also, what variable do I need in the data segment to store the file size? A code snippet with some helpful comments on how it works would help me understand the process. Thanks for the help.
UPDATE regarding the answer:
So I tried to convert hex to decimal. It kind of works, but there must be a bug somewhere: it works for a small file (I tried a 1kB file and got the size in bytes printed on screen), but when I tried a bigger file of about 128kB, the decimal number was not correct. The file is exactly 130,862 bytes, and my conversion gave me -- MENU653261 = Enter file name.
... code from the answer ...
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
;HEX TO DECIMAL = my code starts here
mov cx,0
mov bx,10
loop1: mov dx,0
div bx
add dl,30h
push dx
inc cx
cmp ax,9
jg loop1
add al,30h
mov [si],al
loop2: pop ax
inc si
mov [si],al
loop loop2
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
Here is a screenshot of how it looks when the decimal value gets printed out. It is mixed with the next line. I tried to move it to the next line, but that did not help.
Simple code to display the hex-formatted length of a DOS file (the file name is hardcoded in the source; edit it to name an existing file):
.model small
.stack 100h
.data
fname DB "somefile.ext", 0
buffer DB 100 dup (?), '$'
.code
start:
; set up "ds" to point to data segment
mov ax,@data
mov ds,ax
; open file first, to get "file handle"
mov ax,3D00h ; ah = 3Dh (open file), al = 0 (read only mode)
lea dx,[fname] ; ds:dx = pointer to zero terminated file name string
int 21h ; call DOS service
jc fileError
; ax = file handle (16b number)
; now set the DOS internal "file pointer" to the end of opened file
mov bx,ax ; store "file handle" into bx
mov ax,4202h ; ah = 42h, al = 2 (END + cx:dx offset)
xor cx,cx ; cx = 0
xor dx,dx ; dx = 0 (cx:dx = +0 offset)
int 21h ; will set the file pointer to end of file, returns dx:ax
jc fileError ; something went wrong, just exit
; here dx:ax contains length of file (32b number)
; close the file, as we will not need it any more
mov cx,ax ; store lower word of length into cx for the moment
mov ah,3Eh ; ah = 3E (close file), bx is still file handle
int 21h ; close the file
; ignoring any error during closing, so not testing CF here
; BTW, int 21h modifies only the registers specified in documentation
; that's why keeping length in dx:cx registers is enough, avoiding memory/stack
; display dx:cx file length in hexa formatting to screen
; (note: yes, I used dx:cx for storage, not cx:dx as offset for 42h service)
; (note2: hexa formatting, because it's much easier to implement than decimal)
lea di,[buffer] ; hexa number will be written to buffer
mov word ptr [di],('0' + 'x'*256) ; with C-like "0x" prefix
add di,2 ; "0x" written at start of buffer
mov ax,dx
call AxTo04Hex ; upper word converted to hexa string
mov ax,cx
call AxTo04Hex ; lower word converted to hexa string
mov byte ptr [di],'$' ; string terminator
; output final string to screen
mov ah,9
lea dx,[buffer]
int 21h
; exit to DOS with exit code 0 (OK)
mov ax,4C00h
int 21h
fileError:
mov ax,4C01h ; exit with code 1 (error happened)
int 21h
AxTo04Hex: ; subroutine to convert ax into four ASCII hexadecimal digits
; input: ax = 16b value to convert, ds:di = buffer to write characters into
; modifies: di += 4 (points beyond the converted four chars)
push cx ; save original cx to preserve its value
mov cx,4
AxTo04Hex_singleDigitLoop:
rol ax,4 ; rotate whole ax content by 4 bits "up" (ABCD -> BCDA)
push ax
and al,0Fh ; keep only lowest nibble (4 bits) value (0-15)
add al,'0' ; convert it to ASCII: '0' to '9' and 6 following chars
cmp al,'9' ; if result is '0' to '9', just store it, otherwise fix
jbe AxTo04Hex_notLetter
add al,'A'-(10+'0') ; fix value 10+'0' into 10+'A'-10 (10-15 => 'A' to 'F')
AxTo04Hex_notLetter:
mov [di],al ; write ASCII hexa digit (0-F) to buffer
inc di
pop ax ; restore other bits of ax back for next loop
dec cx ; repeat for all four nibbles
jnz AxTo04Hex_singleDigitLoop
pop cx ; restore original cx value back
ret ; ax is actually back to its input value here :)
end start
I tried to comment the code extensively and to use a more straightforward implementation, avoiding some less common instructions and keeping the logic simple, so you should be able to fully comprehend how it works.
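If it helps to see the nibble-by-nibble idea outside of asm, here is a minimal C sketch of the same conversion. The names ax_to_04_hex and format_length_hex are made up for this illustration, and the string ends with a C '\0' instead of the DOS '$'; treat it as a model of the logic, not as a replacement for the routine above.

```c
#include <stdint.h>

/* Hypothetical helper mirroring AxTo04Hex: writes four ASCII hex digits
   for `value` at `out` and returns the pointer just past them (di += 4). */
static char *ax_to_04_hex(uint16_t value, char *out)
{
    for (int i = 0; i < 4; ++i) {
        /* rol ax,4: rotate so the highest nibble lands in the low 4 bits */
        value = (uint16_t)((value << 4) | (value >> 12));
        char c = (char)('0' + (value & 0x0F));  /* '0'..'9' and 6 chars after */
        if (c > '9')
            c = (char)(c + ('A' - (10 + '0'))); /* 10..15 => 'A'..'F' */
        *out++ = c;
    }
    return out;
}

/* Formats a 32-bit length as "0x" plus 8 hex digits, upper word first. */
static void format_length_hex(uint32_t length, char out[11])
{
    char *p = out;
    *p++ = '0';
    *p++ = 'x';
    p = ax_to_04_hex((uint16_t)(length >> 16), p);    /* like dx: upper word */
    p = ax_to_04_hex((uint16_t)(length & 0xFFFF), p); /* like cx: lower word */
    *p = '\0';  /* C string terminator instead of '$' */
}
```

For the 130,862-byte file mentioned in the question update, format_length_hex produces "0x0001FF2E". Each hex digit maps to exactly four bits, which is why no division is needed here.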
Again, I strongly advise you to use a debugger and step through the code slowly, instruction by instruction, watching how the CPU state changes and how that correlates with my comments. Note that my comments don't describe what each instruction does, as that can be found in an instruction reference guide; they describe my intention, why I wrote the instruction there. In case of a mistake, this tells you what the correct output of the wrong code should have been, and how to fix it. If comments only say what the instruction does, you can't tell how it should be fixed.
Now, if you implement a 32b_number_to_decimal_ascii formatting function, you can replace the last part of this example to get the length in decimal, but that's too tricky for me to write from memory without proper debugging and testing.
Probably the simplest approach that somebody new to asm can reasonably implement is to have a table of 32b divisors, one for each decimal digit, and then run a nested loop over the digits (either skipping the storage of leading zeroes, or just advancing the pointer past them before printing, which is even simpler logic).
Something like (pseudo code similar to C, hopefully showing the idea):
divisors dd 1000000000, 100000000, 10000000, ... 10, 1
for (i = 0; i < divisors.length; ++i) {
    buffer[i] = '0';
    while (divisors[i] <= number) {
        number -= divisors[i];
        ++buffer[i];
    }
}
buffer[i] = '$';
// then printing as
ptr_to_print = buffer;
// eat leading zeroes
while ( '0' == ptr_to_print[0] ) ++ptr_to_print;
// but keep at least one zero, if the number itself was zero
if ('$' == ptr_to_print[0] ) --ptr_to_print;
print_it // dx = ptr_to_print, ah = 9, int 21h
And if you wonder how to subtract 32 bit numbers in 16 bit assembly, that's actually not that difficult (not as difficult as 32b division):
; dx:ax = 32b number
; ds:si = pointer to memory to other 32b number (mov si,offset divisors)
sub ax,[si] ; subtract lower word, CF works as "borrow" flag
sbb dx,[si+2] ; subtract high word, using the "borrow" of SUB
; optionally: jc overflow
; you can do that "while (divisors[i] <= number)" above
; by subtracting first, and when overflow -> exit while plus
; add the divisor back (add + adc) (to restore "number")
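To check the idea before writing it in asm, the subtraction loop and the sub/sbb pair can be modelled in C with the number kept as two 16-bit words. This is only a sketch under my own assumptions: sub32_words, add32_words and u32_to_decimal are names invented here, the table is written out in full, and the string ends with '\0' rather than '$'.

```c
#include <stdint.h>

/* Powers of ten from 10^9 down to 10^0, one per decimal digit. */
static const uint32_t divisors[10] = {
    1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
    10000u, 1000u, 100u, 10u, 1u
};

/* Models "sub ax,[si]" + "sbb dx,[si+2]" on a number held as hi:lo.
   Returns the final borrow (CF after sbb): 1 means divisor > number. */
static int sub32_words(uint16_t *hi, uint16_t *lo, uint32_t divisor)
{
    uint16_t d_lo = (uint16_t)(divisor & 0xFFFF);
    uint16_t d_hi = (uint16_t)(divisor >> 16);
    int borrow = *lo < d_lo;                       /* CF after sub */
    *lo = (uint16_t)(*lo - d_lo);
    int borrow_out = (uint32_t)*hi < (uint32_t)d_hi + (uint32_t)borrow;
    *hi = (uint16_t)(*hi - d_hi - borrow);         /* sbb uses the borrow */
    return borrow_out;
}

/* Models "add" + "adc": restores the number after a subtraction that borrowed. */
static void add32_words(uint16_t *hi, uint16_t *lo, uint32_t divisor)
{
    uint16_t old_lo = *lo;
    *lo = (uint16_t)(*lo + (uint16_t)(divisor & 0xFFFF));
    int carry = *lo < old_lo;                      /* CF after add */
    *hi = (uint16_t)(*hi + (uint16_t)(divisor >> 16) + carry);
}

/* Fills buffer with 10 digits and returns a pointer past the leading zeroes. */
static const char *u32_to_decimal(uint32_t number, char buffer[11])
{
    uint16_t hi = (uint16_t)(number >> 16);
    uint16_t lo = (uint16_t)(number & 0xFFFF);
    for (int i = 0; i < 10; ++i) {
        buffer[i] = '0';
        while (!sub32_words(&hi, &lo, divisors[i]))
            ++buffer[i];                           /* one more divisor fitted */
        add32_words(&hi, &lo, divisors[i]);        /* undo the failed subtraction */
    }
    buffer[10] = '\0';
    const char *p = buffer;
    while (*p == '0') ++p;                         /* eat leading zeroes */
    if (*p == '\0') --p;                           /* keep one zero for 0 */
    return p;
}
```

With a char buf[11], u32_to_decimal(130862, buf) yields "130862". The inner loop runs at most nine successful subtractions per digit, so the whole conversion stays cheap even without DIV.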
Points to question update:
You don't convert hex to decimal (the hex string is stored in buffer, and you don't load anything from there). You convert the value in ax to decimal. ax contains the low word of the file length, left over from the previous hex conversion call. So for files up to 65535 bytes long (0xFFFF = maximum 16b unsigned integer) it may work. For longer files it will not, as the upper word is in dx, which you destroy with mov dx,0.
If you actually kept dx as is, you would divide the file length by 10, but for a file of length 655360 or more it would crash with a divide error (overflow of the quotient). As I wrote in my answer above, doing 32b / 16b division on the 8086 is not trivial, and I'm not even sure what the efficient way is. I gave you a hint about using a table of 32b divisors and doing the division by subtraction, but you went for DIV instead. That would need some sophisticated split of the original 32b value into smaller parts, up to the point where you can use div with bx=10 to extract particular digits. For example, do filelength/1e5 first, then calculate the 32b remainder (0..99999), which can be divided by 10 even in 16b (99999/10 = 9999 (fits 16b), remainder 9).
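For comparison, here is a C model of a different technique from the ones described above: ordinary long division done word by word. The high word is divided by 10 first, and its remainder (0..9) is carried into dx for the division of the low word, so the second quotient can never exceed 65535. The helpers div16, div32_by_10 and u32_to_decimal_div are names invented for this sketch, and div16 only imitates how the 8086 DIV behaves.

```c
#include <stdint.h>

/* Models "div bx" on dx:ax. Returns 0 when the quotient does not fit
   in 16 bits, the case that raises a divide error on the real CPU. */
static int div16(uint16_t *dx, uint16_t *ax, uint16_t bx)
{
    uint32_t dividend = ((uint32_t)*dx << 16) | *ax;
    if (bx == 0 || dividend / bx > 0xFFFFu)
        return 0;
    *ax = (uint16_t)(dividend / bx);   /* quotient  -> ax */
    *dx = (uint16_t)(dividend % bx);   /* remainder -> dx */
    return 1;
}

/* Divides hi:lo by 10 in place and returns the remainder (one digit). */
static unsigned div32_by_10(uint16_t *hi, uint16_t *lo)
{
    uint16_t dx = 0, ax = *hi;
    div16(&dx, &ax, 10);   /* 0:hi / 10, cannot overflow */
    *hi = ax;
    ax = *lo;              /* dx keeps the remainder 0..9 as upper word */
    div16(&dx, &ax, 10);   /* at most 655359 / 10 = 65535, still fits */
    *lo = ax;
    return dx;
}

/* Writes digits from the end of buffer backwards; returns the first digit. */
static const char *u32_to_decimal_div(uint32_t number, char buffer[11])
{
    uint16_t hi = (uint16_t)(number >> 16);
    uint16_t lo = (uint16_t)(number & 0xFFFF);
    char *p = buffer + 10;
    *p = '\0';
    do {
        *--p = (char)('0' + div32_by_10(&hi, &lo));
    } while (hi != 0 || lo != 0);
    return p;
}
```

Calling div16 directly with dx=10, ax=0, bx=10 (the value 655360) reports the overflow case, matching the crash threshold mentioned above, while u32_to_decimal_div(130862, buf) gives "130862".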
It looks like you didn't understand why a 128k file length needs 32 bits to store, and what the effective ranges of the various variable types are. 2^16 = 65536 (= 64ki) is how big your integers can get before you run into problems. 128ki is two times over that, so 16 bits is a problem.
Funny thing: when you wrote "converting from hex to decimal", at first I thought: what, you convert that hex string into a decimal string? But actually that sounds doable with 16b math: go through the whole hex number, first picking up only the 10^0 values (extracted from each k*16^n term), then in the next iteration do the 10^1 counting, etc.
But the division by subtracting 32 bit numbers from my previous answer should be much easier to do, and especially easier to comprehend.
You write the decimal string at address si, but I don't see where you set si, so it's probably pointing into your MENU string by accident, and you overwrite that memory (again, using a debugger to check the ds:si values and the memory view to watch what gets written would show you what the problem is).
Basically, you wasted many hours by not following my advice (learning to debug, and understanding what I meant by a 32b - 32b loop doing the division) and trying to copy some finished code from the Internet instead. At least it looks like you can now connect it to your own code somewhat better, but you are still missing obvious problems, like not setting si to point to the destination for the decimal string.
Maybe first try to print all the numbers from the file and keep the size in hex (at least try to figure out why conversion to hex is easy and conversion to decimal is not). Then you will have most of the task done, and you can play with the hardest part (32b to decimal in 16b asm).
BTW, just a day or so ago somebody had a problem with addition/subtraction of 64b numbers in 16b assembly, so this answer may give you further hints as to why doing those conversions by sub/add loops is not such a bad idea. It's quite "simple" code once you get the idea of how it works:
I have a problem in assembly language: I want to write a loop that sums the elements of an array. Suppose the array contains 10,20,30,40,50,60,70,80,90,100. I have to sum all the elements of the array in a loop. How can I do this?
I'm trying this:
W DW 10,20,30,40,50,60,70,80,90,100
MOV AX, @data
MOV CX, 10
;this for display
but something wrong in display that print from ascii (&).
EDIT: Updated answer since the code in the question has been changed:
INT 21h / AH=2 prints a single character (note that the integer 1 and the character '1' are different values).
The sum of the elements in your array is 550, which requires 3 characters to print. The way to solve that is to write a routine that converts the value 550 to the string "550", and then use INT 21h / AH=9 to print that string. How you'd go about doing that has been asked several times before on StackOverflow; see e.g. this question and the answers to it.
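As a rough illustration of that conversion, here is a small C sketch that sums the ten words and turns the result into a digit string by repeated division by 10. The function names sum_words and u16_to_decimal are invented for this example; in the assembly version the same loop would use DIV with a divisor of 10 and store remainder + '0' for each digit.

```c
#include <stdint.h>

/* The array from the question. */
static const uint16_t W[10] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};

/* Adds up `count` 16-bit items, as the loop over CX would. */
static uint16_t sum_words(const uint16_t *items, int count)
{
    uint16_t sum = 0;
    for (int i = 0; i < count; ++i)
        sum = (uint16_t)(sum + items[i]);
    return sum;
}

/* Converts a 16-bit value to decimal text. Digits come out lowest first,
   so they are written from the end of the buffer backwards. */
static const char *u16_to_decimal(uint16_t value, char buffer[6])
{
    char *p = buffer + 5;
    *p = '\0';
    do {
        *--p = (char)('0' + value % 10);  /* remainder is the next digit */
        value = (uint16_t)(value / 10);
    } while (value != 0);
    return p;
}
```

With a char buf[6], u16_to_decimal(sum_words(W, 10), buf) gives the three-character string "550", which is what would then be handed to the string-printing service.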
This is my answer for the original question
For future questions, note that "but something wrong" is a terrible problem description. You should explain precisely in what way the code isn't behaving the way you intended.
That said, there are a number of problems with your code:
Here you're initializing CX to the first value in x. Actually, since the elements in x are bytes (because you used DB) and CX is a word (two bytes) you'll get CX = 301h (which is 769 in decimal):
Here you're simply moving the first element of x into BX over and over, instead of doing an addition. And again, x contains bytes while BX is a word register.
top: MOV BX, [x]
The loop instruction decrements CX by 1 and jumps to the given label if CX != 0. By incrementing CX before the loop you're creating an infinite loop. Also, the CMP is useless (and I'm not sure why you're comparing against 7 since x only has 5 elements):
loop top
This will only work for values in the range 0-9. If the sum is >=10 it will require multiple characters. See e.g. this answer for an example of how to convert a multi-digit number to a string that can be printed. Also, you're writing a word-sized register to a byte variable:
ADD BX, '0'
MOV [sum], BX
Here I'm a bit lost at what you're trying to do. If you wanted to write a single character to STDOUT you should use INT 21h / AH = 2 / DL = character. Note that MOV AX,4 sets AH=0 and AL=4. Also, you should end your program with INT 21h / AX = 4C00h:
MOV CX, sum
INT 21h
INT 21h
I suspect that there is an error in the code following the top label.
You do MOV BX, [x] but I think there you should sum the item pointed by CX with what currently is in BX (that seems to store the sum). So substitute the move instruction with:
I have a school assignment: I have to read any file up to 128KB in size and write its content to the screen.
I use function 3Dh to open the file and then function 3Fh to read it. I use a 32KB buffer for this.
I'm facing a few problems now.
I have a 59KB .txt file with some text from a book and also some of my code.
When I get the size of the file in bytes, it runs fine and the result is correct.
When I print the content of the file, it prints everything up to the point where a '$' character occurs in the file. So I need to somehow escape special characters such as '$' in order to print any file in full.
I have a 380KB .csv file.
When I print it, the whole file is printed fine, all 380KB.
But when I get the size, it returns just 2186 B. When I don't close the file at the end of the procedure and call the procedure again and again, it always returns a multiple of 2186 B (4372, 6558, etc.).
I copied 126KB from the previous .csv into another file.
Again, printing is OK (there are no '$' chars).
When I get the size, it returns 64063 B, so again a wrong result.
Here are my procedures.
buffsiz equ 32768 ;buffer size =32KB
fnsize equ 255 ;filename size =255
data segment
maxlen db fnsize ;max length of file name
len db ? ;length of filename
file db fnsize dup (?) ;file name
filesiz dd ? ;dword variable of file size
buffer db buffsiz dup ('$') ;32KB buffer
data ends
getcont proc ;get content of file procedure
mov ah,3dh ;open file function
mov al,0 ;read-access bit
call forout ;just bring 0 char on the end of filename
mov dx,offset file ;"move filename" to dx
int 21h
mov bx,ax ;move filehandler from ax to bx
buffIn: prntstr buffer ;print content of buffer (in first iteration it is whole set to '$'
mov ah,3fh ;read from file
mov cx,buffsiz ;how much bytes it should read from file (32768)
mov dx,offset buffer
int 21h
output: xchg ax,bx ;exchange values in ax and bx
mov buffer[bx],'$' ;after last read byte put '$' into buffer
xchg ax,bx ;exchange registers back for next iteration
cmp ax,0 ;if there was no read byte stop loop
jnz buffIn ;if was go to next iteration
mov ah,3Eh ;close file
int 21h
getcont endp
getsize proc
mov word ptr[filesiz],0 ;put zero into filesize variable (dword)
mov word ptr[filesiz]+2,0
mov ah,3dh ;same as in getcont procedure
mov al,0
call forout
mov dx,offset file
int 21h
mov bx,ax
bufflp: mov ah,3fh
mov cx,buffsiz
mov dx,offset buffer
int 21h
add word ptr[filesiz],ax ;add number of bytes read into filesiz variable - not certain in this
cmp ax,0 ;if there was no byte read end loop
jnz bufflp ;if was go to next iteration
prntstr nl ;new line
prntstr velkost ;print string about file size operation
xor dx,dx ;clear ax and dx registers
xor ax,ax
mov ax,word ptr[filesiz] ;move low word from filesiz(dword) variable to ax
mov dx,word ptr[filesiz]+2 ;move high word from filesiz to dx to get filesiz=dx:ax
call prntint ;call procedure to print decimal number on output
prntchr ' ' ;print space
prntchr 'B' ; print Byte unit char
mov ah,3Eh ;close file
int 21h
getsize endp
Working with TASM assembly x86.
I found these problems in the code you presented:
mov buffer[bx],'$' ;after last read byte put '$' into buffer
You should enlarge the buffer by 1 byte. Now you are writing this $ past the buffer when 32768 bytes were read!
add word ptr[filesiz],ax ;add number of bytes read into filesiz variable
The previous line updates only the low word of the dword variable filesiz, so every carry out of the low word is lost. Use the following instead:
add word ptr[filesiz],ax
adc word ptr[filesiz]+2,0
P.S. You never check whether DOS reports an error. Don't neglect this when accessing files!
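To see why the missing adc matters, here is a small C model of the counting loop with the size kept as two 16-bit words. Everything in it is a sketch under assumptions: FileSize, add_low_only, add_with_carry and count_bytes are names invented here, and the length 129599 is a hypothetical value chosen only because it is about 126KB and leaves 64063 after wrapping at 65536, which matches the number reported in the question.

```c
#include <stdint.h>

/* The dword variable filesiz, split into its two words. */
typedef struct { uint16_t lo, hi; } FileSize;

/* Models "add word ptr[filesiz],ax" alone: the carry is thrown away. */
static void add_low_only(FileSize *size, uint16_t bytes_read)
{
    size->lo = (uint16_t)(size->lo + bytes_read);
}

/* Models "add word ptr[filesiz],ax" followed by "adc word ptr[filesiz]+2,0". */
static void add_with_carry(FileSize *size, uint16_t bytes_read)
{
    uint16_t old_lo = size->lo;
    size->lo = (uint16_t)(size->lo + bytes_read);
    size->hi = (uint16_t)(size->hi + (size->lo < old_lo)); /* carry into high word */
}

/* Simulates reading a file of file_length bytes in 32768-byte chunks and
   returns the size the loop ends up with (as dx:ax would be printed). */
static uint32_t count_bytes(uint32_t file_length, int use_adc)
{
    FileSize size = {0, 0};
    uint32_t remaining = file_length;
    while (remaining > 0) {
        uint16_t chunk = remaining > 32768u ? (uint16_t)32768u : (uint16_t)remaining;
        if (use_adc)
            add_with_carry(&size, chunk);
        else
            add_low_only(&size, chunk);
        remaining -= chunk;
    }
    return ((uint32_t)size.hi << 16) | size.lo;
}
```

In this model count_bytes(129599, 0) returns 64063, while count_bytes(129599, 1) returns the full 129599. A 59KB file never crosses 65536, which would explain why only the larger files showed a wrong size.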
I searched Stack Overflow and did not find anything similar to my problem. My problem is this: I have code that opens a file and writes a message at the end. When I use int 21h to write to the file the first time, it writes correctly if the file is empty, but if the file already has content, the program adds many garbage bytes to the end (characters like 畂 or other Japanese or Chinese characters).
I have checked that the program doesn't write more bytes than the message length. Please help me. Here is my source code:
.model tiny
main:
call delta
delta:
pop bp
sub bp, offset delta
mov ax, @code ;Get the address of code segment and store it in ax
mov ds, ax ;Put that value in Data segment pointer.
;Now, we can reference any data stored in the code segment
;without fail.
mov ax, 3D02H ;Opens a file
lea dx, [bp+filename];Filename
int 21h ;Call DOS interrupt
mov handle, ax ;Save the handle in variable
mov bx, handle
mov ax,4202h ; Move file pointer
xor cx,cx ; to end of file
cwd ; xor dx,dx
int 21h
mov ax, 4000H
mov bx, handle
lea dx, [bp+sign]
mov cx, 16
int 21H
mov ah,4Ch ;Terminate process
mov al,0 ;Return code
int 21h
handle dw ?
filename db 'C:\A.txt', 0
sign db 'Bush was here!!', 0
end main
Please help me!!
That's because the file you're appending to is encoded in Unicode (the option Notepad calls "Unicode" is UTF-16). If you write the file out from Notepad or another text editor, you have to pick ANSI as the encoding when saving it. Then, if you point your program at the ANSI-encoded text file, it should append the string with the expected result.
That Unicode encoding stores these characters as two bytes each, so in a hex editor you might see s.o.m.e.t.h.i.n.g. .l.i.k.e. .t.h.i.s. rather than something like this, which is what you would expect for ANSI or UTF-8.
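To make the effect concrete, here is a minimal C sketch of what a text editor does when it reads the appended bytes as 16-bit little-endian units. The function name bytes_as_utf16le is made up for this illustration, and it assumes the file is UTF-16 little-endian, which is an assumption about how the file in the question was saved.

```c
#include <stddef.h>
#include <stdint.h>

/* Pairs up bytes as UTF-16LE code units (low byte first) and stores them
   in `units`. Returns the number of units produced; an odd trailing byte
   is ignored. */
static size_t bytes_as_utf16le(const unsigned char *bytes, size_t length,
                               uint16_t *units)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < length; i += 2)
        units[count++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
    return count;
}
```

Feeding it the 16 bytes the program writes ('Bush was here!!' plus the terminating 0) gives 8 units, and the first one is 0x7542 ('B' = 42h, 'u' = 75h), which is the character 畂 quoted in the question. The bytes in the file are correct; they are just being read two at a time.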