Reading text file line by line NASM - file

I'm new in assembly programming and I have an assignment in which I have to read a text file line by line and use what is written in the file and pass it to another function. My problem is that I'm not sure of how to read the text this way because from what I have discovered for reading a text file first I have to create a buffer reserving certain quantity of bytes for storing what is in the file. Howerver in this case I want to read line by line (like a loop) until the end of file so I dont know how much bytes I have to reserve. Thanks.
Btw here is the code I'm trying to use:
SECTION .data
file_name db 'instruct.txt',0
SECTION .bss
fd_out resb 1
fd_in resb 1
info resb 20
SECTION .text
global main
main: ;tell linker entry point
push ebp
mov ebp, esp
push ebx
;open the file for reading
mov eax, 5
mov ebx, file_name
mov ecx, 2
mov edx, 0777 ;read, write and execute by all
int 0x80
mov [fd_in], eax
loop:
;read from file
mov eax, 3
mov ebx, [fd_in]
mov ecx, info
mov edx, 5
int 0x80
cmp eax, 0
;check EOF
je exit
; print the info
mov eax, 4
mov ebx, 1
mov ecx, info
mov edx, 5
int 0x80
;
jmp loop
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
exit:
; close the file
mov eax, 6
mov ebx, [fd_in]
pop ebx
mov esp, ebp
pop ebp
ret

use mmap syscall, read entire file to memory and search for 0x0A sequence. This is the ASCII code for end of line. Perhaps usefull too to check for 0x0D (in case you are dealing with windows text files. There the sequence 0x0A,0x0D indicates an new line and thus an end of line.
mmap will try to allocate memory for you without the overhead of administration for you. Otherwise determine the file length and reserve memory with syscall sbrk. Works also but you have to program a bit more. My suggestion is that mmap is the best way.

Related

Looping backwards in Assembly x86

When you are looping backwards in Assembly x86, what is currently happening in the memory (Can you try to be visual, thanks)? The following code is what I am currently wondering about:
INCLUDE Irvine32.inc
.data
arrayb byte 1,2,3,4,5,6 ;6-7 bytes
len dword lengthof arrayb
space byte " ",0
x dword 3
.code
main PROC
mov edx,offset space
mov eax,0 ; clear ecx of garbage
mov ecx, len
mov esi,offset arrayb ; start of the array's memory
add esi,len ;This causes the array value to start at 6
dec esi ; esi goes from esi+5,esi+4,...,esi
myloop2:
mov al,[esi]
call writedec
call writestring
dec esi
loop myloop2
call crlf
In particular, why did I have to add 1 to esi? When you add 1 to the high speed memory transfer register esi, it seems that it causes the array value to start at 6. Why is that?Thank you.

NASM Stack Error? Simple Multiplication Program

I'm learning NASM at the moment and am making a simple program that does multiplication of any user-input variables through shifting and addition.
I've been running into a series of issues: My multiplicand is, for some reason, being given at the maximum data value a word can hold. Furthermore, my answer, if the program should get that far, is almost always wrong (even though I believe my algorithm is correct!).
extern printf
extern scanf
section .data
message: db "Enter your multiplicand: "
message_L: equ $-message
message2: db "The number you entered is: %d ", 10, 0
message2_L: equ $-message2
message3: db "Enter your multiplier: "
message3_L: equ $-message3
message4: db "Your multiplier is: %d ", 10, 0
message4_L: equ $-message4
message5: db "The product of this multiplication is: %d ", 10, 0
mesasge5_L: equ $-message5
fmt1: db "%d", 0
section .bss
multiplicand: resw 1
multiplier: resw 1
product: resw 1
section .text
global main
scanInt:
push ebp
mov ebp, esp
sub esp, 2
lea eax, [ebp-2]
push eax
push dword fmt1
call scanf
mov ax, word[ebp-2]
mov esp, ebp
pop ebp
ret
main:
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov eax, 4
mov ebx, 1
mov ecx, message
mov edx, message_L
int 80h
call scanInt
mov word[multiplicand], ax
mov word[product], ax
jmp print1
main2:
mov eax, 4
mov ebx, 1
mov ecx, message3
mov edx, message3_L
int 80h
call scanInt
mov word[multiplier], ax
jmp print2
main3:
mov ax, word[multiplicand]
jmp check
check:
cmp word[multiplier], 2
jz printAnswer
ror [multiplier], 1
shl word[multiplier], 1
jc carry
shr word[multiplier], 1
shr word[multiplier], 1
shl word[product], 1
jmp check
carry:
add word[product], ax
shr word[multiplier], 1
clc
jmp check
endLoop:
mov eax, 1
mov ebx, 0
int 80h
printAnswer:
push ebp
mov ebp, esp
push word[product]
push dword message5
call printf
add esp, 12
mov esp, ebp
pop ebp
jmp endLoop
print1:
push ebp
mov ebp, esp
push dword[multiplicand]
push dword message2
call printf
add esp, 12
mov esp, ebp
pop ebp
jmp main2
print2:
push ebp
mov ebp, esp
push dword[multiplier]
push dword message4
call printf
add esp, 12
mov esp, ebp
pop ebp
jmp main3
I think your main problem comes from using word variables. Making a two-byte buffer on the stack, and calling scanf to read into it is almost certainly a problem. Pushing a word in 32-bit code is "legal", but likely to cause problems. In one instance, you call printf with two variables, and add esp, 12 afterwards. Make 'em all dwords and keep your stack manipulation in four-byte chunks. I think that'll cure most of your problems.
The man pages explicitly suggest not mixing high-level, buffered I/O, functions with low level functions (printf, scanf, fopen(), fread(), fwrite(), etc. are high level functions, open(), read(), write()... and system calls are low level functions). I don't think that this is causing any of your problems, but it can cause weird results. For example, printf doesn't print anything until the buffer is flushed. Ending with a linefeed, or using another high level I/O function will flush the buffer. sys_read, for example, does not. I'd stick to one or the other.
Good luck!

How to write in nasm without writing null bytes?

Pretty simple problem.
This nasm is supposed to write a user-written message (i.e. hello) to a file, again determined by user input from an argument. It does this just fine, but the problem is, it writes all the null bytes not used afterwards as well. For example, if I reserve 32 bytes for user input, and the user only uses four for his input, those for bytes will be printed, along with 28 null bytes.
How do I stop printing null bytes?
Code used:
global _start
section .text
_start:
mov rax, 0 ; get input to write to file
mov rdi, 0
mov rsi, msg
mov rdx, 32
syscall
mov rax, 2 ; open the file at the third part of the stack
pop rdi
pop rdi
pop rdi
mov rsi, 1
syscall
mov rdi, rax
mov rax, 1 ; write message to file
mov rsi, msg
mov rdx, 32
syscall
mov rax, 3 ; close file
syscall
mov rax, 1 ; print success message
mov rdi, 1
mov rsi, output
mov rdx, outputL
syscall
mov rax, 60 ; exit
mov rdi, 0
syscall
section .bss
msg: resb 32
section .data
output: db 'Success!', 0x0A
outputL: equ $-output
Well, after doing some digging in header files and experimenting, I figured it out on my own.
Basically, the way it works is that you have to put the user's string through a byte counting process that counts along the string until it finds a null byte, and then stores that number of non-null bytes.
I'll post the workaround I'm using for anyone who's had the same problem as me. Keep in mind that this solution is for 64-bit nasm, NOT 32!
For 32-bit coders, change:
all instances of "rax" with "eax"
all instances of "rdi" with "ebx"
all instances of "rsi" with "ecx"
all instances of "rdx" with "edx"
all instances of "syscall" with "int 80h" (or equivelant)
all instances of "r8" with "edx" (you'll have to juggle this and rdx)
Here's the solution I use, in full:
global _start
; stack: (argc) ./a.out input filename
section .text
_start:
getInput:
mov rax, 0 ; syscall for reading user input
mov rdi, 0
mov rsi, msg ; store user input in the "msg" variable
mov rdx, 32 ; max input size = 32 bytes
xor r8, r8 ; set r8 to zero for counting purposes (this is for later)
getInputLength:
cmp byte [msg + r8], 0 ; compare ((a byte of user input) + 0) to 0
jz open ; if the difference is zero, we've found the end of the string
; so we move on. The length of the string is stored in r9.
inc r8 ; if not, onto the next byte...
jmp getInputLength ; so we jump back up four lines and repeat!
open:
mov rax, 2 ; syscall for opening files
pop rdi
pop rdi
pop rdi ; get the file to open from the stack (third argument)
mov rsi, 1 ; open in write mode
syscall
; the open syscall above has made us a full file descriptor in rax
mov rdi, rax ; so we move it into rdi for later
write:
mov rax, 1 ; syscall for writing to files
; rdi already holds our file descriptor
mov rsi, msg ; set the message we're writing to the msg variable
mov rdx, r8 ; set write length to the string length we measured earlier
syscall
close:
mov rax, 3 ; syscall for closing files
; our file descriptor is still in fd
syscall
exit:
mov rax, 60 ; syscall number for program exit
mov rdi, 0 ; return 0
Keep in mind that this is not a complete program. It totally lacks error handling, offers no user instruction, etc. It is only an illustration of method.

assembly: stuck - reading in text file

Below is what I hope is the relevant code from my NASM program. Once int 080h is called, the debugger is showing -9 for eax. The text in my test.txt is 321314145. I've been staring at this for hours and I've hit a dead end here. Why is this happening?
%define BUFLEN 128
%define READLEN 3
%define SYSCALL_READ 3
SECTION .bss ; uninitialized data section
buf: resb READLEN ; buffer for read
rlen: resb 4
newstr: resb BUFLEN
; read file name from arg
;
pop ebx ;not using
pop ebx ;not using
pop ebx ;pop filename
; open file
;
mov eax, SYSCALL_OPEN
mov ecx, STDIN
int 080h
mov eax, SYSCALL_READ ; read function
mov ebx, eax ; Arg: file descriptor
mov ecx, buf ; Arg: address of buffer
mov edx, READLEN ; Arg: buffer length
int 080h
mov eax, SYSCALL_READ ; read function
mov ebx, eax ; Arg: file descriptor
You're kidding, right?
the -9 is "bad file number" (per errno.h). Make sure that your sys_open succeeds, then put eax in ebx.... BEFORE altering eax!

x86 NASM Assembly - Problems with Input

I am working to take input from a user twice, and compare the input. If they are the same, the program exits. If not, it reprints the input from the first time, and waits for the user to type something. If it is the same, the same thing as before occurs. If not, the same thing as before occurs.
Input and looping is not the problem. The main problem is the result I am getting from the program. My following is what I am doing codewise:
%include "system.inc"
section .data
greet: db 'Hello!', 0Ah, 'Please enter a word or character:', 0Ah
greetL: equ $-greet ;length of string
inform: db 'I will now repeat this until you type it back to me.', 0Ah
informL: equ $-inform
finish: db 'Good bye!', 0Ah
finishL: equ $-finish
newline: db 0Ah
newlineL: equ $-newline
section .bss
input: resb 40 ;first input buffer
check: resb 40 ;second input buffer
section .text
global _start
_start:
greeting:
mov eax, 4
mov ebx, 1
mov ecx, greet
mov edx, greetL %include "system.inc"
section .data
greet: db 'Hello!', 0Ah, 'Please enter a word or character:', 0Ah
greetL: equ $-greet ;length of string
inform: db 'I will now repeat this until you type it back to me.', 0Ah
informL: equ $-inform
finish: db 'Good bye!', 0Ah
finishL: equ $-finish
newline: db 0Ah
newlineL: db $-newline
section .bss
input: resb 40 ;first input buffer
check: resb 40 ;second input buffer
section .text
global _start
_start:
greeting:
mov eax, 4
mov ebx, 1
mov ecx, greet
mov edx, greetL
sys.write
getword:
mov eax, 3
mov ebx, 0
mov ecx, input
mov edx, 40
sys.read
sub eax, 1 ;remove the newline
push eax ;store length for later
instruct:
mov eax, 4
mov ebx, 1
mov ecx, inform
mov edx, informL
sys.write
pop edx ;pop length into edx
mov ecx, edx ;copy into ecx
push ecx ;store ecx again (needed multiple times)
mov eax, 4
mov ebx, 1
mov ecx, input
sys.write
mov eax, 4 ;print newline
mov ebx, 1
mov ecx, newline
mov edx, newlineL
sys.write
mov eax, 3 ;get the user's word
mov ebx, 0
mov ecx, check
mov edx, 40
sys.read
xor eax, eax
checker:
mov ebx, check
mov ecx, input
cmp ebx, ecx ;see if input was the same as before
jne loop ;if not the same go to input again
je done ;else go to the end
pop edx
mov ecx, edx
push ecx
mov eax, 4
mov ebx, 1
mov ecx, check
sys.write ;repeat the word
mov eax, 4
mov ebx, 1
mov ecx, newline
mov edx, newlineL
sys.write
loop:
mov eax, 3 ;replace new input with old
mov ebx, 0
mov ecx, check
mov edx, 40
sys.read
jmp checker
done:
mov eax, 1
mov ebx, 0
sys.exit
sys.write
getword:
mov eax, 3
mov ebx, 0
mov ecx, input
mov edx, 40
sys.read
My result is now: EDITED
Hello!
Please enter a word or character:
Nick
I will now repeat this until you type it back to me.
Nick
(I input) Magerko
(I get) M
(I input)Nick
(I get)
(I input)Nick
(I get)
EDITED
And this continues. My checks do not work as intended in the code above, and I eventually don't even get the program to print anything but a newline. Is there a reason for this?
Thanks.
Apart from what #Joshua is pointing out, you're not comparing your strings correctly.
checker:
mov ebx, check ; Moves the *address* of check into ebx
mov ecx, input ; Similarly for input
cmp ebx, ecx ; Checks if the addresses are the same (they never are)
Firstly, when you have e.g. label dd 1234 in your data segment mov eax, label will move the address of label to eax while mov eax, [label] will move the contents stored at label (in this case 1234) into eax.
Note that in the above example I deliberately used a 32-bit variable so that it would fit neatly into eax. If you're using byte sized variables (like ascii characters) e.g. mybyte db 0xfe you'll either have to use byte sized register (al, ah, dh etc.) or use the move with zero/sign extend opcodes: movzx eax, byte [mybyte] will set eax to 254, while movsx eax, byte [mybyte] will set eax to -2 (0xfffffffe).
You also need to do a character by character comparison of the strings. Assuming you save the read string length (you really should be checking for negative return values - meaning errors) in input_len and check_len it could look something like:
mov eax, [input_len]
cmp eax, [check_len]
jne loop ; Strings of different length, do loop again
mov ebx, check
mov ecx, input
.checkloop:
mov dl, [ebx] ; Read a character from check
cmp dl, [ecx] ; Equal to the character from input?
jne loop ; Nope, jump to `loop`
inc ebx ; Move ebx to point at next character in check
inc ecx ; and ecx to next character in input
dec eax ; one less character to check
jnz .checkloop ; done?
; the strings are equal if we reach this point in the code
jmp done
If you're interested in another way of doing this in fewer instructions look up rep cmpsb.
There are a few other problems in the code immediately following your checker code. The pop edx instruction (and the code following, down to the loop label) will not be execute as you're always jumping either to loop or done.
jne loop ;if not the same go to input again
je done ;else go to the end
pop edx ; Will never be reached!
The reason you're getting funny characters is from newlineL: db $-newline This should be equ instead of db or you should replace mov edx, newlineL with movzx edx, byte [newlineL]. Since newlineL unlike the other *L names refers to a variable and not a constant equ mov edx, newlineL will use the address of the newlineL variable as the number of bytes to write, when you wanted it to be 1.
You are assuming sys.read returns the entire line. It is not required to do so. It may return after only one character, or even possibly after part of the second line.
You know, this kind of thing kind of ticks me off. This looks like a homework problem in writing in assembly, but the problem is not with the assembly, but with the assumptions in how the system calls work.
I really wish the instructors would provide an fgets library function for stuff like this.
Anyway, the stupid way to fix it is to read one byte at a time, looking for LF (byte 10) to end the loop.

Resources