x86 assembly compare with null terminated array - arrays

I'm working on a function in assembly where I need to count the characters in a null terminated array. I'm using visual studio. The array was made in C++ and the memory address is passed to my assembly function. Problem is my loop isn't ending once I reach null (00). I have tried using test and cmp but it seems as though 4 bytes are being compared instead of 1 byte (size of the char).
My code:
_arraySize PROC ;name of function
start: ;ebx holds address of the array
push ebp ;Save caller's frame pointer
mov ebp, esp ;establish this frame pointer
xor eax, eax ;eax = 0, array counter
xor ecx, ecx ;ecx = 0, offset counter
arrCount: ;Start of array counter loop
;test [ebx+eax], [ebx+eax] ;array address + counter(char = 1 byte)
mov ecx, [ebx + eax] ;move element into ecx to be compared
test ecx, ecx ; will be zero when ecx = 0 (null)
jz countDone
inc eax ;Array Counter and offset counter
jmp arrCount
countDone:
pop ebp
ret
_arraySize ENDP
How can I compare just 1 byte? I just thought of shifting the bytes I don't need but that seems like a waste of an instruction.

If you want to compare a single byte, use a single byte instruction:
mov cl, [ebx + eax] ;move element to be compared
test cl, cl ; will be zero when NUL
(Note that a zero character is ASCII NUL, not an ANSI NULL value.)

Related

ASSEMBLY - output an array with 32 bit register vs 16 bit

I'm was working on some homework to print out an array as it's sorting some integers from an array. I have the code working fine, but decided to try using EAX instead of AL in my code and ran into errors. I can't figure out why that is. Is it possible to use EAX here at all?
; This program sorts an array of signed integers, using
; the Bubble sort algorithm. It invokes a procedure to
; print the elements of the array before, the bubble sort,
; once during each iteration of the loop, and once at the end.
INCLUDE Irvine32.inc
.data
myArray BYTE 5, 1, 4, 2, 8
;myArray DWORD 5, 1, 4, 2, 8
currentArray BYTE 'This is the value of array: ' ,0
startArray BYTE 'Starting array. ' ,0
finalArray BYTE 'Final array. ' ,0
space BYTE ' ',0 ; BYTE
.code
main PROC
MOV EAX,0 ; clearing registers, moving 0 into each, and initialize
MOV EBX,0 ; clearing registers, moving 0 into each, and initialize
MOV ECX,0 ; clearing registers, moving 0 into each, and initialize
MOV EDX,0 ; clearing registers, moving 0 into each, and initialize
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET startArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
MOV ECX, lengthOf myArray ; load ECX with # of elements of array
DEC ECX ; decrement count by 1
L1:
PUSH ECX ; save outer loop count
MOV ESI, OFFSET myArray ; point to first value
L2:
MOV AL,[ESI] ; get array value
CMP [ESI+1], AL ; compare a pair of values
JGE L3 ; if [esi] <= [edi], don't exch
XCHG AL, [ESI+1] ; exchange the pair
MOV [ESI], AL
CALL printArray ; call printArray function
CALL crlf
L3:
INC ESI ; increment esi to the next value
LOOP L2 ; inner loop
POP ECX ; retrieve outer loop count
LOOP L1 ; else repeat outer loop
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET finalArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
CALL printArray
L4 : ret
exit
main ENDP
printArray PROC uses ESI ECX
;myArray loop
MOV ESI, OFFSET myArray ; address of myArray
MOV ECX, LENGTHOF myArray ; loop counter (5 values within array)
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET currentArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
L5 :
MOV AL, [ESI] ; add an integer into eax from array
CALL writeInt
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET space
CALL writeString
POP EDX ; restores the original edx register value
ADD ESI, TYPE myArray ; point to next integer
LOOP L5 ; repeat until ECX = 0
CALL crlf
RET
printArray ENDP
END main
END printArray
; output:
;Starting array. This is the value of array: +1 +5 +4 +2 +8
;This is the value of array: +1 +4 +5 +2 +8
;This is the value of array: +1 +4 +2 +5 +8
;This is the value of array: +1 +2 +4 +5 +8
;Final array. This is the value of array: +1 +2 +4 +5 +8
As you can see the output sorts the array just fine from least to greatest. I was trying to see if I could move AL into EAX, but that gave me a bunch of errors. Is there a work around for this so I can use a 32 bit register and get the same output?
Using EAX is definitely possible, in fact you already are. You asked "I was trying to see if I could move AL into EAX, but that gave me a bunch of errors." Think about what that means. EAX is the extended AX register, and AL is the lower partition of AX. Take a look at this diagram:image of EAX register
. As you can see, moving AL into EAX using perhaps the MOVZX instruction would simply put the value in AL into EAX and fill zeroes in from right to left. You'd be moving AL into AL, and setting the rest of EAX to 0. You could actually move everything into EAX and run the program just the same and there'd be no difference because it's using the same part of memory.
Also, why are you pushing and popping EAX so much? The only reason to push/pop things from the runtime stack is to recover them later, but you never do that, so you can just let whatever is in EAX at the time just die.
If you still want to do an 8-bit store, you need to use an 8-bit register. (AL is an 8-bit register. IDK why you mention 16 in the title).
x86 has widening loads (movzx and movsx), but integer stores from a register operand always take a register the same width as the memory operand. i.e. the way to store the low byte of EAX is with mov [esi], al.
In printArray, you should use movzx eax, byte ptr [esi] to zero-extend into EAX. (Or movsx to sign-extend, if you want to treat your numbers as int8_t instead of uint8_t.) This avoids needing the upper 24 bits of EAX to be zeroed.
BTW, your code has a lot of unnecessary instructions. e.g.
MOV EAX,0 ; clearing registers, moving 0 into each, and initialize
totally pointless. You don't need to "init" or "declare" a register before using it for the first time, if your first usage is write-only. What you do with EDX is amusing:
MOV EDX,0 ; clearing registers, moving 0 into each, and initialize
PUSH EDX ; preserves the original edx register value for future writeString call
MOV EDX, OFFSET startArray ; load EDX with address of variable
CALL writeString ; print string
POP EDX ; return edx to previous stack
"Caller-saved" registers only have to be saved if you actually want the old value. I prefer the terms "call-preserved" and "call-clobbered". If writeString destroys its input register, then EDX holds an unknown value after the function returns, but that's fine. You didn't need the value anyway. (Actually I think Irvine32 functions at most destroy EAX.)
In this case, the previous instruction only zeroed the register (inefficiently). That whole block could be:
MOV EDX, OFFSET startArray ; load EDX with address of variable
CALL writeString ; print string
xor edx,edx ; edx = 0
Actually you should omit the xor-zeroing too, because you don't need it to be zeroed. You're not using it as counter in a loop or anything, all the other uses are write-only.
Also note that XCHG with memory has an implicit lock prefix, so it does the read-modify-write atomically (making it much slower than separate mov instructions to load and store).
You could load a pair of bytes using movzx eax, word ptr [esi] and use a branch to decide whether to rol ax, 8 to swap them or not. But store-forwarding stalls from byte stores forwarding to word loads isn't great either.
Anyway, this is getting way off topic from the title question, and this isn't codereview.SE.

Assembly Language x86 Irvine32

I'm fairly new to Assembly Language and I'm trying to figure out this program. Just want to know if I'm on point with the program. How do I correct this program?
Write a loop that computes the sum of all the elements in the array of bytes. Print the result. Some hints: Load the size of the array into an appropriate register. Load the offset of the current element of the array and change it accordingly on every iteration of the loop.
Here's what I have so far:
INCLUDE Irvine32.inc
.data
val1 BYTE 1,2,3
counter = 0
.code
main PROC
mov ax, 0
mov ax, (LENGTHOF val1)
mov ax, OFFSET Counter
movzx ecx,ax
L1:
add eax, val1[ecx]
inc eax
loop L1
Call WriteDec
exit
END PROC
end main
You have several errors in your code: In the following sequence you repeatedly set ax which is pretty useless:
mov ax, 0 ; you set ax to 0
mov ax, (LENGTHOF val1) ; you set ax to 3
mov ax, OFFSET Counter ; you set ax to an address (and try to use a 16-bit register for a 32-bit address
Then you add this offset to another offset in
movzx ecx,ax
L1:
add eax, val1[ecx] ; add offset(val1)[offset(Counter)] to offset(Counter)
With certainty, this will give you a memory error, because the address may be anywhere. Then you increase this offset with
inc eax ; you probably confused this with a counter/index register
And after that you use this offset in ECX, which you put in there by movzx ecx, ax as an index in ECX in the LOOP instruction
loop L1 ; decrements ECX and loops if ECX != 0
After fixing all of these errors, the code could look like this:
INCLUDE Irvine32.inc
.data
val1 BYTE 1,2,3
counter = 0
.code
main:
xor eax, eax ; 32-bit register for the sum
xor edx, edx ; 32-bit register for the counter/index
mov ecx, LENGTHOF val1 ; number of entries in fixed size array in ECX as maximum limit
L1:
movsx ebx, byte ptr val1[edx] ; extend BYTE at offset val1 + EDX to 32-bit
add eax, ebx ; add extended BYTE value to accumulator
inc edx ; increase index in EDX
loop L1 ; decreases ECX and jumps if not zero
Call WriteDec ; I assume this prints EAX
exit
end main

Calling C function in Assembly Segfaults

I am trying to write an assembly program that calls a function in c that will replace certain characters in a string with a predefined character given that the currently character in the char array meets some qualification.
My c file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
//display *((char *) $edi)
// These functions will be implemented in assembly:
//
int strrepl(char *str, int c, int (* isinsubset) (int c) ) ;
int isvowel (int c) {
if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
return 1 ;
if (c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U')
return 1 ;
return 0 ;
}
int main(){
char *str1;
int r;
// I ran my code through a debugger again, and it seems that when displaying
// the character stored in ecx is listed as "A" (correct) right before the call
// to "add ecx, 1" at which point ecx somehow resets to 0 when it should be "B"
str1 = strdup("ABC 123 779 Hello World") ;
r = strrepl(str1, '#', &isdigit) ;
printf("str1 = \"%s\"\n", str1) ;
printf("%d chararcters were replaced\n", r) ;
free(str1) ;
return 0;
}
And my .asm file:
; File: strrepl.asm
; Implements a C function with the prototype:
;
; int strrepl(char *str, int c, int (* isinsubset) (int c) ) ;
;
;
; Result: chars in string are replaced with the replacement character and string is returned.
SECTION .text
global strrepl
_strrepl: nop
strrepl:
push ebp ; set up stack frame
mov ebp, esp
push esi ; save registers
push ebx
xor eax, eax
mov ecx, [ebp + 8] ;load string (char array) into ecx
jecxz end ;jump if [ecx] is zero
mov esi, [ebp + 12] ;move the replacement character into esi
mov edx, [ebp + 16] ;move function pointer into edx
xor bl, bl ;bl will be our counter
firstLoop:
add bl, 1 ;inc bl would work too
add ecx, 1
mov eax, [ecx]
cmp eax, 0
jz end
push eax ; parameter for (*isinsubset)
;BREAK
call edx ; execute (*isinsubset)
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
end:
pop ebx ; restore registers
pop esi
mov esp, ebp ; take down stack frame
pop ebp
ret
When running this through gdb and putting a breakpoint at ;BREAK, it segfaults after I take a step to the call command with the following error:
Program received signal SIGSEGV, Segmentation fault.
0x0081320f in isdigit () from /lib/libc.so.6
isdigit is part of the standard c library that i have included in my c file, so I am not sure what to make of this.
Edit: I have edited my firstLoop and included a secondLoop which should replace any digits with "#", however it seems to replace the entire array.
firstLoop:
xor eax, eax
mov edi, [ecx]
cmp edi, 0
jz end
mov edi, ecx ; save array
movzx eax, byte [ecx] ;load single byte into eax
mov ebp, edx ; save function pointer
push eax ; parameter for (*isinsubset)
call edx ; execute (*isinsubset)
;cmp eax, 0
;jne end
mov ecx, edi ; restore array
cmp eax, 0
jne secondLoop
mov edx, ebp ; restore function pointer
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
add ecx, 1
jmp firstLoop
secondLoop:
mov [ecx], esi
mov edx, ebp
add esp, 4
mov ebx, eax
add ecx, 1
jmp firstLoop
Using gdb, when the code gets to secondloop, everything is correct. ecx is showing as "1" which is the first digit in the string that was passed in from the .c file. Esi is displaying as "#" as it should be. However, after I do mov [ecx], esi it seems to fall apart. ecx is displaying as "#" as it should at this point, but once I increment by 1 to get to the next character in the array, it is listed as "/000" with display. Every character after the 1 is replaced with "#" is listed as "/000" with display. Before I had the secondLoop trying to replace the characters with "#", I just had firstLoop looping with it self to see if it could make it through the entire array without crashing. It did, and after each increment ecx was displaying as the correct character. I am not sure why doing mov [ecx], esi would have set the rest of ecx to null.
In your firstLoop: you're loading characters from the string using:
mov eax, [ecx]
which is loading 4 bytes at a tie instead of a single byte. So the int that you're passing to isdigit() is likely to by far out of range for it to handle (it probably uses a simple table lookup).
You can load a single byte using the following Intel asm syntax:
movzx eax, byte ptr [ecx]
A few other things:
it will also have the effect that it probably wouldn't detect the end of the string properly since the null terminator might not be followed by three other zero bytes.
I'm not sure why you increment ecx before processing the first character in the string
the assembly code you posted doesn't appear to actually loop over the string
I've put some comments into your code:-
; this is OK: setting up the stack frame and saving important register
; on Win32, the registers that need saving are: esi, edi and ebx
; the rest can be used without needing to preserve them
push ebp
mov ebp, esp
push esi
push ebx
xor eax, eax
mov ecx, [ebp + 8]
; you said that this checked [ecx] for zero, but I think you've just written
; that wrong, this checks the value of ecx for zero, the [reg] form usually indicates
; the value at the address defined by reg
; so this is effectively doing a null pointer check (which is good)
jecxz end
mov esi, [ebp + 12]
mov edx, [ebp + 16]
xor bl, bl
firstLoop:
add bl, 1
; you increment ecx before loading the first character, this means
; that the function ignores the first character of the string
; and will therefore produce an incorrect result if the string
; starts with a character that needs replacing
add ecx, 1
; characters are 8 bit, not 32 bit (mentioned in comments elsewhere)
mov eax, [ecx]
cmp eax, 0
jz end
push eax
; possibly segfaults due to character out of range
; also, as mentioned elsewhere, the function you call here must conform to the
; the standard calling convention of the system (e.g, preserve esi, edi and ebx for
; Win32 systems), so eax, ecx and edx can change, so next time you call
; [edx] it might be referencing random memory
; either save edx on the stack (push before pushing parameters, pop after add esp)
; or just load edx with [ebp+16] here instead of at the start
call edx
add esp, 4
mov ebx, eax
; more functionality required here!
end:
; restore important values, etc
pop ebx
pop esi
mov esp, ebp
pop ebp
; the result of the function should be in eax, but that's not set up properly yet
ret
Comments on your inner loop:-
firstLoop:
xor eax, eax
; you're loading a 32 bit value and checking for zero,
; strings are terminated with a null character, an 8 bit value,
; not a 32 bit value, so you're reading past the end of the string
; so this is unlikely to correctly test the end of string
mov edi, [ecx]
cmp edi, 0
jz end
mov edi, ecx ; save array
movzx eax, byte [ecx] ;load single byte into eax
; you need to keep ebp! its value must be saved (at the end,
; you do a mov esp,ebp)
mov ebp, edx ; save function pointer
push eax ; parameter for (*isinsubset)
call edx ; execute (*isinsubset)
mov ecx, edi ; restore array
cmp eax, 0
jne secondLoop
mov edx, ebp ; restore function pointer
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
add ecx, 1
jmp firstLoop
secondLoop:
; again, your accessing the string using a 32 bit value, not an 8 bit value
; so you're replacing the matched character and the three next characters
; with the new value
; the upper 24 bits are probably zero so the loop will terminate on the
; next character
; also, the function seems to be returning a count of characters replaced,
; but you're not recording the fact that characters have been replaced
mov [ecx], esi
mov edx, ebp
add esp, 4
mov ebx, eax
add ecx, 1
jmp firstLoop
You do seem to be having trouble with the way the memory works, you are getting confused between 8 bit and 32 bit memory access.

x86 Assembly Lowercase unhandled exception [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
x86 convert to lower case assembly
This program is to convert a 2d char array into lower case
Quickie Edit: I'm using Visual Studio 2010
int b_search (char list[100][20], int count, char* token)
{
__asm
{
mov eax, 0 ; zero out the result
mov esi, list ; move the list pointer to ESI
mov ebx, count ; move the count into EBX
mov edi, token ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
OR [edi], 20h
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
At my OR instruction, where I'm trying to change the register that contains the address to token into all lower case, I keep getting unhandled exception...access violation, and without the brackets nothing gets lowercased. Later in my code I have
LOWERCASE_ARRAY: ;for(edi = 0, edi<ebx; edi++), loops through each name
CMP ecx, ebx
JGE COMPARE
INC ecx ;ecx++
MOV edx, 0; ;edx = 0
LOWERCASE_STRING: ;while next char != 0, loop through each byte to convert to lower case
OR [esi+edx],20h ;change to lower case
INC edx
CMP [esi+edx],0 ;if [esi+edx] not zero, loop again
JNZ LOWERCASE_STRING
JMP LOWERCASE_ARRAY ;jump back to start case change of next name
and the OR instruction there seems to work perfectly so I don't know why the first won't work. Also, I am trying to convert several strings.
After I finish one string, any ideas how I would go about going to the next string (as in list[1][x], list[2][x], etc...) I tried adding 20 as in [esi+20*ecx+edi] but that doesn't work. Can I get advice on how to proceed?
One possibility:
If parameters of procedure b_search are stored as registers (register calling convention) then you override list pointer in your first asm line, because eax point to the list array:
mov eax, 0 ; zero out the result
Because:
mov esi, list ; move the list pointer to ESI
should be converted to:
mov esi, eax
Try to exchange first and second line to:
mov esi, list ; move the list pointer to ESI
mov eax, 0 ; zero out the result

Printing Out a String Stored in an Array of DWORDS

I'm writing a program in Assembly that will Bubble Sort an Array of Strings. A zero length string terminates the array. I approached this by declaring a DWORD array, where the string var., that is a byte size, shall be stored. My main problem is not the bubble sort itself, but that strings that were stored in the array wasn't outputting completely.
To hopefully make it clear, here is my code:
.586
.MODEL FLAT
INCLUDE io.h ; header file for input/output
space equ 0
cr equ 0dh
.STACK 4096
.DATA
myStrings byte "Delts",0
byte "Abs",0
byte "Biceps",0
byte 0
labelStrOut byte "Output is: ", 0
stringOut dword 11 dup (?)
stringNum dword 0
stringArray dword 20 dup (?)
.CODE
_MainProc PROC
mov edi, offset myStrings
mov esi, offset stringArray
popltLp:
cmp BYTE PTR [edi], 0
jz popltDone
mov ebx, [edi]
mov DWORD PTR [esi], ebx
add esi, 4
inc stringNum
xor ecx, ecx
not ecx
xor al, al
repne scasb
jmp popltLp
popltDone:
xor edx, edx
lea esi, stringArray
mov ebx, DWORD PTR [esi]
mov stringOut, ebx
output labelStrOut, stringOut
add esi, 4
mov ebx, DWORD PTR [esi]
mov stringOut, ebx
output labelStrOut, stringOut
add esi, 4
mov ebx, DWORD PTR [esi]
mov stringOut, ebx
output labelStrOut, stringOut
outptDone:
mov eax, 0 ; exit with return code 0
ret
_MainProc ENDP
END ; end of source code
As can be seen, no Bubble Sorting is being done yet...
The lines below 'popltDone' is just me messing around to see if the strings carried over to the array just fine. However, when printed out on the screen, only 4 characters were just showing up! The entire string line was just not being printed out, which is currently driving me crazy. Can someone please tell me what I am doing wrong?
Thanks to anybody taking the time reading this.
The problem is, you aren't using string pointers correctly. Specifically, here's the code I'm referring to:
mov ebx, [edi]
mov DWORD PTR [esi], ebx
If you were to translate this into English, it would be something like this:
Move the 4 byte value pointed to by edi into ebx.
Move the value in ebx into the memory address pointed to by esi.
This is perfectly legal and may actually be what you want in some cases, but I'm guess this isn't one of them. The reason you are only seeing the first 4 characters when you output your array of strings is because you copied the literal string into your array. A DWORD is 4 bytes so you get the first 4 characters. Here's what I would write:
mov DWORD PTR [esi], edi
Which translates into:
Move the pointer value edi into the memory address pointed to by esi.
Now you have not an array of strings, but an array of string pointers. If you were to write your code in C, decompile it, this is most likely what you would see. Rewrite your comparison and output functions to work with the pointer to a string instead of the literal characters in the string and you'll fix your problem.

Resources