x86 Assembly Lowercase unhandled exception [duplicate]

Closed 10 years ago.
Possible Duplicate:
x86 convert to lower case assembly
This program converts a 2D char array to lower case.
Quick edit: I'm using Visual Studio 2010.
int b_search (char list[100][20], int count, char* token)
{
__asm
{
mov eax, 0 ; zero out the result
mov esi, list ; move the list pointer to ESI
mov ebx, count ; move the count into EBX
mov edi, token ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
OR [edi], 20h
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
At my OR instruction, where I'm trying to lowercase the token whose address is in the register, I keep getting "unhandled exception ... access violation", and without the brackets nothing gets lowercased. Later in my code I have:
LOWERCASE_ARRAY: ;for(edi = 0, edi<ebx; edi++), loops through each name
CMP ecx, ebx
JGE COMPARE
INC ecx ;ecx++
MOV edx, 0; ;edx = 0
LOWERCASE_STRING: ;while next char != 0, loop through each byte to convert to lower case
OR [esi+edx],20h ;change to lower case
INC edx
CMP [esi+edx],0 ;if [esi+edx] not zero, loop again
JNZ LOWERCASE_STRING
JMP LOWERCASE_ARRAY ;jump back to start case change of next name
and the OR instruction there seems to work perfectly, so I don't know why the first one doesn't. Also, I am trying to convert several strings.
After I finish one string, how would I go on to the next one (list[1][x], list[2][x], etc.)? I tried adding 20, as in [esi+20*ecx+edi], but that doesn't work. Can I get advice on how to proceed?
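On moving to the next string: char list[100][20] is stored row after row, so list[i] begins exactly 20 bytes after list[i-1]. Also, x86 addressing modes only allow a scale factor of 1, 2, 4 or 8, so [esi+20*ecx+edi] cannot be encoded; a common approach is to keep a pointer to the current row and add 20 to it after each string. The C sketch below shows that row walk. The helper name lowercase_rows and the 'A'..'Z' range check are illustrative assumptions, not code from the question.

```c
#include <stddef.h>

#define COLS 20 /* row width of char list[100][20] in the question */

/* Hypothetical helper: lowercases the first `count` strings of a
   char[][COLS] array. Each row starts COLS bytes after the previous one,
   which is what `add esi, 20` would do in the assembly version. */
static void lowercase_rows(char list[][COLS], int count)
{
    for (int i = 0; i < count; i++) {
        char *row = list[i]; /* same address as (char *)list + i * COLS */
        for (size_t j = 0; j < COLS && row[j] != '\0'; j++) {
            if (row[j] >= 'A' && row[j] <= 'Z')
                row[j] = (char)(row[j] | 0x20); /* set bit 5: 'A' -> 'a' */
        }
    }
}
```

Usage: lowercase_rows(list, count) with the same list and count that b_search receives.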

One possibility:
If the parameters of b_search are passed in registers (a register calling convention), then your first asm line overwrites the list pointer, because eax points to the list array:
mov eax, 0 ; zero out the result
This is because, under that convention,
mov esi, list ; move the list pointer to ESI
is effectively compiled as:
mov esi, eax
Try swapping the first and second lines:
mov esi, list ; move the list pointer to ESI
mov eax, 0 ; zero out the result

Related

Array access in MASM

I'm having some difficulty accessing my array in MASM. I've got a very large array and a temp variable, like so:
.data
array DWORD 65000 DUP (0)
temp DWORD 0
In my main, I've got this to fill it:
mov esi, offset array
mov edi, 0
mov ecx, 0
fill:
mov [esi], ecx
add esi, 4
inc ecx
cmp ecx, 65000
jl fill
mov esi, offset array ;reset the array location to the start
Afterwards, I want to access the array with this loop:
mark:
mov temp, 4 ;get another 4 to temp to move along the array
add esi, temp ;add temp to esi to keep moving
mov edx, [esi] ;access the current value
cmp esi, 20 ;just trying to get first few elements
jmp mark
exit
main ENDP
END main
I always get an access violation, with the break at the line where I try to access the current value. This happens on the very first iteration as well. Any idea why this is happening? Thanks!

Your code will work to an extent; however:
mark:
mov temp, 4
add esi, temp ;Incrementing before first loop means you miss the first stored value
mov edx, [esi]
cmp esi, 20 ;ESI is the address you are accessing, not a count
jmp mark ;jmp is unconditional, so this loop will be infinite
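A C sketch of the corrected access pattern, assuming the goal is to read the first few elements: load the element before advancing, and stop on an element count (here turned into an end address) rather than comparing the address in ESI with 20. sum_first is a hypothetical name; summing is only there so the loop has an observable result.

```c
#include <stddef.h>
#include <stdint.h>

#define N 65000

/* Mirrors the question's fill loop: one DWORD per element, array[i] = i. */
static void fill(uint32_t *array)
{
    for (uint32_t i = 0; i < N; i++)
        array[i] = i;
}

/* Hypothetical reader: visits the first `count` elements, loading each one
   before advancing (so element 0 is not skipped) and stopping at an end
   address instead of comparing the pointer itself with 20. */
static uint32_t sum_first(const uint32_t *p, size_t count)
{
    const uint32_t *end = p + count; /* like precomputing esi + count*4 */
    uint32_t sum = 0;

    while (p < end) {
        sum += *p; /* mov edx, [esi] */
        p++;       /* add esi, 4 (pointer arithmetic scales by 4) */
    }
    return sum;
}
```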

Implementing a toupper function in x86 assembly

I'm playing around with x86 assembly in VS 2012, trying to convert some old code of mine to assembly. The problem I'm having is accessing and changing array values (the values are characters), and I'm not sure how to go about it. I've included comments so you can see my thought process:
void toUpper(char *string) {
__asm{
PUSH EAX
PUSH EBX
PUSH ECX
PUSH EDX
PUSH ESI
PUSH EDI
MOV EBX, string
MOV ECX, 0 // counter
FOR_EXPR: // for loop
CMP EBX, 0 //compare ebx to 0
JLE END_FOR // if ebx == 0, jump to end_for
CMP EBX, 97 // compare ebx to 97
JL ELSE // if ebx < 97, jump else
CMP EBX, 122 // compare ebx to 122
JG ELSE // if ebx > 122, jump else
// subtract 32 from current array value
// jump to next element
JMP END_IF
ELSE:
// jump to next element
END_IF:
JMP FOR_EXPR
END_FOR:
POP EDI
POP ESI
POP EDX
POP ECX
POP EBX
POP EAX
}
}
Any help is much appreciated!

Looks to me like the basic problem is that you're loading EBX with the address of the string, but then trying to use it as if it contained a byte of data from inside the string.
I'd do things a bit differently: load the address of the string into ESI and use it to read the contents of the string indirectly.
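mov esi, string
next_char:
lodsb ; al = [esi], then esi += 1 (direction flag is clear by convention)
test al, al ; check for end of string
jz done
cmp al, 'a' ; ignore unless in range 'a'..'z'
jb next_char ; unsigned "below"
cmp al, 'z'
ja next_char ; unsigned "above"
sub al, 'a'-'A' ; convert to upper case
mov [esi-1], al ; write back to string (lodsb already advanced esi)
jmp next_char
done: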
You could use EBX instead of ESI (replacing lodsb with mov al, [ebx] followed by inc ebx), but ESI is a lot more idiomatic. There are also some tricks you could use to optimize this a little, but until you understand the basics they'd mostly add confusion. With a modern processor they probably wouldn't make much difference anyway: a loop like this is likely to run about as fast as your memory bandwidth allows.
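For comparison, the same per-byte logic as the lodsb loop, written in C: load one char, stop at the terminator, range-check against 'a'..'z', subtract 'a' - 'A' and store it back. to_upper_ascii is a hypothetical name, and it only handles plain ASCII.

```c
/* Hypothetical helper: the lodsb loop's logic, one byte at a time. */
static void to_upper_ascii(char *s)
{
    for (; *s != '\0'; s++) {              /* test al, al / jz done */
        if (*s >= 'a' && *s <= 'z')        /* cmp/jb and cmp/ja range check */
            *s = (char)(*s - ('a' - 'A')); /* sub al, 'a'-'A' then store */
    }
}
```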

Calling C function in Assembly Segfaults

I am trying to write an assembly function that replaces characters in a string with a predefined character whenever the current character in the char array meets some condition, which is checked by calling a C function.
My C file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
//display *((char *) $edi)
// These functions will be implemented in assembly:
//
int strrepl(char *str, int c, int (* isinsubset) (int c) ) ;
int isvowel (int c) {
if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
return 1 ;
if (c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U')
return 1 ;
return 0 ;
}
int main(){
char *str1;
int r;
// I ran my code through a debugger again, and it seems that when displaying
// the character stored in ecx is listed as "A" (correct) right before the call
// to "add ecx, 1" at which point ecx somehow resets to 0 when it should be "B"
str1 = strdup("ABC 123 779 Hello World") ;
r = strrepl(str1, '#', &isdigit) ;
printf("str1 = \"%s\"\n", str1) ;
printf("%d characters were replaced\n", r) ;
free(str1) ;
return 0;
}
And my .asm file:
; File: strrepl.asm
; Implements a C function with the prototype:
;
; int strrepl(char *str, int c, int (* isinsubset) (int c) ) ;
;
;
; Result: chars in string are replaced with the replacement character and string is returned.
SECTION .text
global strrepl
_strrepl: nop
strrepl:
push ebp ; set up stack frame
mov ebp, esp
push esi ; save registers
push ebx
xor eax, eax
mov ecx, [ebp + 8] ;load string (char array) into ecx
jecxz end ;jump if [ecx] is zero
mov esi, [ebp + 12] ;move the replacement character into esi
mov edx, [ebp + 16] ;move function pointer into edx
xor bl, bl ;bl will be our counter
firstLoop:
add bl, 1 ;inc bl would work too
add ecx, 1
mov eax, [ecx]
cmp eax, 0
jz end
push eax ; parameter for (*isinsubset)
;BREAK
call edx ; execute (*isinsubset)
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
end:
pop ebx ; restore registers
pop esi
mov esp, ebp ; take down stack frame
pop ebp
ret
When I run this in gdb with a breakpoint at ;BREAK, it segfaults when I step over the call instruction, with the following error:
Program received signal SIGSEGV, Segmentation fault.
0x0081320f in isdigit () from /lib/libc.so.6
isdigit is part of the standard C library, which I have included in my C file, so I am not sure what to make of this.
Edit: I have changed firstLoop and added a secondLoop, which should replace any digits with "#"; however, it seems to replace the entire array.
firstLoop:
xor eax, eax
mov edi, [ecx]
cmp edi, 0
jz end
mov edi, ecx ; save array
movzx eax, byte [ecx] ;load single byte into eax
mov ebp, edx ; save function pointer
push eax ; parameter for (*isinsubset)
call edx ; execute (*isinsubset)
;cmp eax, 0
;jne end
mov ecx, edi ; restore array
cmp eax, 0
jne secondLoop
mov edx, ebp ; restore function pointer
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
add ecx, 1
jmp firstLoop
secondLoop:
mov [ecx], esi
mov edx, ebp
add esp, 4
mov ebx, eax
add ecx, 1
jmp firstLoop
Using gdb, when the code gets to secondLoop, everything is correct: ecx shows "1", which is the first digit in the string passed in from the .c file, and esi shows "#", as it should. However, after mov [ecx], esi things fall apart. ecx shows "#" at that point, as expected, but once I increment it by 1 to get to the next character in the array, display lists it as "\000". Every character after the replaced 1 is listed as "\000". Before I added secondLoop, I had firstLoop looping on itself to see whether it could get through the entire array without crashing. It did, and after each increment ecx displayed the correct character. I am not sure why mov [ecx], esi would have set the rest of the string to null.

In firstLoop, you're loading characters from the string using:
mov eax, [ecx]
which loads 4 bytes at a time instead of a single byte. So the int that you're passing to isdigit() is likely to be far out of range for it to handle (it probably uses a simple table lookup).
You can load a single byte using the following Intel asm syntax:
movzx eax, byte ptr [ecx]
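To see why the 4-byte load hurts: x86 is little-endian, so mov eax, [ecx] on "ABC 123" puts 'A' in the low byte and the next three characters above it. The C standard only defines isdigit() for arguments representable as unsigned char (or EOF), so passing that whole 32-bit value is undefined behaviour, which is consistent with the crash inside isdigit. The sketch below models both loads; the function names are hypothetical.

```c
#include <stdint.h>

/* Models mov eax, [ecx]: four bytes, least significant first. */
static uint32_t load_dword_le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Models movzx eax, byte ptr [ecx]: one byte, zero-extended. */
static uint32_t load_byte_zx(const unsigned char *p)
{
    return p[0];
}
```

For "ABC 123" the dword load yields 0x20434241 (' ', 'C', 'B', 'A' from high to low byte), far outside the range isdigit() accepts, while the byte load yields 0x41.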
A few other things:
Loading 4 bytes at a time also means it probably won't detect the end of the string properly, since the null terminator might not be followed by three more zero bytes.
I'm not sure why you increment ecx before processing the first character in the string.
The assembly code you posted doesn't appear to actually loop over the string.
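As a reference for what the assembly should end up doing, here is a C version of the question's strrepl prototype. The semantics (replace every character for which isinsubset returns nonzero, and return the count) are inferred from the question's main(), so treat this as a sketch: it starts at the first character, reads and writes one byte at a time, and counts replacements.

```c
#include <ctype.h>  /* isdigit, as passed by the question's main() */
#include <stddef.h>

/* Sketch of int strrepl(char *str, int c, int (*isinsubset)(int c)).
   Assumed semantics: every char for which isinsubset() returns nonzero
   is replaced by c, and the number of replacements is returned. */
int strrepl(char *str, int c, int (*isinsubset)(int c))
{
    int count = 0;

    if (str == NULL) /* the jecxz null-pointer check */
        return 0;

    for (; *str != '\0'; str++) { /* start at str[0], stop at the NUL */
        /* Pass one byte, zero-extended (movzx eax, byte [ecx]); the
           unsigned char cast keeps the value in isdigit()'s valid range. */
        if (isinsubset((unsigned char)*str)) {
            *str = (char)c; /* a single-byte store, not a 4-byte one */
            count++;
        }
    }
    return count;
}
```

With the question's input, strrepl(str1, '#', isdigit) turns "ABC 123 779 Hello World" into "ABC ### ### Hello World" and returns 6.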

I've put some comments into your code:
; this is OK: setting up the stack frame and saving important registers
; on Win32 (and in the i386 System V ABI used on Linux), the registers that need saving are: esi, edi and ebx
; the rest can be used without needing to preserve them
push ebp
mov ebp, esp
push esi
push ebx
xor eax, eax
mov ecx, [ebp + 8]
; you said that this checked [ecx] for zero, but I think you've just written
; that wrong, this checks the value of ecx for zero, the [reg] form usually indicates
; the value at the address defined by reg
; so this is effectively doing a null pointer check (which is good)
jecxz end
mov esi, [ebp + 12]
mov edx, [ebp + 16]
xor bl, bl
firstLoop:
add bl, 1
; you increment ecx before loading the first character, this means
; that the function ignores the first character of the string
; and will therefore produce an incorrect result if the string
; starts with a character that needs replacing
add ecx, 1
; characters are 8 bit, not 32 bit (mentioned in comments elsewhere)
mov eax, [ecx]
cmp eax, 0
jz end
push eax
; possibly segfaults due to character out of range
; also, as mentioned elsewhere, the function you call here must conform to
; the standard calling convention of the system (e.g. preserve esi, edi and ebx on
; Win32 systems), so eax, ecx and edx can change, and the next time you
; call edx it might be pointing at random memory
; either save edx on the stack (push before pushing parameters, pop after add esp)
; or just load edx with [ebp+16] here instead of at the start
call edx
add esp, 4
mov ebx, eax
; more functionality required here!
end:
; restore important values, etc
pop ebx
pop esi
mov esp, ebp
pop ebp
; the result of the function should be in eax, but that's not set up properly yet
ret
Comments on your inner loop:
firstLoop:
xor eax, eax
; you're loading a 32 bit value and checking for zero,
; strings are terminated with a null character, an 8 bit value,
; not a 32 bit value, so you're reading past the end of the string
; so this is unlikely to correctly test the end of string
mov edi, [ecx]
cmp edi, 0
jz end
mov edi, ecx ; save array
movzx eax, byte [ecx] ;load single byte into eax
; you need to keep ebp! its value must be saved (at the end,
; you do a mov esp,ebp)
mov ebp, edx ; save function pointer
push eax ; parameter for (*isinsubset)
call edx ; execute (*isinsubset)
mov ecx, edi ; restore array
cmp eax, 0
jne secondLoop
mov edx, ebp ; restore function pointer
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
add ecx, 1
jmp firstLoop
secondLoop:
; again, you're accessing the string using a 32 bit value, not an 8 bit value
; so you're replacing the matched character and the three next characters
; with the new value
; the upper 24 bits are probably zero so the loop will terminate on the
; next character
; also, the function seems to be returning a count of characters replaced,
; but you're not recording the fact that characters have been replaced
mov [ecx], esi
mov edx, ebp
add esp, 4
mov ebx, eax
add ecx, 1
jmp firstLoop
You do seem to be having trouble with how memory is accessed: you are mixing up 8-bit and 32-bit memory accesses.
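The same little-endian rule explains the NUL bytes seen in gdb: mov [ecx], esi stores all four bytes of ESI, low byte first. With ESI holding '#' (0x23), the upper three bytes are zero, so the store writes '#' followed by three NUL bytes over the next characters. A minimal C model of both store widths; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Models mov [ecx], esi: a 32-bit store writes four bytes,
   least significant first (little-endian). */
static void store_dword_le(unsigned char *dst, uint32_t value)
{
    for (size_t i = 0; i < 4; i++)
        dst[i] = (unsigned char)(value >> (8 * i));
}

/* Models a one-byte store such as mov [ecx], al. */
static void store_byte(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)value;
}
```

Storing '#' into "1234" with store_dword_le leaves "#" followed by three zero bytes; store_byte leaves "#234".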

Assembler array max element search

I need to write an asm function in Delphi that searches for the maximum array element. This is what I wrote.
I've got a few problems here.
First, mov ecx, len just doesn't work correctly here. It does replace the value in ECX, but not with the value of len! And if I just write, for example, mov ecx, 5, then 5 appears in ECX.
Second, when I test this function on an array of 5 elements (using mov ecx, 5, of course), it returns a strange result. I think it may be because I'm doing something wrong when reading the array's element 0, like this:
mov edx, arr
lea ebx, dword ptr [edx]
But if I read it like this:
lea ebx, arr
it says the operation is invalid, and if I try this:
lea bx, arr
it says the operand sizes mismatch.
How could I solve this problem? Full code here:
program Project2;
{$APPTYPE CONSOLE}
uses
SysUtils;
Type
TMyArray = Array [0..255] Of Byte;
function randArrCreate(len:Integer):TMyArray;
var temp:TMyArray; i:Integer;
begin
Randomize;
for i:=0 to len-1 do
temp[i]:=Random(100);
Result:= temp;
end;
procedure arrLoop(arr:TMyArray; len:Integer);
var i:integer;
begin
for i:=0 to len-1 do begin
Write(' ');
Write(arr[i]);
Write(' ');
end;
end;
function arrMaxAsm(arr:TMyArray; len:integer):Word; assembler;
asm
mov edx, arr
lea ebx, dword ptr [edx]
mov ecx, len
xor ax,ax //0
mov ax, [ebx] //max
@cycle:
mov dx, [ebx]
cmp dx, ax
jg @change
jmp @cont
@change:
mov ax, dx
@cont:
inc ebx
loop @cycle
mov result, ax
end;
var massive:TMyArray; n,res:Integer;
begin
Readln(n);
massive:=randArrCreate(n);//just create random array
arrLoop(massive,n);//just to show what in it
res:=arrMaxAsm(massive, n);
Writeln(res);
Readln(n);
end.

First off, calling conventions: what data is sent to the function and where?
According to the documentation, arrays are passed as 32-bit pointers to the data, and integers are passed as values.
According to the same documentation, multiple calling conventions are supported. Unfortunately, the default one isn't documented - explicitly specifying one would be a good idea.
Based on your description that mov ecx, len doesn't work, I'm guessing the compiler used the register convention by default, so the arguments were already in registers (Delphi's register convention passes the first three in eax, edx and ecx, so arr is in eax and len in edx), and your code then mixed them up: mov edx, arr overwrites len in edx before mov ecx, len reads it. You can either change your code to work with that convention, or tell the compiler to pass the arguments on the stack by using the stdcall convention. I arbitrarily picked the second option. Whichever one you pick, make sure to specify the calling convention explicitly.
Next, actual function logic.
Is there a reason why you're working with 16 bit registers instead of full 32-bit ones?
Your array contains bytes, but you're reading and comparing words.
lea ebx, dword ptr [edx] is the same as mov ebx, edx. You're just introducing another temporary variable.
You're comparing elements as if they were signed.
Modern compilers tend to implement loops without using loop.
The documentation also says that ebx needs to be preserved - because the function uses ebx, its original value needs to be saved at the start and restored afterwards.
This is how I rewrote your function (using Lazarus, because I haven't touched Delphi in about 8 years - no compiler within reach):
function arrMaxAsm(arr:TMyArray; len:integer):Word; assembler; stdcall;
asm
push ebx { save ebx }
lea edx, arr { Lazarus accepts a simple "mov edx, arr" }
mov edx, [edx] { but Delphi 7 requires this indirection }
mov ecx, len
xor ax, ax { set default max to 0 }
test ecx, ecx
jle @done { if len is <= 0, nothing to do }
movzx ax, byte ptr [edx] { read a byte, zero-extend it to a word }
{ and set it as current max }
@cont:
dec ecx
jz @done { if no elements left, return current max }
@cycle:
inc edx
movzx bx, byte ptr [edx] { read next element, zero-extend it }
cmp bx, ax { compare against current max as unsigned quantities }
jbe @cont
mov ax, bx
jmp @cont
@done:
pop ebx { restore saved ebx }
mov result, ax
end;
It might be possible to optimize it further by reorganizing the loop jumps - YMMV.
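The same algorithm as a C sketch, useful for checking the assembly's results: byte elements are zero-extended and compared as unsigned values, and 0 is returned when len <= 0. arr_max_u8 is a hypothetical name; the 16-bit result mirrors the Delphi function's Word return type.

```c
#include <stdint.h>

/* Sketch mirroring the rewritten arrMaxAsm for unsigned byte elements. */
static uint16_t arr_max_u8(const uint8_t *arr, int len)
{
    if (len <= 0)
        return 0; /* nothing to do */

    uint16_t max = arr[0]; /* movzx ax, byte ptr [edx] */
    for (int i = 1; i < len; i++) {
        if (arr[i] > max) /* unsigned compare: keep max when jbe is taken */
            max = arr[i];
    }
    return max;
}
```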
Note: this will only work correctly for byte-sized unsigned values. To adapt it to values of different size/signedness, some changes need to be made:
Data size:
Read the right number of bytes:
movzx bx, byte ptr [edx] { byte-sized values }
mov bx, word ptr [edx] { word-sized values }
mov ebx, dword ptr [edx] { dword-sized values }
{ note that the full ebx is needed to store this value... }
Mind that this reading is done in two places. If you're dealing with dwords, you'll also need to change the result from ax to eax.
Advance by the right number of bytes:
@cycle:
inc edx { for an array of bytes }
add edx, 2 { for an array of words }
add edx, 4 { for an array of dwords }
Dealing with signed values:
The value extension, if it's applied, needs to be changed from movzx to movsx.
The conditional jump before setting new maximum needs to be adjusted:
cmp bx, ax { compare against current max as unsigned quantities }
jbe @cont
cmp bx, ax { compare against current max as signed quantities }
jle @cont
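Why the choice matters: the byte 0xC8 is 200 when zero-extended (movzx) but -56 when sign-extended (movsx), so whether it beats 100 depends on the extension and on which conditional jump is used. A small C sketch; the helper names are hypothetical, and the int8_t conversion assumes the usual two's-complement wraparound, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Zero-extension + unsigned compare (movzx, then ja/jbe). */
static int greater_unsigned(uint8_t a, uint8_t b)
{
    return (uint32_t)a > (uint32_t)b;
}

/* Sign-extension + signed compare (movsx, then jg/jle).
   (int8_t)0xC8 is -56 on two's-complement targets. */
static int greater_signed(uint8_t a, uint8_t b)
{
    return (int8_t)a > (int8_t)b;
}
```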

x86 convert to lower case assembly

This program converts a string, given as a char pointer, to lower case. I'm using Visual Studio 2010.
This is from another question, but simpler to read and more to the point.
int b_search (char* token)
{
__asm
{
mov eax, 0 ; zero out the result
mov edi, [token] ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
OR [edi], 20h
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
At my OR instruction, where I'm trying to lowercase the token whose address is in the register, I keep getting "unhandled exception ... access violation". Without the brackets I don't get errors, but nothing gets lowercased. Any advice?
This is part of a bigger piece of code from another question, but I broke it down because I only needed this part solved.

Your code can only alter the first char (or [edi], 20h), because EDI is never incremented.
EDIT: I found this thread with a workaround. Try using dl instead of al.
; move the token address to search for into EDI
; (not the *token, as would be with mov edi, [token])
mov edi, token
LOWERCASE_TOKEN: ;lowercase the token
mov al, [edi]
; check for null-terminator here !
cmp al, 0
je GET_OUT
or al, 20h
mov dl, al
mov [edi], dl
inc edi
jmp LOWERCASE_TOKEN
GET_OUT:

I would load the data into a register, manipulate it there, then store the result back to memory.
int make_lower(char* token) {
__asm {
mov edi, token
jmp short start_loop
top_loop:
or al, 20h
mov [edi], al
inc edi
start_loop:
mov al, [edi]
test al, al
jnz top_loop
}
}
Note, however, that your conversion to lower case is somewhat flawed. For example, if the input contains any control characters, it will change them to something else, but they aren't upper case, and what it converts them to won't be lower case.
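To make the control-character point concrete: OR with 20h sets bit 5 of any byte, so '\n' (0x0A) becomes '*' (0x2A) and '@' (0x40) becomes '`' (0x60). A C sketch of the bare OR next to a range-checked version that only touches 'A'..'Z'; both function names are hypothetical.

```c
/* What a bare `or al, 20h` does to any byte, letter or not. */
static char or_20h(char c)
{
    return (char)(c | 0x20);
}

/* Range-checked version: only 'A'..'Z' are changed. */
static void make_lower_ascii(char *s)
{
    for (; *s != '\0'; s++)
        if (*s >= 'A' && *s <= 'Z')
            *s = or_20h(*s);
}
```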

The problem is that the OR instruction, like many others, doesn't allow two memory or constant operands. That means OR can only take the following operands:
OR register, memory
OR register, register
OR register, constant
The second problem is that OR has to store its result in a register, not in memory.
That's why you get an access violation when the brackets are present. If you remove the brackets, the operands are fine, but you don't write your lowercase letter back to memory, which is what you intend to do. So copy the letter to another register first, and then use OR.
For example:
mov eax, 0 ; zero out the result
mov edi, [token] ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
MOV bl, [edi+ecx] ;## Copy the current char to another register ##
TEST bl, bl ;## stop at the null terminator ##
JZ LOWERCASE_DONE
OR bl, 20h ;## set bit 5 in the register, not in memory ##
MOV [edi+ecx], bl ;## Save back the result (one byte only) ##
INC ecx
JMP LOWERCASE_TOKEN
LOWERCASE_DONE:
MOV ecx, 0
That should work^^