Implementing a toupper function in x86 assembly - arrays

I'm playing around with x86 assembly in VS 2012 trying to convert some old code I have to assembly. The problem I'm having is accessing and changing array values (the values are characters) and I'm not sure how to go about it. I've included comments so you can see my thought process
void toUpper(char *string) {
__asm{
PUSH EAX
PUSH EBX
PUSH ECX
PUSH EDX
PUSH ESI
PUSH EDI
MOV EBX, string
MOV ECX, 0 // counter
FOR_EXPR: // for loop
CMP EBX, 0 //compare ebx to 0
JLE END_FOR // if ebx == 0, jump to end_for
CMP EBX, 97 // compare ebx to 97
JL ELSE // if ebx < 97, jump else
CMP EBX, 122 // compare ebx to 122
JG ELSE // if ebx > 122, jump else
// subtract 32 from current array value
// jump to next element
JMP END_IF
ELSE:
// jump to next element
END_IF:
JMP FOR_EXPR
END_FOR:
POP EDI
POP ESI
POP EDX
POP ECX
POP EBX
POP EAX
}
}
Any help is much appreciated!

Looks to me like the basic problem is that you're loading EBX with the address of the string, but then trying to use it as if it contained a byte of data from inside the string.
I'd probably do things a bit differently. I'd probably load the address of the string into ESI and use it to read the contents of the string indirectly.
mov esi, string
next_char:
lodsb
test al, al ; check for end of string
jz done
cmp al, 'a' ; ignore unless in range
bl next_char
cmp al, 'z'
bg next_char
sub al, 'a'-'A' ; convert to upper case
mov [esi-1], al ; write back to string
jmp next_char
You can use EBX for that instead of ESI, but ESI is a lot more idiomatic. There are also some tricks you could use to optimize this a little, but until you understand the basics, they'd mostly add confusion. With a modern processor, they probably wouldn't make much difference anyway--this is likely to run as fast as your bandwidth to memory anyway.

Related

How to write an encryption loop where the plain text and the key are of different lengths

Help me to check my code, I can't get the expected output.
Input: prompt user to insert plain text, and key
Output: calculate and come out with the encrypted text by using vigenère cipher.
I can input the plain text and the key, but I can't get the cipher text output.
INCLUDE IRVINE32.INC
.DATA
BUFMAX = 128
sPrompt1 BYTE "Enter the plain text:", 0
sPrompt2 BYTE "Key:",0
sEncrypt BYTE "Cipher text:",0
buffer BYTE BUFMAX+1 DUP(?)
key BYTE BUFMAX+1 DUP (?)
bufsize DWORD ?
keysize DWORD ?
.CODE
main PROC
call Clrscr
call inputString ;input plain text, key
call translateBuffer ;encrypt the buffer
call displayMessage ;display encrypted message
exit
main ENDP
inputString
inputString PROC
;--------------------------------------------------------------------------------
;Prompts user to input string and key, saves the string and it's length
;recieves: nothing
;returns: nothing
;--------------------------------------------------------------------------------
pushad
mov edx, OFFSET sPrompt1 ;prompt plain text
call WriteString
mov ecx, BUFMAX
mov edx, OFFSET buffer
call ReadString
mov bufsize, eax
call Crlf
mov edx, OFFSET sPrompt2 ;prompt key
call WriteString
mov ecx, BUFMAX
mov edx, OFFSET key
call ReadString
mov keysize, eax
call Crlf
popad
ret
inputString ENDP
translateBuffer
translateBuffer PROC
;--------------------------------------------------------------------------------
;translate the plain text with key to cipher text
;recieves: nothing
;returns: nothing
;--------------------------------------------------------------------------------
pushad
mov ecx, bufsize
mov esi, 0
mov edi, 0
L1: mov al, key[edi]
xor buffer[esi], al
inc esi
inc edi
cmp edi, keysize+1
jne L1
mov edi, 0
loop L1
popad
ret
translateBuffer ENDP
display encrypted message
displayMessage PROC
;--------------------------------------------------------------------------------
; Displays the encrypted message.
; Receives: EDX points to the message
; Returns: nothing
;--------------------------------------------------------------------------------
pushad
mov edx, OFFSET sEncrypt
call WriteString
mov edx,OFFSET buffer
call WriteString
call Crlf
call Crlf
popad
ret
displayMessage ENDP
END main
Your input and output procedures seem fine. The troubles sit in-between, in your encryption procedure.
The user can input strings of varying lengths for the plain text and for the encryption key. Your program needs to consider these different lengths in order to not overflow any buffers, because that's exactly what is happening in your code!
Problem 1
Because the jne L1 instruction, going the wrong way, bypasses the normal loop counting, your loop runs for much too long and starts overwriting memory that does no longer belong to the buffers that were defined.
The loop must only traverse the plain text string. If the text in the encryption key happens to be shorter than the plain text string, then you will want to repeat the key. Therefore you have to reset the offset register EDI to 0, but you have to continu the loop normally (not just return to the top!).
Problem2
A second problem is that you have this cmp edi, keysize+1 instruction that, because of the +1, will be consuming the zero-terminator from the key string as if it were part of that string. That's never the idea behind a string terminator. It's not part of the string.
push esi ; No need to preserve the scratch registers EAX, ECX, EDX
mov esi, OFFSET buffer
xor edx, edx
mov ecx, bufsize
L1: movzx eax, byte ptr key[edx]
xor [esi], al
inc edx
inc esi
cmp edx, keysize
je L3 ; Branches in the least common case
L2: dec ecx
jnz L1
pop esi
ret
L3: xor edx, edx
jmp L2
In the above snippet I have improved the code a bit.
You can zero a register simply by XORing it to itself.
You should never use the LOOP instruction because it is very slow.
And some more optimizations that you can read about in Peter Cordes' comments below this answer.
And this is a version that even eliminates the conditional branch using the CMOVE (Conditional MOVe If Equal) instruction that will store the (zeroed) EDX register over the EDI register if the Equal condition happens to be true. If not, there's no moving at all.
xor edi, edi
xor esi, esi
xor edx, edx
mov ecx, bufsize
L1: mov al, buffer[esi]
xor al, key[edi]
mov buffer[esi], al
inc esi
inc edi
cmp edi, keysize
cmove edi, edx
dec ecx
jnz L1
The danger of XOR
Because the encryption uses a mere XOR operation and depending on the composition of both strings, it's possible to obtain a zero byte amongst the bytes of the encrypted string. Obviously a print function like WriteString that depends on the zero-terminating of a string will consider the first zero as the end of the string. If it so happens that this occurs in the first position of the encrypted string, well then there's nothing at all to print...

getting character from string and using it as array index... ASM

Having trouble using a string array and getting each character from it and adding a 1 to a frequency table of the corresponding ascii index (frequency table is indexed by ascii value): Example, get character 'a' then add 1 to the frequency table of index of the array ['a']. I was getting segmentation errors and now getting error: invalid combination of opcode and operands, talking about mov ax, al
Any questions about the parameters of the problem please ask. I have working on this for hours and could really use another pair of eyes to check what I am doing wrong (syntax/concept if you see one) Please help.
Update: I have got it print stuff out, so I think it is "working"; however I am now trying to print the characters that each array index corresponds. It won't print the character of the array that I am pointing to (it prints literally nothing for the character).
Latest update: I got it to work. changed some of the code under the label .loopa and now it works fine! :)
Code below:
SECTION .data ; Data section, initialized variables
array5: db "Hello, world...", 0
array5Len: equ $-array5-1
asoutput: db "%s", 0 ; string output
newline: db "", 10, 0 ; format for a new line
acoutput: db "%c: ", 0 ; output format for character output
SECTION .bss ; BSS, uninitialized variables
arrayq: resd 128 ; frequency array of the first 127 ascii values initialized to 0 (none have been counted yet)
SECTION .text
global main ; the standard gcc entry point
main: ; the program label for the entry point
push ebp ; set up stack frame
mov ebp,esp
mov esi, array5
mov edi, 0
mov ebx, arrayq
mov ecx, array5Len
; get each character of array5 and add 1 to the frequency table of the corresponding ascii value (which the arrayq is indexed by ascii value).
.loopf:
xor eax, eax
mov al, [esi]
;mov ax, [esi]
;mov ax, al
;mov cx, ax
add edi, eax
mov ebx, 1
add [arrayq+4*edi], ebx
mov edi, 0
add esi, 1
loop .loopf
push dword array2
push dword asoutput
call printf
add esp, 8
push dword newline
call printf
add esp, 4
;pop ebx
mov ebx, arrayq
mov ecx, 128 ; size of arrayq
mov esi, 0 ;start at beginning
.loopa:
mov eax, 0
cmp [ebx+esi], eax
je .skip
mov eax, esi
push ebx
push ecx
mov ebx, 4
cdq
div ebx
push eax
push dword acoutput
call printf
add esp, 8
pop ecx
pop ebx
push ebx
push ecx ; make sure to put ecx (counter) on stack so we don't lose it when calling printf)
push dword [ebx + esi] ; put the value of the array at this (esi) index on the stack to be used by printf
push dword aoutput ; put the array output format on the stack for printf to use
call printf ; call the printf command
add esp, 8 ; add 4 bytes * 2
pop ecx ; get ecx back
pop ebx
push ebx
push ecx
push dword newline
call printf
add esp, 4
pop ecx
pop ebx
.skip:
add esi, 4
loop .loopa
.end:
mov esp, ebp ; takedown stack frame
pop ebp ; same as "leave" op
Changed code under .loopa label to make it print the character the index is corresponding to:
.loopa:
mov eax, 0
cmp [ebx+esi], eax
je .skip
mov eax, esi
push ebx
push ecx
mov ebx, 4
cdq
div ebx
push eax
push dword acoutput
call printf
add esp, 8
pop ecx
pop ebx

Calling C function in Assembly Segfaults

I am trying to write an assembly program that calls a function in c that will replace certain characters in a string with a predefined character given that the currently character in the char array meets some qualification.
My c file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
//display *((char *) $edi)
// These functions will be implemented in assembly:
//
int strrepl(char *str, int c, int (* isinsubset) (int c) ) ;
int isvowel (int c) {
if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
return 1 ;
if (c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U')
return 1 ;
return 0 ;
}
int main(){
char *str1;
int r;
// I ran my code through a debugger again, and it seems that when displaying
// the character stored in ecx is listed as "A" (correct) right before the call
// to "add ecx, 1" at which point ecx somehow resets to 0 when it should be "B"
str1 = strdup("ABC 123 779 Hello World") ;
r = strrepl(str1, '#', &isdigit) ;
printf("str1 = \"%s\"\n", str1) ;
printf("%d chararcters were replaced\n", r) ;
free(str1) ;
return 0;
}
And my .asm file:
; File: strrepl.asm
; Implements a C function with the prototype:
;
; int strrepl(char *str, int c, int (* isinsubset) (int c) ) ;
;
;
; Result: chars in string are replaced with the replacement character and string is returned.
SECTION .text
global strrepl
_strrepl: nop
strrepl:
push ebp ; set up stack frame
mov ebp, esp
push esi ; save registers
push ebx
xor eax, eax
mov ecx, [ebp + 8] ;load string (char array) into ecx
jecxz end ;jump if [ecx] is zero
mov esi, [ebp + 12] ;move the replacement character into esi
mov edx, [ebp + 16] ;move function pointer into edx
xor bl, bl ;bl will be our counter
firstLoop:
add bl, 1 ;inc bl would work too
add ecx, 1
mov eax, [ecx]
cmp eax, 0
jz end
push eax ; parameter for (*isinsubset)
;BREAK
call edx ; execute (*isinsubset)
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
end:
pop ebx ; restore registers
pop esi
mov esp, ebp ; take down stack frame
pop ebp
ret
When running this through gdb and putting a breakpoint at ;BREAK, it segfaults after I take a step to the call command with the following error:
Program received signal SIGSEGV, Segmentation fault.
0x0081320f in isdigit () from /lib/libc.so.6
isdigit is part of the standard c library that i have included in my c file, so I am not sure what to make of this.
Edit: I have edited my firstLoop and included a secondLoop which should replace any digits with "#", however it seems to replace the entire array.
firstLoop:
xor eax, eax
mov edi, [ecx]
cmp edi, 0
jz end
mov edi, ecx ; save array
movzx eax, byte [ecx] ;load single byte into eax
mov ebp, edx ; save function pointer
push eax ; parameter for (*isinsubset)
call edx ; execute (*isinsubset)
;cmp eax, 0
;jne end
mov ecx, edi ; restore array
cmp eax, 0
jne secondLoop
mov edx, ebp ; restore function pointer
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
add ecx, 1
jmp firstLoop
secondLoop:
mov [ecx], esi
mov edx, ebp
add esp, 4
mov ebx, eax
add ecx, 1
jmp firstLoop
Using gdb, when the code gets to secondloop, everything is correct. ecx is showing as "1" which is the first digit in the string that was passed in from the .c file. Esi is displaying as "#" as it should be. However, after I do mov [ecx], esi it seems to fall apart. ecx is displaying as "#" as it should at this point, but once I increment by 1 to get to the next character in the array, it is listed as "/000" with display. Every character after the 1 is replaced with "#" is listed as "/000" with display. Before I had the secondLoop trying to replace the characters with "#", I just had firstLoop looping with it self to see if it could make it through the entire array without crashing. It did, and after each increment ecx was displaying as the correct character. I am not sure why doing mov [ecx], esi would have set the rest of ecx to null.
In your firstLoop: you're loading characters from the string using:
mov eax, [ecx]
which is loading 4 bytes at a tie instead of a single byte. So the int that you're passing to isdigit() is likely to by far out of range for it to handle (it probably uses a simple table lookup).
You can load a single byte using the following Intel asm syntax:
movzx eax, byte ptr [ecx]
A few other things:
it will also have the effect that it probably wouldn't detect the end of the string properly since the null terminator might not be followed by three other zero bytes.
I'm not sure why you increment ecx before processing the first character in the string
the assembly code you posted doesn't appear to actually loop over the string
I've put some comments into your code:-
; this is OK: setting up the stack frame and saving important register
; on Win32, the registers that need saving are: esi, edi and ebx
; the rest can be used without needing to preserve them
push ebp
mov ebp, esp
push esi
push ebx
xor eax, eax
mov ecx, [ebp + 8]
; you said that this checked [ecx] for zero, but I think you've just written
; that wrong, this checks the value of ecx for zero, the [reg] form usually indicates
; the value at the address defined by reg
; so this is effectively doing a null pointer check (which is good)
jecxz end
mov esi, [ebp + 12]
mov edx, [ebp + 16]
xor bl, bl
firstLoop:
add bl, 1
; you increment ecx before loading the first character, this means
; that the function ignores the first character of the string
; and will therefore produce an incorrect result if the string
; starts with a character that needs replacing
add ecx, 1
; characters are 8 bit, not 32 bit (mentioned in comments elsewhere)
mov eax, [ecx]
cmp eax, 0
jz end
push eax
; possibly segfaults due to character out of range
; also, as mentioned elsewhere, the function you call here must conform to the
; the standard calling convention of the system (e.g, preserve esi, edi and ebx for
; Win32 systems), so eax, ecx and edx can change, so next time you call
; [edx] it might be referencing random memory
; either save edx on the stack (push before pushing parameters, pop after add esp)
; or just load edx with [ebp+16] here instead of at the start
call edx
add esp, 4
mov ebx, eax
; more functionality required here!
end:
; restore important values, etc
pop ebx
pop esi
mov esp, ebp
pop ebp
; the result of the function should be in eax, but that's not set up properly yet
ret
Comments on your inner loop:-
firstLoop:
xor eax, eax
; you're loading a 32 bit value and checking for zero,
; strings are terminated with a null character, an 8 bit value,
; not a 32 bit value, so you're reading past the end of the string
; so this is unlikely to correctly test the end of string
mov edi, [ecx]
cmp edi, 0
jz end
mov edi, ecx ; save array
movzx eax, byte [ecx] ;load single byte into eax
; you need to keep ebp! its value must be saved (at the end,
; you do a mov esp,ebp)
mov ebp, edx ; save function pointer
push eax ; parameter for (*isinsubset)
call edx ; execute (*isinsubset)
mov ecx, edi ; restore array
cmp eax, 0
jne secondLoop
mov edx, ebp ; restore function pointer
add esp, 4 ; "pop off" the parameter
mov ebx, eax ; store return value
add ecx, 1
jmp firstLoop
secondLoop:
; again, your accessing the string using a 32 bit value, not an 8 bit value
; so you're replacing the matched character and the three next characters
; with the new value
; the upper 24 bits are probably zero so the loop will terminate on the
; next character
; also, the function seems to be returning a count of characters replaced,
; but you're not recording the fact that characters have been replaced
mov [ecx], esi
mov edx, ebp
add esp, 4
mov ebx, eax
add ecx, 1
jmp firstLoop
You do seem to be having trouble with the way the memory works, you are getting confused between 8 bit and 32 bit memory access.

x86 Assembly Lowercase unhandled exception [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
x86 convert to lower case assembly
This program is to convert a 2d char array into lower case
Quickie Edit: I'm using Visual Studio 2010
int b_search (char list[100][20], int count, char* token)
{
__asm
{
mov eax, 0 ; zero out the result
mov esi, list ; move the list pointer to ESI
mov ebx, count ; move the count into EBX
mov edi, token ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
OR [edi], 20h
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
At my OR instruction, where I'm trying to change the register that contains the address to token into all lower case, I keep getting unhandled exception...access violation, and without the brackets nothing gets lowercased. Later in my code I have
LOWERCASE_ARRAY: ;for(edi = 0, edi<ebx; edi++), loops through each name
CMP ecx, ebx
JGE COMPARE
INC ecx ;ecx++
MOV edx, 0; ;edx = 0
LOWERCASE_STRING: ;while next char != 0, loop through each byte to convert to lower case
OR [esi+edx],20h ;change to lower case
INC edx
CMP [esi+edx],0 ;if [esi+edx] not zero, loop again
JNZ LOWERCASE_STRING
JMP LOWERCASE_ARRAY ;jump back to start case change of next name
and the OR instruction there seems to work perfectly so I don't know why the first won't work. Also, I am trying to convert several strings.
After I finish one string, any ideas how I would go about going to the next string (as in list[1][x], list[2][x], etc...) I tried adding 20 as in [esi+20*ecx+edi] but that doesn't work. Can I get advice on how to proceed?
One possibility:
If parameters of procedure b_search are stored as registers (register calling convention) then you override list pointer in your first asm line, because eax point to the list array:
mov eax, 0 ; zero out the result
Because:
mov esi, list ; move the list pointer to ESI
should be converted to:
mov esi, eax
Try to exchange first and second line to:
mov esi, list ; move the list pointer to ESI
mov eax, 0 ; zero out the result

x86 convert to lower case assembly

This program is to convert a char pointer into lower case. I'm using Visual Studio 2010.
This is from another question, but much simpler to read and more direct to the point.
int b_search (char* token)
{
__asm
{
mov eax, 0 ; zero out the result
mov edi, [token] ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
OR [edi], 20h
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
At my OR instruction, where I'm trying to change the register that contains the address to token into all lower case, I keep getting unhandled exception...access violation, and without the brackets nothing, I don't get errors but nothing gets lowercased. Any advice?
This is part of some bigger code from another question, but I broke it down because I needed this solution only.
Your code can alter only the first char (or [edi], 20h) - the EDI does not increment.
EDIT: found this thread with workaround. Try using the 'dl' instead of al.
; move the token address to search for into EDI
; (not the *token, as would be with mov edi, [token])
mov edi, token
LOWERCASE_TOKEN: ;lowercase the token
mov al, [edi]
; check for null-terminator here !
cmp al, 0
je GET_OUT
or al, 20h
mov dl, al
mov [edi], dl
inc edi
jmp LOWERCASE_TOKEN
GET_OUT:
I would load the data into a register, manipulate it there, then store the result back to memory.
int make_lower(char* token) {
__asm {
mov edi, token
jmp short start_loop
top_loop:
or al, 20h
mov [edi], al
inc edi
start_loop:
mov al, [edi]
test al, al
jnz top_loop
}
}
Note, however, that your conversion to upper-case is somewhat flawed. For example, if the input contains any control characters, it will change them to something else -- but they aren't upper case, and what it converts them to won't be lower case.
The problem is, that the OR operator like many others don't allow two memory or constant parameters. That means: The OR operator can only have following parameters:
OR register, memory
OR register, register
OR register, constant
The second problem is, that the OR has to store the result to a register, not to memory.
Thats why you get an access violation, when the brackets are set. If you remove the brackets, the parameters are ok, but you don't write your lowercase letter to memory, what you intend to do. So use another register, to copy the letter to, and then use OR.
For example:
mov eax, 0 ; zero out the result
mov edi, [token] ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
MOV ebx, [edi] ;## Copy the value to another register ##
OR ebx, 20h ;## and compare now the register and the memory ##
MOV [edi], ebx ;##Save back the result ##
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
That should work^^

Resources