x86 inline assembly: writing into a C array

Assembly info: Using Visual Studio 2010 to write inline assembly embedded into C
Hello,
I am trying to write into an array of chars in C and trying to mimic the action of this C code:
resNum[posNum3] = currNum3 + '0';
Currently this is what I have:
mov ebx, posNum3;
mov resNum[ebx], edx; //edx is currNum3
add resNum[ebx], 48; // add 48 because that's the value of the char '0'
I also tried doing this:
mov ebx, posNum3;
mov eax, resNum[ebx] ;// eax points to the beginning of the string
mov eax, edx; // = currnum3
add eax, 48; // + '0'
No luck with any of this; help is more than appreciated!

The problem is that the instruction
mov resNum[ebx], edx
moves 4 bytes (an entire dword) into the destination, not a single byte. You probably want
mov byte ptr resNum[ebx], dl
instead. While the assembler will let you leave off the 'size ptr' prefix on the address, you probably don't want to, as getting the size wrong leads to hard-to-spot bugs.
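To make the difference concrete, here is a minimal C sketch of the two stores. The function names store_dword and store_byte are made up for illustration; they only model what the two mov forms do to memory and are not part of the original code.

```c
#include <stdint.h>
#include <string.h>

/* Models `mov resNum[ebx], edx`: all 4 bytes of value land in
   buf[pos] .. buf[pos + 3], overwriting the neighbouring characters. */
void store_dword(char *buf, size_t pos, uint32_t value)
{
    memcpy(buf + pos, &value, sizeof value);
}

/* Models `mov byte ptr resNum[ebx], dl`: only the low byte of value
   is written, so the neighbouring characters are left alone. */
void store_byte(char *buf, size_t pos, uint32_t value)
{
    buf[pos] = (char)(value & 0xFF);
}
```

With a buffer holding "abcdef", store_byte(buf, 1, 7 + '0') leaves "a7cdef". store_dword with the same arguments also overwrites buf[2] through buf[4]; on a little-endian machine such as x86 the three upper bytes are zero, so the string is cut short right after the '7'.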

My X86 asm is rusty, but...
If you're working with characters (8 bits), first zero out EAX before the loop starts, and then move the char into AH or AL, something like:
; Before you start your loop
xor EAX, EAX ; If you're sure you won't overflow an 8 bit number by adding 48, this can go outside the loop
; ... code here to setup loop
mov EBX, posNum3
mov AL, resNum[EBX]
add AL, 48
; ... rest of loop
Note that the compiler will do a better job of this than you will... Unless you're Mike Abrash or someone like him :)

Avoid using expressions like
mov resNum[ebx], edx;
because you never know what resNum is. It could be an expression like esp + 4, and if the thing stored there is a pointer, resNum[ebx] addresses stack memory next to the pointer variable rather than the array it points to. Use small steps instead: load the pointer into a register first, then index through it.
Also, ebx is a register that has to be preserved across calls. See http://msdn.microsoft.com/en-us/library/k1a8ss06%28v=VS.71%29.aspx for details and learn about calling conventions.

Most inline assemblers allow using name instead of size ptr [name], so you can just write
mov al, currNum3
add al, 0x30 //'0'
mov edx, posNum3
mov ecx, resNum
mov byte ptr [edx+ecx], al
If resNum is a global array, not a function argument or local variable, you can write shorter code:
mov al, currNum3
add al, 0x30 //'0'
mov edx, posNum3
mov byte ptr [resNum+edx], al

Related

Simple x86 Assembly Loop- Using PTR

I'm learning x86 assembly and loops are very confusing to me.
For the prompt: "Write a program that uses the variables below and MOV instructions to copy the value from bigEndian to littleEndian, reversing the BYTE order. You will need to use PTR or LABEL to access just a BYTE of the DWORD element, and use LOOP (set ECX to 4) and ESI and EDI for indirect addressing."
My code displays 76993356
Should I be using PTR with bigEndian instead of just looping like this?
INCLUDE Irvine32.inc
.data
; declare variables here
bigEndian DWORD 12345678h
littleEndian DWORD 0
.code
main proc
mov ECX, SIZEOF bigEndian
mov EDI, OFFSET littleEndian
mov ESI, OFFSET bigEndian
TOP:
mov al, [ESI]
mov [EDI], al
inc ESI
dec EDI
loop TOP
mov edx, littleEndian
call WriteHex
exit
main ENDP
END main
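In the posted loop EDI starts at the first byte of littleEndian and is then decremented, so every store after the first lands below the variable instead of inside it. One way to get the reversal, sketched here in C rather than MASM, is to start the destination pointer at the far end of the DWORD and walk it downwards while the source pointer walks upwards. The function name reverse_bytes is hypothetical; src and dst stand in for ESI and EDI.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies the bytes of big_endian into a new value in reverse order. */
uint32_t reverse_bytes(uint32_t big_endian)
{
    uint32_t little_endian = 0;

    /* src plays the role of ESI: first byte of the source. */
    const unsigned char *src = (const unsigned char *)&big_endian;
    /* dst plays the role of EDI: starts one past the LAST byte of the
       destination and is decremented before each store. */
    unsigned char *dst = (unsigned char *)&little_endian + sizeof little_endian;

    /* count plays the role of ECX (4 iterations for a DWORD). */
    for (size_t count = sizeof big_endian; count != 0; --count) {
        *--dst = *src++;
    }
    return little_endian;
}
```

For the value in the question, reverse_bytes(0x12345678) returns 0x78563412. In the assembly version the equivalent change would be to initialise EDI to the address of the last byte of littleEndian before the loop.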

Implementing a toupper function in x86 assembly

I'm playing around with x86 assembly in VS 2012 trying to convert some old code I have to assembly. The problem I'm having is accessing and changing array values (the values are characters) and I'm not sure how to go about it. I've included comments so you can see my thought process
void toUpper(char *string) {
__asm{
PUSH EAX
PUSH EBX
PUSH ECX
PUSH EDX
PUSH ESI
PUSH EDI
MOV EBX, string
MOV ECX, 0 // counter
FOR_EXPR: // for loop
CMP EBX, 0 //compare ebx to 0
JLE END_FOR // if ebx == 0, jump to end_for
CMP EBX, 97 // compare ebx to 97
JL ELSE // if ebx < 97, jump else
CMP EBX, 122 // compare ebx to 122
JG ELSE // if ebx > 122, jump else
// subtract 32 from current array value
// jump to next element
JMP END_IF
ELSE:
// jump to next element
END_IF:
JMP FOR_EXPR
END_FOR:
POP EDI
POP ESI
POP EDX
POP ECX
POP EBX
POP EAX
}
}
Any help is much appreciated!
Looks to me like the basic problem is that you're loading EBX with the address of the string, but then trying to use it as if it contained a byte of data from inside the string.
I'd probably do things a bit differently. I'd probably load the address of the string into ESI and use it to read the contents of the string indirectly.
mov esi, string
next_char:
lodsb
test al, al ; check for end of string
jz done
cmp al, 'a' ; ignore unless in range
jb next_char
cmp al, 'z'
ja next_char
sub al, 'a'-'A' ; convert to upper case
mov [esi-1], al ; write back to string
jmp next_char
done:
You can use EBX for that instead of ESI, but ESI is a lot more idiomatic. There are also some tricks you could use to optimize this a little, but until you understand the basics, they'd mostly add confusion. With a modern processor, they probably wouldn't make much difference: this is likely to run as fast as your bandwidth to memory allows.
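For comparison, here is a rough C rendering of the same loop, assuming ASCII. The name to_upper is made up for this sketch; the post-increment on s mirrors what lodsb does to ESI, which is why the write-back goes to s[-1].

```c
/* Upper-cases the ASCII letters of a NUL-terminated string in place. */
void to_upper(char *s)
{
    for (;;) {
        char c = *s++;              /* lodsb: load a byte, advance the pointer */
        if (c == '\0') {
            break;                  /* end of string */
        }
        if (c < 'a' || c > 'z') {
            continue;               /* not a lower-case letter: leave it */
        }
        s[-1] = (char)(c - ('a' - 'A'));   /* write back to the string */
    }
}
```

The key point is the same as in the assembly: the pointer is only ever used as an address, and the comparisons are made on the byte that was loaded through it.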

I am dealing with a possible array in assembly, but I cannot figure out what the starter value is

Size contains the number 86.
var_10= dword ptr -10h
var_C= dword ptr -0Ch
size= dword ptr 8
push ebp
mov ebp, esp
sub esp, 28h
mov eax, [ebp+size]
mov [esp], eax ; size
call _malloc
mov ds:x, eax
mov [ebp+var_C], 0
jmp short loc_804889E
loc_804889E: ~~~~~~~~~~~~~~~~~~~~~
mov eax, [ebp+size]
sub eax, 1
cmp eax, [ebp+var_C]
jg short loc_8048887
loc_8048887: ~~~~~~~~~~~~~~~~~~~~~
mov edx, ds:x
mov eax, [ebp+var_C]
add edx, eax
mov eax, [ebp+var_C]
add eax, 16h
mov [edx], al
add [ebp+var_C], 1
I am having difficulties reversing this portion of a project I am working on. There's a portion of the code where ds:x is moved into edx and is added with var_c and I am unsure where to go with that.
To me the program looks like it calls malloc and then moves that into ds:x and then moves 0 to var_c.
After that it simply subtracts 1 from the size of my pointer array and compares that number to 0, then jumps to a portion where it adds ds:x into edx so it can add eax to edx.
Am I dealing with some sort of array here? What is the first value that's going to go into edx in loc_8048887? Another way this could help would be to see a C equivalent of it... But that would be what I am trying to accomplish and would rather learn the solution through a different means.
Thank you!
In x86 assembly there's no strict distinction between a variable stored in memory and an array in memory; it only depends on how you access the memory region. All you have is code and data. Anyway, I'd say that ds:x points to an array because of this code here:
mov edx, ds:x ; edx = [x]
mov eax, [ebp+var_C] ; eax = something
add edx, eax ; edx = [x] + something
mov eax, [ebp+var_C] ; eax = something
add eax, 16h ; eax = something + 0x16
mov [edx], al ; [[x] + something ] = al . Yes, ds:x is an array!
What is the value of edx in loc_8048887? To find out you only need some very basic debugging skills. I assume you have gdb at hand; if not, get it ASAP. If you have the source, compile and link it with debug symbols. Then run gdb with the executable, set a code breakpoint at the address behind loc_8048887 (that label is the disassembler's name, so gdb will not know it unless the binary has a matching symbol), run the program with r, and finally check the value of edx.
These are the commands you need:
gdb myexecutable
(gdb) b *0x8048887
(gdb) r
(gdb) info registers edx
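Since the question also asks what a C equivalent might look like, here is one plausible reconstruction of the listing. It is a sketch, not the original source: the function name fill and the variable name i (for var_C) are guesses, and the NULL check is an addition that the disassembly does not contain.

```c
#include <stdlib.h>

char *x;   /* corresponds to ds:x, a global that holds the malloc result */

/* Returns 0 on success, -1 if the allocation failed. */
int fill(int size)
{
    x = malloc((size_t)size);          /* call _malloc ; mov ds:x, eax */
    if (x == NULL) {
        return -1;                     /* not in the listing; added for safety */
    }
    /* cmp eax, [ebp+var_C] with eax = size - 1, then jg: loop while i < size - 1 */
    for (int i = 0; i < size - 1; i++) {
        x[i] = (char)(i + 0x16);       /* add eax, 16h ; mov [edx], al */
    }
    return 0;
}
```

Read this way, edx on the first pass through loc_8048887 is simply the pointer returned by malloc plus 0, and the first byte stored there is 16h. With size equal to 86 the loop writes bytes 0 through 84 and leaves the last byte untouched.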

moving an array element to a register assembly

My assignment is to find the smallest letter in the array using assembly embedded into C. I am not sure how to access each element of the array. I tried googling and I found out that some people are doing the following:
mov ecx, arrayOfLetters
and then incrementing ecx to access each element. Is that right, or is what I wrote correct?
Please help, I am confused.
char findMinLetter( char arrayOfLetters[], int arraySize )
{
char min;
__asm{
push eax
push ebx
push ecx
push edx
mov dl, 0x7f // initialize DL
xor ebx, ebx //EBX started off as 0
//moves letters from array to registers
mov ecx, arrayOfLetters[ebx]
mov edx, arrayOfLetters[ebx+1]
The first thing to understand is that 'arrayOfLetters' as passed to your subroutine is a pointer.
To access data (one byte at a time) from pointer (in ecx) in assembler, use:
mov al, [ecx]
mov al, [ecx+1]
... or ...
mov al, [ecx]
inc ecx
mov al, [ecx]
The next issue is how local variables are accessed: there are two main styles, and both of them use the stack.
mov ecx, _localvariable_ ; this translates to either
mov ecx, [ebp + offset] ; style (1) or
mov ecx, [esp + offset] ; style (2)
If there were an assembler that supported an instruction like mov ecx, _localvariable [+1], it would most likely translate to:
mov ecx, [ebp + offset + 1]
And this would not access the char array[], but just some arbitrary byte on the stack.
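The pointer-walking pattern described above can be sketched in C as follows. The name find_min_letter and the variable p are chosen for this example; p plays the role of ECX after the pointer has been loaded from the argument, and the starting value 0x7f is taken from the question's mov dl, 0x7f.

```c
/* Returns the smallest character among the first size bytes of letters. */
char find_min_letter(const char *letters, int size)
{
    char min = 0x7f;              /* same starting value as in the question */
    const char *p = letters;      /* load the pointer itself into a "register" */

    for (int i = 0; i < size; i++) {
        char c = *p;              /* mov al, [ecx]: read one byte through it */
        if (c < min) {
            min = c;
        }
        p++;                      /* inc ecx: step to the next element */
    }
    return min;
}
```

The important detail is the order of operations: first the pointer value is fetched from the stack slot, and only then is memory read through it one byte at a time.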

x86 convert to lower case assembly

This program is to convert a char pointer into lower case. I'm using Visual Studio 2010.
This is from another question, but much simpler to read and more direct to the point.
int b_search (char* token)
{
__asm
{
mov eax, 0 ; zero out the result
mov edi, [token] ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
OR [edi], 20h
INC ecx
CMP [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
At my OR instruction, where I'm trying to lowercase the string whose address is held in the register, I keep getting an unhandled exception (access violation). Without the brackets I don't get errors, but nothing gets lowercased. Any advice?
This is part of some bigger code from another question, but I broke it down because I needed this solution only.
Your code can alter only the first char (or [edi], 20h) - the EDI does not increment.
EDIT: found this thread with workaround. Try using the 'dl' instead of al.
; move the token address to search for into EDI
; (not the *token, as would be with mov edi, [token])
mov edi, token
LOWERCASE_TOKEN: ;lowercase the token
mov al, [edi]
; check for null-terminator here !
cmp al, 0
je GET_OUT
or al, 20h
mov dl, al
mov [edi], dl
inc edi
jmp LOWERCASE_TOKEN
GET_OUT:
I would load the data into a register, manipulate it there, then store the result back to memory.
int make_lower(char* token) {
__asm {
mov edi, token
jmp short start_loop
top_loop:
or al, 20h
mov [edi], al
inc edi
start_loop:
mov al, [edi]
test al, al
jnz top_loop
}
}
Note, however, that your conversion to lower case is somewhat flawed. For example, if the input contains any control characters, it will change them to something else, but they aren't upper case, and what it converts them to won't be lower case.
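One way to avoid that flaw is to set bit 20h only when the character is an upper-case letter. The sketch below shows the idea in C, assuming ASCII; the name make_lower_checked is made up for this example.

```c
/* Lower-cases only 'A'..'Z' in a NUL-terminated string; every other
   byte (digits, punctuation, control characters) is left unchanged. */
void make_lower_checked(char *token)
{
    for (; *token != '\0'; ++token) {
        if (*token >= 'A' && *token <= 'Z') {
            *token |= 0x20;       /* same bit the OR 20h instruction sets */
        }
    }
}
```

An unconditional OR with 20h would, for instance, turn a tab (09h) into ')' (29h) and '@' (40h) into '`' (60h). In the assembly loop the same guard is two compare-and-branch pairs before the OR.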
The problem is that OR, like most x86 instructions, doesn't allow two memory operands, and a memory destination combined with a constant should be given an explicit operand size. That means OR can take the following operands:
OR register, memory
OR register, register
OR register, constant
OR memory, register
OR byte ptr memory, constant
The second problem is that OR [edi], 20h states no operand size, so nothing tells the assembler that only a single byte should change, and a wrongly sized store through [edi] can write past the byte you meant to modify.
If you remove the brackets, the operands are fine, but the OR then changes the address held in EDI instead of writing your lowercase letter to memory, which is what you intend to do. So copy the letter into another register, apply OR there, and store it back.
For example:
mov eax, 0 ; zero out the result
mov edi, [token] ; move the token to search for into EDI
MOV ecx, 0
LOWERCASE_TOKEN: ;lowercase the token
MOV bl, [edi+ecx] ;## Copy the character to another register ##
OR bl, 20h ;## Apply the OR to the register ##
MOV [edi+ecx], bl ;## Save the result back ##
INC ecx
CMP byte ptr [edi+ecx],0
JNZ LOWERCASE_TOKEN
MOV ecx, 0
That should work.

Resources