Issue properly comparing strings in intel assembly

Issue properly comparing strings in intel assembly - arrays

My goal is to compare two strings. If they are equal continue the program. If not equal, branch somewhere else. If I'm not at the end of the string and so far everything is equal, loop over to next character.
I am using visual studio, inline intel asm.
The program is functioning, although, on its first run-through, it takes the NOTSAME branch and proceeds to end the program, even though the strings being compared are the same. It does not loop at all because of this, which means something is going wrong in the comparison.
I have tried debugging it and seemingly the registers hold the correct address. Incrementing/decrementing it seems to work, from my understanding, incrementing it just moves the pointer one byte forward, pointing towards the next character. I don't really have a clear idea of which part is going wrong, I just know it isn't getting compared in how I planned.
int main()
{
string arr[] = {"a","b","c","d","e" };
string a= "a";
__asm
{
lea esi, arr // load address into esi
lea edi, a
dec edi
LOOPING:
inc edi// ds:di->next character in string2
lodsb// load al with next char from string, lodsb increments si automatically.
cmp[edi], al//; compare characters
jne NOTSAME//; jump out of loop if they are not the same
cmp al, 0//; they are the same, but end of string ?
jne LOOPING//; no - so go round loop again
After some more debugging, it seems AL is not taking the correct byte from the string. Not sure why yet.

So for some reason, LEA stored the address 4 bytes before where the string starts. I am unsure why this is, but to fix it, I added 4 bytes to each register(pointers) and they both adjusted and now compare properly.
Changed the following
lea esi, a// load address into esi
lea edi,arr
add edi,4
add esi,4

Related

How do you index a char array with a register or QWORD in x64 MASM?

I have read similar posts regarding this question, but just can't find anything that's helping me fix my problem. Essentially, I want to change an entry in a character array at a specific point, so simply indexing it. I know how to do that with a magic number (ie just putting in buffer[2] or something like that) but I am getting errors when I try to index the character array with a number value stored in a register or a QWORD.
So I have a char array called buffer
buffer BYTE 4097 DUP (0)
and I also have QWORD named parseNum
parseNum QWORD 0
Here is the section of code giving me problems:
testJmp:
mov rbx, parseNum
xor buffer[rbx], 2
add parseNum, 2
cmp parseNum, 4097
jne testJmp
This code is supposed to loop through the char array and apply a XOR operand to each entry. However I can only do that if I can index at a value that changes with each loop. When I try to just use xor buffer[parseNum], 2 I get an error that reads error A2101: cannot add two relocatable labels at that line of code. When I try moving parseNum into rbx and then putting in rbx as the index, I also get an error that reads, error LNK2017: 'ADDR32' relocation to 'buffer' invalid without /LARGEADDRESSAWARE:NO. I'm at a total lose and am completely stuck. Hopefully I formatted this question properly as I know many on stackoverflow take that very seriously. This is one of my very first posts so I hope I did that properly and up to standards.
To summarize, is there a way to index a char array with an array or variable? Thank you for any help! Also, I am a student, and MASM and Assembly is something I am still very much learning and not skilled at yet, so if there's something obvious I missed, that's why. Again, thank you for your help!
EDIT:
I guess I need both buffer's address and parseNum to loaded into registers. I tried making these changes, as seen here:
lea rbx, buffer
mov rax, parseNum
xor buffer[rax], 2
but I still get the exact same errors as before.
EDIT:
Thanks to some very helpful comments, I got it working, here is the code:
testJmp:
lea rdx, buffer
mov rbx, parseNum
mov al, [rdx + rbx]
xor al, 2
mov [rdx + rbx], al
add parseNum, 1
cmp parseNum, 4097
jne testJmp
essentially, the buffer and parseNum must be loaded into registers. Then I access the index by added the two registers and moving that to al. Then I can apply xor and mov it back into the index.

Copying to and Displaying an Array

Hello Everyone!
I'm a newbie at NASM and I just started out recently. I currently have a program that reserves an array and is supposed to copy and display the contents of a string from the command line arguments into that array.
Now, I am not sure if I am copying the string correctly as every time I try to display this, I keep getting a segmentation error!
This is my code for copying the array:
example:
%include "asm_io.inc"
section .bss
X: resb 50 ;;This is our array
~some code~
mov eax, dword [ebp+12] ; eax holds address of 1st arg
add eax, 4 ; eax holds address of 2nd arg
mov ebx, dword [eax] ; ebx holds 2nd arg, which is pointer to string
mov ecx, dword 0
;Where our 2nd argument is a string eg "abcdefg" i.e ebx = "abcdefg"
copyarray:
mov al, [ebx] ;; get a character
inc ebx
mov [X + ecx], al
inc ecx
cmp al, 0
jz done
jmp copyarray
My question is whether this is the correct method of copying the array and how can I display the contents of the array after?
Thank you!

The loop looks ok, but clunky. If your program is crashing, use a debugger. See the x86 for links and a quick intro to gdb for asm.
I think you're getting argv[1] loaded correctly. (Note that this is the first command-line arg, though. argv[0] is the command name.) https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames says ebp+12 is the usual spot for the 2nd arg to a 32bit functions that bother to set up stack frames.
Michael Petch commented on Simon's deleted answer that the asm_io library has print_int, print_string, print_char, and print_nl routines, among a few others. So presumably you a pointer to your buffer to one of those functions and call it a day. Or you could call sys_write(2) directly with an int 0x80 instruction, since you don't need to do any string formatting and you already have the length.
Instead of incrementing separately for two arrays, you could use the same index for both, with an indexed addressing mode for the load.
;; main (int argc ([esp+4]), char **argv ([esp+8]))
... some code you didn't show that I guess pushes some more stuff on the stack
mov eax, dword [ebp+12] ; load argv
;; eax + 4 is &argv[1], the address of the 1st cmdline arg (not counting the command name)
mov esi, dword [eax + 4] ; esi holds 2nd arg, which is pointer to string
xor ecx, ecx
copystring:
mov al, [esi + ecx]
mov [X + ecx], al
inc ecx
test al, al
jnz copystring
I changed the comments to say "cmdline arg", to distinguish between those and "function arguments".
When it doesn't cost any extra instructions, use esi for source pointers, edi for dest pointers, for readability.
Check the ABI for which registers you can use without saving/restoring (eax, ecx, and edx at least. That might be all for 32bit x86.). Other registers have to be saved/restored if you want to use them. At least, if you're making functions that follow the usual ABI. In asm you can do what you like, as long as you don't tell a C compiler to call non-standard functions.
Also note the improvement in the end of the loop. A single jnz to loop is more efficient than jz break / jmp.
This should run at one cycle per byte on Intel, because test/jnz macro-fuse into one uop. The load is one uop, and the store micro-fuses into one uop. inc is also one uop. Intel CPUs since Core2 are 4-wide: 4 uops issued per clock.
Your original loop runs at half that speed. Since it's 6 uops, it takes 2 clock cycles to issue an iteration.
Another hacky way to do this would be to get the offset between X and ebx into another register, so one of the effective addresses could use a one-register addressing mode, even if the dest wasn't a static array.
mov [X + ebx + ecx], al. (Where ecx = X - start_of_src_buf). But ideally you'd make the store the one that used a one-register addressing mode, unless the load was a memory operand to an ALU instruction that could micro-fuse it. Where the dest is a static buffer, this address-different hack isn't useful at all.
You can't use rep string instructions (like rep movsb) to implement strcpy for implicit-length strings (C null-terminated, rather than with a separately-stored length). Well you could, but only scanning the source twice: once for find the length, again to memcpy.
To go faster than one byte clock, you'd have to use vector instructions to test for the null byte at any of 16 positions in parallel. Google up an optimized strcpy implementation for example. Probably using pcmpeqb against a vector register of all-zeros.

Assembly (MASMx86 PC) duplicating array

I'm working on an encryption project that's got me stumped. I need to apply a series of encryption keys to a string array in order to determine which key was used to encrypt the message. I have the code set up to run a loop which applies all keys within the range, but now i need a way to 'refresh' the original string array. My solution was to copy the original array into a new un-populated array of the same size, for each iteration of the loop. I wrote the following procedure to accomplish this:
CopyBuffer PROC
; Copies the original buffer into a storage variable
; recieves: nothing
; returns: nothing
pushad
mov ecx,67 ; loop counter
mov esi,0
L3:
mov dl,buffer[esi]
mov bufferCopy[esi],dl ; Store byte in array copy
inc esi ; point to the next byte
loop L3
popad
ret
CopyArray ENDP
The arrays referred to in the above code were declared as follows:
buffer BYTE 0e9h,0c8h,0d0h,087h,0ceh,0d4h,087h,0d3h,0cfh,0c2h,087h,0d3h,0ceh,0cah,0c2h,087h
BYTE 0c1h,0c8h,0d5h,087h,0c6h,0cbh,0cbh,087h,0c0h,0c8h,0c8h,0c3h,087h,0cah,0c2h,0c9h
BYTE 087h,0d3h,0c8h,087h,0c4h,0c8h,0cah,0c2h,087h,0d3h,0c8h,087h,0d3h,0cfh,0c2h,087h
BYTE 0c6h,0ceh,0c3h,087h,0c8h,0c1h,087h,0d3h,0cfh,0c2h,0ceh,0d5h,087h,0d7h,0c6h,0d5h
BYTE 0d3h,0deh
bufferCopy db 67 dup(0)
My code successfully populates the duplicate array. However the elements of the copy are different from the corresponding elements of the original array.
I'd really appreciate the wisdom of a more advanced assembly programmer on this one! I'm fairly new to the language, and still a bit fuzzy on syntax.

The code seems to be correct (regardless of its untidy style). The only observation is that the provided sample array has only 66 elements, not 67, so the first byte will be copied twice.
The problem is somewhere else. Try to run the program in the debugger and check the array immediately after the return of the procedure in order to proof its correctness.

Well, it seems
You are not using the loop counter that you have created and your loop is just running forever which in turn overwrites your copyBuffer array.
Try the lines
dec ecx
jnz L3
instedof loop L3 that should help.

Ok, first of all thanks to everyone for your suggestions! I've solved the problem (though probably not in the simplest way possible). First of all, i changed the declaration of bufferCopy from
bufferCopy db 67 dup(0)
to
bufferCopy BYTE SIZEOF buffer DUP(0),0
I think converting to a null terminated string solved some of my problems. Beyond that, i also ammended my CopyBuffer PROC to run as follows:
pushad
mov edx,OFFSET buffer
Call StrLength
mov bufSize,eax
mov ecx,bufSize
mov esi,0
L2:
mov al,buffer[esi]
mov bufferCopy[esi],al
inc esi
loop L2
popad
ret
The call to StrLength was probably a redundancy, considering i knew the length of the string to begin with. As a disclaimer, i'd like to add that i'm not exceptionally familiar with assembly yet. This solution (while functional) is by no means optimal. Furthermore, my explanations are cursory at best.

assembly - accessing array elements

I have an array that I am attempting to print.
I would like to print it out so I can see if it is correct.
It is currently printing the number 1 and stopping. Or, if I mess with the ECX differently it prints out a bunch of zeros and crashes.
Here is my program.
.data
array DWORD 10 DUP(5, 7, 6, 1, 4, 3, 9, 2, 10, 8)
my_size dd 10
sorted DWORD 0
first DWORD 0
second DWORD 0
.code
start:
main proc
cls
mov EBX, offset[array]
mov ECX, [my_size]
dec ECX
sub ESI, ESI
sub EDI, EDI
; print
mov EBX, offset aa
sub ECX, ECX
;mov ECX, my_size
mov ECX, 10
my_loop:
mov EAX, [EBX]
inc EBX
dec ECX
cmp ECX, 0
jle exit_loop
mov first, EAX
print chr$("printing array elements: ")
print str$(first)
loop my_loop
exit_loop:
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start

I hate to say it, but you're not "ready" to write a bubble sort. Either it's a completely insane homework assignment, or you haven't followed along with the class so far (possibly both).
Very first thing, I don't think you've defined your array correctly. As I read your code, you've got 100 dwords there - 10 copies of the 10 numbers you specify. You shouldn't need "DUP" in there.
I would print the unsorted array first, just to make sure you've got that part right. You appear to be using a couple of macros, there - they sure as heck aren't instructions. Just from the names, I would guess(!) that "print_chr$" prints a single character and "print_str$" prints a string (although you seem to be printing your string and the number 1). If you've got a "print_int$" in your macro set, I would guess(!) that's what you want. Since I'm not familiar with your macros, I could be wrong.
Although you've defined the array as "dword", you only compare a single byte in your sort routine. This probably works for the small numbers you're using, but it isn't really right.
The usual way to do a bubble sort is to set a "flag" (register or variable - this may be what "sorted" is for) to zero at the beginning of each run through the array, and set it to 1 every time you do a swap. When you've done a pass through your array and the flag is still zero - you haven't done a swap - then, and only then, your array is sorted. If you print the array after each pass, you'll see why it's called a "bubble" sort - the smallest/largest number "bubbles up" to its final position.
Your code to walk through a dword array (esi * 4) looks about right (outside of only comparing a byte), but your print routine only increments ebx by one each time through the loop. Either "add ebx, 4" or use "ebx * 4" (not both) to print dwords. Or perhaps your array is only supposed to be bytes?
Seriously, I'd start with something simpler - just print the array - and work up to adding the sort routine after you've got that working.
Hope it helps.
Best,
Frank

I see you've simplified the code. Good idea! I'm still not familiar with the macro you're using, "print_str$". That doesn't "look" to me like it prints a number. Have you got a "print_int$" or similar? If you can get it to print just the first number "5", that would be a good start.
Now... working through your loop, you just "inc ebx". That won't get you the next dword, it'll get you bytes 2, 3, and 4 from the first dword and the first byte from the second dword. Since you used "* 4" in the (removed) sort code, you probably want "[ebx * 4]" here. Either that, or add 4 to ebx each time through the loop. One or the other (but not both) should step through an array of dwords.
I suspect that the first step would be to select the "right" macro to print a number. It'll probably get easier from there(?). Courage! :)
Best,
Frank

Accessing array members in assembler

I'm using C and ASM{} in one of our classes in order to complete an assembler project where we have to encrypt an input string, transmit and decrypt it.
The key is loaded into an empty C char array (20 chars long) and is then used later on with the XOR statement to encrypt.
The code to load the address of the array is:
lea esi, key
which puts the address of 'key' into esi. The address here is the same as the address of the key array when we examine the register in debug mode, so we know that works.
mov edx, [esi]
we thought this would move the value of esi's first index into edx, as we use "mov [esi], eax" to put the value of the a register into the esi array. The reverse seemed logical.
However, the value which edx actually gets is "875575655" (last run) or similar. Far too large for a char value.
I get the feeling we may be accessing the array wrong,
Any advice would be appreciated.
Note: Usually to increase the array index we're simply using inc esi and then read like we did above, this is how we were planning on reading from the array as well.

With mov edx, [esi] you read DWORD (because edx size is double word). Use mov dl, [esi] to read byte.

875575655 in hexa is 0x34303967, that would be string 'g904'.
To explain: while logically it's a character string, you can load more characters at once. So, instead of making 20 byte loads and xors, you can make 5 such operations on DWORD.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight