(The problem is based on assembly language ARM.)
I'm dealing with a problem that asks me to reverse a given array.
Just like this:
Given array: 1, 2, 3, 4, 5
Reversed array: 5, 4, 3, 2, 1
And the limitation of this problem is that I'm only supposed to use registers r0-r3.
I have a basic algorithm, but I'm really confused when I'm trying to implement the idea.
My algorithm:
Loop:
1. get value from head pointer, ptr++
2. get value from tail pointer, ptr--
3. swap them
4. check if head pointer and tail pointer cross,
if so, exit loop and return.
if not, go back to loop.
But I just don't know how to use only 4 registers to solve this problem..
Below would be all I have currently.
.text
.global reverse
reverse:
# See if head and tail ptr cross
# If so, end loop (b end)
head:
# use r2 to represent head value
ldr r2,[r0] # r2 <-*data get the first value
tail:
# mov r1,r1 # size
sub r1,r1,#1 # size-1
lsl r1,r1,#2 # (size-1)*4
add r0,r0,r1 # &data[size-1] need to ldr r1,[r0] to get value
ldr r1,[r0] # get value for r1 (from tail)
swap:
# swap values
mov r3, r1 #store value to r3
str r2, [r0]
# head ptr ++
# tail ptr --
# back to reverse
end:
# loop ends
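For comparison, the algorithm described above can be sketched in C. This is only an illustration (the function name reverse_in_place is hypothetical, not part of the assignment), but it shows that just four values are live at any time: a head pointer, a tail pointer, and the two loaded elements. That is the same budget as r0-r3.

```c
#include <stddef.h>

/* Reverse `size` ints in place with two pointers that move toward each other.
   Hypothetical helper for illustration; the register notes are one possible
   mapping, not the only one. */
void reverse_in_place(int *data, size_t size)
{
    if (size < 2)
        return;                      /* nothing to swap */

    int *head = data;                /* could live in r0 */
    int *tail = data + size - 1;     /* could live in r1: &data[size-1] */

    while (head < tail) {            /* stop once the pointers meet or cross */
        int a = *head;               /* could live in r2 */
        int b = *tail;               /* could live in r3 */
        *head++ = b;                 /* store, then head ptr++ */
        *tail-- = a;                 /* store, then tail ptr-- */
    }
}
```

Note that the tail address is computed once, before the loop, instead of being recomputed from size on every pass.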
Crude and inefficient example
.data
Array: .word 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,32,64,128,256,512
.equ len.Array,.-Array
.text
.global main
main:
nop
sub sp, sp, #len.Array // save space on stack
ldr r1,=Array // Array
mov r2, #len.Array // length of array
mov r3, #0 // zero init counter Array
1:
ldr r0, [r1,r3] // load word size element position x from Array
push {r0} // push element value into stack
add r3, r3, #4 // inc Array counter by 4 since word size is 4 bytes
cmp r3, r2 //
blt 1b
// pop values off the stack - LIFO results in reversal
mov r3, #0 // zero init counter Array
2:
pop {r0} // pop element value from stack - LIFO
str r0, [r1,r3]
add r3, r3, #4 // inc Array counter by 4 since word size is 4 bytes
cmp r3, r2
blt 2b
add sp, sp, #len.Array // restore stack pointer
GDB output:
(gdb) x/21d $r1
0x1102d: 1 2 3 4
0x1103d: 5 6 7 8
0x1104d: 9 10 11 12
0x1105d: 13 14 15 16
0x1106d: 32 64 128 256
0x1107d: 512
(gdb) x/21d $r1
0x1102d: 512 256 128 64
0x1103d: 32 16 15 14
0x1104d: 13 12 11 10
0x1105d: 9 8 7 6
0x1106d: 5 4 3 2
0x1107d: 1
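The same push-everything-then-pop idea, sketched in C with an explicit array standing in for the hardware stack. The name reverse_via_stack and the capacity MAX_ELEMS are assumptions made for this illustration.

```c
#include <stddef.h>

#define MAX_ELEMS 64   /* hypothetical capacity of the scratch stack */

/* Reverse `n` ints by pushing them all, then popping them back in LIFO order.
   Returns 0 on success, -1 if the array does not fit in the scratch stack. */
int reverse_via_stack(int *data, size_t n)
{
    int stack[MAX_ELEMS];
    size_t top = 0;

    if (n > MAX_ELEMS)
        return -1;

    for (size_t i = 0; i < n; i++)
        stack[top++] = data[i];      /* push element i */

    for (size_t i = 0; i < n; i++)
        data[i] = stack[--top];      /* pop: last pushed comes out first */

    return 0;
}
```

This needs extra memory proportional to the array length, which is why it is less efficient than swapping in place.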
.equ READERROR, 0 #Used to check for scanf read error.
.global main # Have to use main because of C library uses.
main:
prompt:
# Ask the user to enter a number.
ldr r0, =strInputPrompt # Put the address of my string into the first parameter
bl printf # Call the C printf to display input prompt.
ldr r0, =numInputPattern # Setup to read in one number.
ldr r1, =intInput # load r1 with the address of where the
# input value will be stored.
bl scanf # scan the keyboard.
cmp r0, #READERROR # Check for a read error.
beq readerror # If there was a read error go handle it.
ldr r1, =intInput # Have to reload r1 because it gets wiped out.
ldr r1, [r1] # Read the contents of intInput and store in r1 so that
# it can be printed
ldr r0, =strOutputNum
bl printf
ldr r0, =strOutputEven
loop:
cmp r1, #101
beq end
cmp r1, #0
bne odd
cmp r1, #1
bne even
odd:
add r1, r1, LSL #1 /* r1 ← r1 + (r1 << 1) */
bl printf
even:
mov r1, r1, ASR #1 /* r1 ← (r1 >> 1) */
b end_loop
end_loop:
add r2, r2, #1 /* r2 ← r2 + 1 */
b loop /* branch to loop */
# Print the input out as a number.
# r1 contains the value input to keyboard.
b myexit # leave the code.
readerror:
# Got a read error from the scanf routine. Clear out the input buffer then
# branch back for the user to enter a value.
# Since an invalid entry was made we now have to clear out the input buffer by
# reading with this format %[^\n] which will read the buffer until the user
# presses the CR.
ldr r0, =strInputPattern
ldr r1, =strInputError # Put address into r1 for read.
bl scanf # scan the keyboard.
# Not going to do anything with the input. This just cleans up the input buffer.
# The input buffer should now be clear so get another input.
b prompt
myexit:
# End of my code. Force the exit and return control to OS
mov r7, #0x01 # SVC call to exit
svc 0 # Make the system call.
.data
# Declare the strings and data needed
.balign 4
strInputPrompt: .asciz "Input a number between 1 and 100: \n"
.balign 4
strOutputNum: .asciz "You entered: %d \n"
.balign 4
strOutputEven: .asciz "The even numbers from 1 to %d are: \n"
# Format pattern for scanf call.
.balign 4
numInputPattern: .asciz "%d" # integer format for read.
.balign 4
strInputPattern: .asciz "%[^\n]" # Used to clear the input buffer for invalid input.
.balign 4
strInputError: .skip 100*4 # Used to clear the input buffer for invalid input.
.balign 4
intInput: .word 0 # Location used to store the user input.
.global printf
.global scanf
cannot get even and odd functions to work
The first problem is here:
cmp r1, #0
bne odd
cmp r1, #1
bne even
This actually compares the entire number to 0 or 1, when you only want the rightmost bit. You'll need to do something like this. I'll only list the even case; I think you can figure out the odd one. I'm a bit rusty at ARM assembly and haven't tested this, but something like it should work:
tst r1,#1 # sets the flags as if you did "AND r1,r1,#1" but doesn't change r1
bne odd # now you will jump to label "odd" if r1 is odd, or keep
# going if it's even.
# if it isn't odd, it must be even, so your even code goes here!
push {r1-r2} # PUSH is the unified-syntax mnemonic; in ARM mode it is
# assembled as the 32-bit instruction "STMFD sp!,{r1-r2}"
# this saves r1 and r2 on the stack, we get them back with POP.
# C functions assume the stack is aligned to 8 bytes so we have to push
# an even number of registers even though we only needed to push R1.
ldr r0,=strOutputEven # r1 still contains intInput
bl printf # prints "The even numbers from 1 to %d are:"
pop {r1-r2} # unified syntax for "LDMFD sp!,{r1-r2}"
# now we'll do the loop:
ldr r0,=numInputPattern
mov r1,#0 # we don't need the input anymore
mov r2,#0 # the sum goes here
loop_even:
add r1,r1,#2
cmp r1,#101
bhs exitloop # if R1 is greater than or equal to 101, exit.
push {r0-r3} # printf is allowed to alter r0-r3, so save them around the call.
bl printf
pop {r0-r3}
add r2,r2,r1 # add R1 to R2 and store the result in R2.
b loop_even
exitloop:
mov r1,r2 # put the sum into r1 so we can print it
bl printf # print the sum
b myexit # we're done!
odd:
I would recommend reading up on the PUSH and POP instructions and how they work, it's a much more reliable way of temporarily preserving registers than the method you used with ldr r1,[r1]. Don't feel ashamed if you need to print out a reference list of the ARM instructions - I use one all the time.
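As a cross-check on the logic (not a translation of the exact assembly), here is a small C sketch of the two ideas above: testing only bit 0 to decide parity, and summing the even numbers up to a limit. The function names are made up for this example.

```c
/* Nonzero when n is odd. Only bit 0 is examined, like `tst r1, #1`. */
int is_odd(unsigned n)
{
    return (n & 1u) != 0;
}

/* Sum of the even numbers from 2 up to and including `limit`. */
unsigned sum_even_upto(unsigned limit)
{
    unsigned sum = 0;
    for (unsigned i = 2; i <= limit; i += 2)
        sum += i;                    /* same role as "add r2,r2,r1" */
    return sum;
}
```

Comparing the whole value against 0 and then 1, as the original code did, would only classify the inputs 0 and 1 correctly; masking bit 0 works for every input.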
.global main
.type main, %function
# r4 = argc
# r5 = current offset
# r6 = end offset
# r7 = array argv
main:
mov r4,r0
ldr r0,=message
mov r5,#0 // initialize offset
mov r6,#4
mul r6,r4,r6 // calculate end offset
mov r7,r1 // put array argv in r7
loop:
ldr r1,[r7,r5] // load the argv element with offset r5
push {ip,lr} // save lr
bl printf
pop {ip,lr}
add r5,r5,#4 // go to next word
cmp r5,r6
bne loop // if I haven't reached the end offset, it does another cycle
end:
mov r0,#0 // clear exit code
bx lr // returns
message:
.asciz "%s\n"
Output:
$ ./a.out a
./a.out
Segmentation fault
but:
$ ./a.out
./a.out
So apparently the problem comes when I try to access the next element of the array. I really don't know why. I'm new to ARM and assembly.
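A likely cause, assuming the standard ARM calling convention in which r0-r3 are not preserved across a call: the format address is loaded into r0 once, before the loop, but printf returns its result in r0. The first call works, and the second call then receives whatever printf returned as its format pointer. In C the format is passed afresh on every call, as in this sketch (print_args is a hypothetical name); the assembly equivalent would be to reload r0 inside the loop.

```c
#include <stdio.h>

/* Print each argument on its own line and return how many were printed. */
int print_args(int argc, char **argv)
{
    int printed = 0;
    for (int i = 0; i < argc; i++) {
        /* The format string is supplied again on every iteration. */
        printf("%s\n", argv[i]);
        printed++;
    }
    return printed;
}
```

With a single argument the loop body runs only once, which would explain why the program appears to work when run without extra arguments.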
I'm trying to translate a simple insertion sort algorithm to assembly, but something about this particular configuration is causing the program to get an invalid pointer error.
Here's the C version that I'm using:
int n, array[100], c, d, t;
for (c = 1; c < n - 1; c++) {
d = c;
while (d > 0 && array[d] < array[d - 1]) {
t = array[d];
array[d] = array[d - 1];
array[d - 1] = t;
d--;
}
}
This is a C struct that is being used:
typedef struct {
int *list;
int size;
int maxSize;
} list;
Here is my assembly file:
.syntax unified
.text
.align 8
.global insert_ARM
.func insert_ARM, insert_ARM
.type insert_ARM, %function
insert_ARM:
push {r4-r11, ip, lr}
# setup
ldr r4, [r0, #4]
sub r4, r4, 1 # r4 = n-1
mov r5, #1 # c=1
mov r6, #16 # d=0, which starts at #16
mov r7, #0 # t=0
for:
# d = c ; needs these lines to do the assembly equivalent, which is * 4.
mov r6, r5 # d = c
LSL r6, #2 # uses logical shift left: multiplies r6 by 4 to get the correct index
add r6, r6, 16 # add 16 because that's where the array starts
while:
# condition 1: d > 0
cmp r6, #0 # if d <= 0, get out of there
ble forLoopStatements
# condition 2: array[d] < array[d-1]
# first, I need to define array[d] and array[d-1]
# r8 = array[d] and r9 = array[d-1]
sub r10, r6, #4 # r10 = d-1
ldr r9, [r0, r10] # r9 = array[d-1]
ldr r8, [r0, r6] # r8 = array[d]
cmp r9, r8 # comparing array[d-1] with array[d]
bge forLoopStatements # if array[d] >= array[d-1], get out of there
# while effects
# note that r8 should still be array[d] here.
str r9, [r0, r6] # array[d] = array[d-1]
str r8, [r0, r10] # array[d-1] = t # BUG HERE.
sub r6, r6, #4 # d--; // does -4 for ARM
bal while # repeat loop
forLoopStatements:
# (c<n-1; c++)
add r5, r5, #1 # c++
cmp r5, r4 # compares c with n-1
blt for # if c < n-1, loop again
end:
mov r0, r10
pop {r4-r11, ip, lr}
BX lr
.endfunc
.end
It seems to be
str r8, [r0, r10] # array[d-1] = t
that causes a trip at some point.
Edit: I found out that r8's numbers during this instruction are somehow incorrect, since immediately using something like
mov r8, #4
before the store prevents the error (but of course makes the results incorrect).
Upon examining the contents of r0, it happens that the update is going off range because other members of the struct are being modified in the process. Array index 0 is at +16.
You found the problem in the translation to assembly. Note, however, the following problems:
The outer loop should run all the way to c < n instead of c < n - 1. As coded, the last element of the array is never moved.
It would be more readable to use two nested for loops:
int n, array[100], c, d, t;
for (c = 1; c < n; c++) {
for (d = c; d > 0 && array[d] < array[d - 1]; d--) {
t = array[d];
array[d] = array[d - 1];
array[d - 1] = t;
}
}
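Wrapped in a function so it can be compiled and tested on its own (the name insertion_sort and the size_t index type are choices made for this sketch, not part of the original code):

```c
#include <stddef.h>

/* Insertion sort, ascending. The outer loop runs while c < n so that the
   last element is also moved into place. */
void insertion_sort(int *array, size_t n)
{
    for (size_t c = 1; c < n; c++) {
        for (size_t d = c; d > 0 && array[d] < array[d - 1]; d--) {
            int t = array[d];
            array[d] = array[d - 1];
            array[d - 1] = t;
        }
    }
}
```

Compiling a reference version like this and comparing its results against the assembly routine is a quick way to check the translation.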
Everyone has a different approach to writing code. Mine is different from yours, but I would like to share my ideas. I would start as simple as possible to get something working and build from there. Here is sample code for a for loop.
/* forloop.s */
/* int n, array[100], c, d, t;
for (c=1; c<n-1; c++)
address of array = r0 = .word ( Raspbian Jessie = 32 bits )
n = r4 = array size
c = r5 = 1word = 4memory_bytes = index into array
d = r6 = c = address in array
array[d] = r10 = data
*/
.data
.balign 4
array:
.word 6, 3, 7, 8, 5, 2, 1, 9, 4
size:
.word (size - array)
.text
.global main
main:
push {r4-r12, lr} # save registers for OS
ldr r0, =array # load address of array in r0
ldr r4, =size # load address of size in r4
ldr r4, [r4] # load size in r4
sub r4, #4 # subtract 1 word from r4 (n=n-1)
mov r5, #4 # move 4 in r5 (c=1word=4memory_bytes)
for: # (c=1; c<n-1; c++)
add r6, r0, r5 # d (r6) = array address (r0) + (c=4)
# while: # while loop would go here
ldr r10, [r6], #-4 # r10 = array[d], d=d-4
ldr r11, [r6] # r11 = array[d-1]
#... # while code
cmp r0, r6 # is d > 0 ...
#... #continue while loop code
# back to forloop code
cmp r5, r4 # compare (subtract) r5 (c) from r4 (n)
add r5, #4 # add 1 word to r5 (c++)
blt for # end of for loop (c<n-1)
end:
mov r0, #0 # set exit code
pop {r4-r12, lr} # restore environment for return to OS
bx lr # return to OS
Assemble and link the code, then run it and check the exit status.
as -o forloop.o forloop.s
gcc -o forloop forloop.o
./forloop; echo $?
It works for me on the Raspberry Pi. I don't know much about gdb, but this may help as suggested by Jester. (See middle section "Commands" at http://cs107e.github.io/guides/gdb/ for more information.)
pi#RPi0:~/pgm/Asm $ gdb -tui forloop # Text User Interface
---Type <return> to continue, or q <return> to quit--- [Enter]
(gdb) layout asm
(gdb) start # start is required
(gdb) layout reg
(gdb) Ctl-x o # Selects registers as Up & Down arrow to see all
(gdb) si # single step
(gdb) [Enter] # repeat single step
(gdb) run # run program to end
(gdb) q # quit gdb
Press the down arrow to scroll to the cpsr register. The leftmost hex digit holds the flags: 8=Negative, 4=Zero, 2=Carry, 1=oVerflow. Combinations add up, for example 6=Zero and Carry.
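If reading the hex digit by eye gets tedious, the four condition flags can be pulled out of a cpsr value with shifts and masks. A minimal sketch in C; the struct and function names are invented for this example, and the bit positions (N=31, Z=30, C=29, V=28) are the architectural ones.

```c
#include <stdint.h>

struct nzcv {
    int n;   /* Negative */
    int z;   /* Zero */
    int c;   /* Carry */
    int v;   /* oVerflow */
};

/* Extract the condition flags from the top four bits of a cpsr value. */
struct nzcv decode_flags(uint32_t cpsr)
{
    struct nzcv f;
    f.n = (int)((cpsr >> 31) & 1u);
    f.z = (int)((cpsr >> 30) & 1u);
    f.c = (int)((cpsr >> 29) & 1u);
    f.v = (int)((cpsr >> 28) & 1u);
    return f;
}
```

For example, a cpsr whose leftmost digit is 6 decodes to Zero and Carry set, Negative and oVerflow clear.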
Another approach to debugging an assembly program on ARM is to call the C library printf function. Here is myprint.s.
/* myprint.s */
.data
.balign 4
format:
.asciz " %2d %2d %2d %2d %2d %2d %2d %2d %2d\n"
.balign 4
array:
.word 6, 3, 7, 8, 5, 2, 1, 9, 4
size:
.word (size - array)
.text
.global main
print: # --- a printf function to print the value in the array ---
push {r0-r12, lr} # save registers for OS
mrs r10, cpsr # save flag settings
ldr r11, =array # To print the array[0-8], the array
ldm r11, {r1-r9} # address is loaded in r11 and stored
push {r4-r10} # in reg r1-r9, printf gets args# from
ldr r0, =format # format, 3 print from r1-r3, rest from
bl printf # stack.
pop {r4-r10} # adjust stack, restore r10 (flags)
msr cpsr_f, r10 # restore saved flags
pop {r0-r12, pc} # restore reg and return
main:
push {r4-r12, lr} # save registers for OS
bl print # --- can be placed anywhere in code ---
ldr r0, =array # load address of array in r0
ldr r4, =size # load address of size in r4
ldr r4, [r4] # load size in r4
sub r4, #4 # subtract 1 word from r4 (n=n-1)
mov r5, #4 # move 4 in r5 (c=1word=4memory_bytes)
for: # (c=1; c<n-1; c++)
add r6, r0, r5 # d=r6 = array address (r0) + (c=4)
while: # while loop would go here
ldr r10, [r6], #-4 # r10 = array[d], d=d-4
ldr r11, [r6] # r11 = array[d-1]
cmp r10, r11 # is array[d] < array[d-1]
bge forloop_code # if not, continue forloop code
mov r7, r11 # move array[d-1] into t (r7)
str r10, [r6], #4 # store array[d] into array[d-1], (d-1)+4=d
str r7, [r6], #-4 # store t-array[d-1] into array[d], d-4=(d-1)
cmp r6, r0 # is d>0 (addr(array[d-1]) > addr(array[0]))?
bgt while # yes, check if array[d-1] < array[d-2]
forloop_code: # back to forloop code
bl print # --- can be placed anywhere in code ---
cmp r5, r4 # compare (subtract) r5 (c) from r4 (n)
add r5, #4 # add 1 word to r5 (c++)
blt for # end of for loop (c<n-1)
end:
pop {r4-r12, lr} # restore registers for OS
mov r0, #0 # set exit code
bx lr # return to OS
as -o myprint.o myprint.s
gcc -o myprint myprint.o
./myprint; echo $?
6 3 7 8 5 2 1 9 4
3 6 7 8 5 2 1 9 4
3 6 7 8 5 2 1 9 4
3 6 7 8 5 2 1 9 4
3 5 6 7 8 2 1 9 4
2 3 5 6 7 8 1 9 4
1 2 3 5 6 7 8 9 4
1 2 3 5 6 7 8 9 4
1 2 3 4 5 6 7 8 9
0
Another thought would be to compile your C code to assembly and use gdb to see what the C code looks like in assembly. This was an interesting project; I did not know about insertion sort.
I figured it out. Aside from cleaning up my code, I just needed to translate
while ( d > 0 )
as
cmp r6, #16 # if d <= 0, get out of there
ble forLoopStatements
instead of
cmp r6, #0 # if d <= 0, get out of there
ble forLoopStatements
to keep the minimum index at 0.
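The underlying point is that r6 holds a byte offset from the struct base, not the C index d. A small C sketch of the mapping, assuming (as stated in the question) that element 0 sits at offset +16 and that each element is 4 bytes; the names are hypothetical.

```c
#include <stddef.h>

#define ARRAY_BASE 16u   /* assumption from the question: array[0] is at +16 */
#define WORD_SIZE   4u   /* assumption: 4-byte ints */

/* Byte offset of array[d] from the struct base (the value kept in r6). */
size_t index_to_offset(size_t d)
{
    return ARRAY_BASE + d * WORD_SIZE;
}

/* The C condition `d > 0`, expressed on the byte offset instead of the index. */
int offset_is_past_first(size_t offset)
{
    return offset > ARRAY_BASE;
}
```

Since d == 0 corresponds to offset 16, comparing the offset against 0 lets the loop run past the start of the array and into the other struct members.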
.global reverse
.data
start: .word 1
end: .word 1
loopcount: .word 0
reverse:
ldr r3, =end
str r1, [r3]
next:
ldr r3,=end
ldr r2,=start
ldr r3, [r3]
ldr r2, [r2]
cmp r2,r3
bgt done
cmp r2,r3
beq done
sub r3,r2
mov r2,#0
mov r1,#0
loop:
cmp r2,r3
beq next2
add r2, r2, #1
add r1, r1, #4
b loop
next2:
ldr r3, [r0]
add r0, r0, r1
ldr r2, [r0]
str r3, [r0]
sub r0, r0, r1
str r2, [r0]
add r0, r0, #4
ldr r3,=end
ldr r1, [r3]
sub r1, r1, #1
str r1, [r3]
ldr r3,=start
ldr r1, [r3]
add r1, r1, #1
str r1, [r3]
b next
done:
bx lr
I am trying to reverse an array, and this is my reverse function in assembly. The function is reverse(int *data, size), where data is an array and size is its size. It works for any array of size 5, but for longer arrays, for example of size 10, it ignores the last 2 elements and acts as if the array were of size 8, swapping everything as if the last 2 numbers in the array didn't exist. It returns:
Array:
1 2 3 4 5 6 7 8 9 10
My Return:
8 7 6 5 4 3 2 1 9 10
I can't seem to find out what my issue is.
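One explanation that reproduces the reported output exactly, under the assumption (not stated in the question) that reverse had already been called once on a 5-element array: start lives in .data, is set to 1 only when the program is loaded, and is never reset at the top of reverse. The C model below mirrors the assembly's control flow with static variables standing in for the .data words; the names are made up for this sketch.

```c
/* Static counters model the .data words `start` and `end` in the assembly. */
static int start_word = 1;
static int end_word = 1;

/* Model of the posted routine: only `end` is written on entry. */
void reverse_model(int *data, int size)
{
    end_word = size;                      /* str r1, [r3] */
    while (start_word < end_word) {       /* bgt done / beq done */
        int dist = end_word - start_word; /* the counting loop computes dist*4 */
        int tmp = data[0];
        data[0] = data[dist];             /* swap data[0] with data[dist] */
        data[dist] = tmp;
        data++;                           /* add r0, r0, #4 */
        end_word--;
        start_word++;                     /* persists into the next call */
    }
}
```

After a first call with size 5, start is left at 3. A second call with size 10 then begins with a distance of 7 instead of 9, so only the first eight elements are swapped and 9 and 10 stay in place. Resetting start to 1 on entry, or keeping both counters in registers, would remove the dependence on earlier calls.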
Below is a crude and likely inefficient example of a reverse copy of one array to another; which may not be necessarily what you want but may foster some ideas.
Hardware : Marvell Armada 370/XP
model name : ARMv7 Processor rev 2 (v7l)
.bss
rArray: .zero
.data
Array: .word 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,32, 64, 128, 256, 512
.equ len.Array,.-Array
.text
.global main
main:
ldr r1,=Array // Array
mov r2, #len.Array // length of array
mov r3, #0 // zero init counter Array
ldr r4,=rArray // rArray
sub r5,r2, #4 // rArray counter - 1 element
1:
ldr r10, [r1,r3] // load word size element position x from Array
str r10, [r4,r5] // store word size element position x from Array into word size position y in rArray
add r3, r3, #4 // inc Array counter by 4 since word size is 4 bytes
subs r5, r5, #4 // decrement rArray counter by 4 & get status (s)
bpl 1b // branch back to loop if positive or zero; i.e., N condition flag is clear
Results using GDB:
(gdb) x/21d $r1
0x1102d: 1 2 3 4
0x1103d: 5 6 7 8
0x1104d: 9 10 11 12
0x1105d: 13 14 15 16
0x1106d: 32 64 128 256
0x1107d: 512
(gdb) x/21d $r4
0x11082: 512 256 128 64
0x11092: 32 16 15 14
0x110a2: 13 12 11 10
0x110b2: 9 8 7 6
0x110c2: 5 4 3 2
0x110d2: 1
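The same reverse copy, sketched in C for comparison. One index counts up through the source while the mirrored position in the destination counts down; reverse_copy is a hypothetical name.

```c
#include <stddef.h>

/* Copy `n` ints from src into dst in reverse order. The buffers must not overlap. */
void reverse_copy(const int *src, int *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[n - 1 - i] = src[i];     /* src counts up, dst position counts down */
}
```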
I'm new to assembly programming and I'm programming for ARM.
I'm making a program with two subroutines: one that appends a byte to a byte vector in memory, and one that prints this vector. The first address of the vector contains the number of elements that follow, up to 255. As I debug it with GDB, I can see that the "appendbyte" subroutine works fine. But when it comes to the "printvector" one, there are some problems. First, the element loaded in register r1 is wrong (it loads 0, when it should be 7). Then, when I read the register values with GDB after I use the "printf" function, a lot of registers have other values that weren't supposed to change, since I didn't modify them; I just used "printf". Why is "printf" modifying the values?
I was thinking it might be something about alignment. I'm not sure if I'm using the directive correctly.
Here is the full code:
.text
.global main
.equ num, 255 # Max number of elements
main:
push {lr}
mov r8, #7
bl appendbyte
mov r8, #5
bl appendbyte
mov r8, #8
bl appendbyte
bl printvector
pop {pc}
.text
.align
printvector:
push {lr}
ldr r3, =vet # stores the address of the start of the vector in r3
ldr r2, [r3], #1 # stores the number of elements in r2
.align
loop:
cmp r2, #0 #if there isn't elements to print
beq endprint #quit subroutine
ldr r0, =node #r0 receives the print format
ldr r1, [r3], #1 #stores in r1 the value of the element pointed by r3. Increments r3 after that.
sub r2, r2, #1 #decrements r2 (number of elements left to print)
bl printf #call printf
b loop #continue on the loop
.align
endprint:
pop {pc}
.align
appendbyte:
push {lr}
ldr r0, =vet #stores in r0 the beginning address of the vector
ldr r1, [r0], #1 #stores in r1 the number of elements and makes r0 point to the next address
add r3, r0, r1 #stores in r3 the address of the first available position
str r8, [r3] #put the value at the first available position
ldr r0, =vet #stores in r0 the beginning address of the vector
add r1, r1, #1 # increment the number of elements in the vector
str r1, [r0] # stores it in the vector
pop {pc}
.data # Read/write data follows
.align # Make sure data is aligned on 32-bit boundaries
vet: .byte 0
.skip num # Reserve num bytes
.align
node: .asciz "[%d]\n"
.end
The problems are in
ldr r1, [r3], #1
and
bl printf
I hope I was clear on the problem.
Thanks in advance!
The ARM ABI specifies that registers r0-r3 and r12 are to be considered volatile across function calls, meaning that the callee does not have to restore their values. LR also changes if you use bl, because LR will then contain the return address for the called function.
More information can be found in ARM's Information Center entry for the ABI or in the APCS (ARM Procedure Call Standard) document.
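Separately from the calling convention, the vector is made of bytes while ldr and str move 32-bit words. The word-sized store of the count in appendbyte also writes the three bytes that follow vet, which would explain reading 0 where 7 was expected. Below is a sketch in C of the byte-wide behaviour the two subroutines seem to be aiming for (ldrb and strb are the byte-sized ARM counterparts). The function names and the VEC_MAX constant are assumptions for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define VEC_MAX 255   /* vec[0] is the count, vec[1..count] are the elements */

/* Append one byte. Returns 0 on success, -1 if the vector is full. */
int append_byte(uint8_t *vec, uint8_t value)
{
    uint8_t count = vec[0];              /* byte load of the count */
    if (count == VEC_MAX)
        return -1;
    vec[1 + count] = value;              /* byte store into the first free slot */
    vec[0] = (uint8_t)(count + 1);       /* byte store of the new count */
    return 0;
}

/* Print every element in the same "[%d]" format as the question. */
void print_vector(const uint8_t *vec)
{
    for (unsigned i = 0; i < vec[0]; i++)
        printf("[%d]\n", vec[1 + i]);
}
```

In C the loop counter and pointer survive the printf call automatically. In assembly they would need to be kept in callee-saved registers (r4-r11) or saved on the stack around the call.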
printvector:
push {lr}
ldr r3, =vet # stores the address of the start of the vector in r3
ldr r2, [r3], #1 # stores the number of elements in r2
.align
loop:
cmp r2, #0 #if there isn't elements to print
beq endprint #quit subroutine
ldr r0, =node #r0 receives the print format
ldr r1, [r3], #1 #stores in r1 the value of the element pointed by r3. Increments r3 after that.
sub r2, r2, #1 #decrements r2 (number of elements left to print)
bl printf #call printf
b loop #continue on the loop
.align
endprint:
pop {pc}
That is definitely not how you use .align. The directive is there to align whatever follows on some boundary (specified in an optional argument; note that this is an assembler directive, not an instruction) by padding the binary with zeros or whatever the padding value is. So you don't want a .align in the code flow, between instructions. You have done that between the ldr r2 and the cmp r2 after loop. The .align after b loop is not harmful, as the branch is unconditional, but it is also not necessary: the assembler is generating an instruction stream there, so the bytes can't be unaligned. Where you would use .align is after some data declaration that precedes instructions:
.byte 1,2,3,4,5
.align
some_code_branch_dest:
In particular one where the assembler complains or the code crashes.
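To make the padding concrete, here is the arithmetic an alignment directive performs, sketched in C. The function name align_up is hypothetical, and the sketch assumes the alignment is a power of two.

```c
#include <stdint.h>

/* Round `addr` up to the next multiple of `alignment` (a power of two).
   The difference between the result and `addr` is the padding inserted. */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

Five bytes of data ending at offset 5 would be followed by three bytes of padding so that the next instruction starts at offset 8. An address that is already a multiple of 4 gets no padding, which is why aligning between two 4-byte ARM instructions has nothing to fix.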