Invalid pointer issue in insertion sort in ARM assembly

I'm trying to translate a simple insertion sort algorithm to assembly, but something about this particular configuration is causing the program to get an invalid pointer error.
Here's the C version that I'm using:
int n, array[100], c, d, t;
for (c = 1; c < n - 1; c++) {
d = c;
while (d > 0 && array[d] < array[d - 1]) {
t = array[d];
array[d] = array[d - 1];
array[d - 1] = t;
d--;
}
}
This is a C struct that is being used:
typedef struct {
int *list;
int size;
int maxSize;
} list;
Here is my assembly file:
.syntax unified
.text
.align 8
.global insert_ARM
.func insert_ARM, insert_ARM
.type insert_ARM, %function
insert_ARM:
push {r4-r11, ip, lr}
# setup
ldr r4, [r0, #4]
sub r4, r4, 1 # r4 = n-1
mov r5, #1 # c=1
mov r6, #16 # d=0, which starts at #16
mov r7, #0 # t=0
for:
# d = c ; needs these lines to do the assembly equivalent, which is * 4.
mov r6, r5 # d = c
LSL r6, #2 # uses logical shift left: multiplies r6 by 4 to get the correct index
add r6, r6, 16 # add 16 because that's where the array starts
while:
# condition 1: d > 0
cmp r6, #0 # if d <= 0, get out of there
ble forLoopStatements
# condition 2: array[d] < array[d-1]
# first, I need to define array[d] and array[d-1]
# r8 = array[d] and r9 = array[d-1]
sub r10, r6, #4 # r10 = d-1
ldr r9, [r0, r10] # r9 = array[d-1]
ldr r8, [r0, r6] # r8 = array[d]
cmp r9, r8 # comparing array[d-1] with array[d]
bge forLoopStatements # if array[d] >= array[d-1], get out of there
# while effects
# note that r8 should still be array[d] here.
str r9, [r0, r6] # array[d] = array[d-1]
str r8, [r0, r10] # array[d-1] = t # BUG HERE.
sub r6, r6, #4 # d--; // does -4 for ARM
bal while # repeat loop
forLoopStatements:
# (c<n-1; c++)
add r5, r5, #1 # c++
cmp r5, r4 # compares c with n-1
blt for # if c < n-1, loop again
end:
mov r0, r10
pop {r4-r11, ip, lr}
BX lr
.endfunc
.end
It seems to be
str r8, [r0, r10] # array[d-1] = t
that causes a trip at some point.
Edit: I found out that r8's numbers during this instruction are somehow incorrect, since immediately using something like
mov r8, #4
before the store prevents the error (but of course makes the results incorrect).
Upon examining the contents of r0, it happens that the update is going off range because other members of the struct are being modified in the process. Array index 0 is at +16.

You found the problem in the translation to assembly. Note however the following problems:
The outer loop should run all the way to c < n instead of c < n - 1. As coded, the last element of the array is never moved.
It would be more readable to use 2 nested for loops:
int n, array[100], c, d, t;
for (c = 1; c < n; c++) {
for (d = c; d > 0 && array[d] < array[d - 1]; d--) {
t = array[d];
array[d] = array[d - 1];
array[d - 1] = t;
}
}

Everyone has a different approach to writing code. Mine is different from yours, but I would like to share my ideas. I would start as simple as possible to get something working and build from there. Here is sample code for a for loop.
/* forloop.s */
/* int n, array[100], c, d, t;
for (c=1; c<n-1; c++)
address of array = r0 = .word ( Raspbian Jessie = 32 bits )
n = r4 = array size
c = r5 = 1word = 4memory_bytes = index into array
d = r6 = c = address in array
array[d] = r10 = data
*/
.data
.balign 4
array:
.word 6, 3, 7, 8, 5, 2, 1, 9, 4
size:
.word (size - array)
.text
.global main
main:
push {r4-r12, lr} # save registers for OS
ldr r0, =array # load address of array in r0
ldr r4, =size # load address of size in r4
ldr r4, [r4] # load size in r4
sub r4, #4 # subtract 1 word from r4 (n=n-1)
mov r5, #4 # move 4 in r5 (c=1word=4memory_bytes)
for: # (c=1; c<n-1; c++)
add r6, r0, r5 # d (r6) = array address (r0) + (c=4)
# while: # while loop would go here
ldr r10, [r6], #-4 # r10 = array[d], d=d-4
ldr r11, [r6] # r11 = array[d-1]
#... # while code
cmp r0, r6 # is d > 0 ...
#... #continue while loop code
# back to forloop code
cmp r5, r4 # compare (subtract) r5 (c) from r4 (n)
add r5, #4 # add 1 word to r5 (c++)
blt for # end of for loop (c<n-1)
end:
mov r0, #0 # set exit code
pop {r4-r12, lr} # restore environment for return to OS
bx lr # return to OS
Assemble and link the code, then run it and check the exit status.
as -o forloop.o forloop.s
gcc -o forloop forloop.o
./forloop; echo $?
It works for me on the Raspberry Pi. I don't know much about gdb, but this may help as suggested by Jester. (See middle section "Commands" at http://cs107e.github.io/guides/gdb/ for more information.)
pi#RPi0:~/pgm/Asm $ gdb -tui forloop # Text User Interface
---Type <return> to continue, or q <return> to quit--- [Enter]
(gdb) layout asm
(gdb) start # start is required
(gdb) layout reg
(gdb) Ctl-x o # Selects registers as Up & Down arrow to see all
(gdb) si # single step
(gdb) [Enter] # repeat single step
(gdb) run # run program to end
(gdb) q # quit gdb
Press the down arrow to scroll to the cpsr register. The leftmost hex digit holds the condition flags: 8=Negative, 4=Zero, 2=Carry, 1=oVerflow. Combinations add up, so 6 means Zero and Carry are both set.
Another approach to debugging an assembly program on ARM is to call the C library printf function. Here is myprint.s.
/* myprint.s */
.data
.balign 4
format:
.asciz " %2d %2d %2d %2d %2d %2d %2d %2d %2d\n"
.balign 4
array:
.word 6, 3, 7, 8, 5, 2, 1, 9, 4
size:
.word (size - array)
.text
.global main
print: # --- a printf function to print the value in the array ---
push {r0-r12, lr} # save registers for OS
mrs r10, cpsr # save flag settings
ldr r11, =array # To print the array[0-8], the array
ldm r11, {r1-r9} # address is loaded in r11 and stored
push {r4-r10} # in reg r1-r9, printf gets args# from
ldr r0, =format # format, 3 print from r1-r3, rest from
bl printf # stack.
pop {r4-r10} # adjust stack, restore r10 (flags)
msr cpsr_f, r10 # restore saved flags
pop {r0-r12, pc} # restore reg and return
main:
push {r4-r12, lr} # save registers for OS
bl print # --- can be placed anywhere in code ---
ldr r0, =array # load address of array in r0
ldr r4, =size # load address of size in r4
ldr r4, [r4] # load size in r4
sub r4, #4 # subtract 1 word from r4 (n=n-1)
mov r5, #4 # move 4 in r5 (c=1word=4memory_bytes)
for: # (c=1; c<n-1; c++)
add r6, r0, r5 # d=r6 = array address (r0) + (c=4)
while: # while loop would go here
ldr r10, [r6], #-4 # r10 = array[d], d=d-4
ldr r11, [r6] # r11 = array[d-1]
cmp r10, r11 # is array[d] < array[d-1]
bge forloop_code # if not, continue forloop code
mov r7, r11 # move array[d-1] into t (r7)
str r10, [r6], #4 # store array[d] into array[d-1], (d-1)+4=d
str r7, [r6], #-4 # store t-array[d-1] into array[d], d-4=(d-1)
cmp r6, r0 # is d>0 (addr(array[d-1]) > addr(array[0]))?
bgt while # yes, check if array[d-1] < array[d-2]
forloop_code: # back to forloop code
bl print # --- can be placed anywhere in code ---
cmp r5, r4 # compare (subtract) r5 (c) from r4 (n)
add r5, #4 # add 1 word to r5 (c++)
blt for # end of for loop (c<n-1)
end:
pop {r4-r12, lr} # restore registers for OS
mov r0, #0 # set exit code
bx lr # return to OS
as -o myprint.o myprint.s
gcc -o myprint myprint.o
./myprint; echo $?
6 3 7 8 5 2 1 9 4
3 6 7 8 5 2 1 9 4
3 6 7 8 5 2 1 9 4
3 6 7 8 5 2 1 9 4
3 5 6 7 8 2 1 9 4
2 3 5 6 7 8 1 9 4
1 2 3 5 6 7 8 9 4
1 2 3 5 6 7 8 9 4
1 2 3 4 5 6 7 8 9
0
Another thought would be to compile your C code and use gdb to see how the C code looks in assembly. This was an interesting project; I did not know about insertion sort.

I figured it out. Aside from cleaning up my code, I just needed to translate
while ( d > 0 )
as
cmp r6, #16 # if d <= 0, get out of there
ble forLoopStatements
instead of
cmp r6, #0 # if d <= 0, get out of there
ble forLoopStatements
to keep the minimum index at 0.

Related

C nested loop to ARM assembly

I'm currently having issues translating a C program to ARM assembly. The C program is as follows:
int i = 1;
int j = 0;
int x = 0;
int main(){
for( ; i < 10; i += 2){
for( j = i; j < 10; j++){
x += i + j;
}
}
return x;
}
This code will output 240.
What I have currently is as follows:
.data
i: .word 1
j: .word 0
x: .word 0
.text
.global main
main:
LDR r6, addrJ
LDR r5, addrI
LDR r4, addrX
LDR r3, [r6]
LDR r2, [r5]
LDR r1, [r4]
b loop_outer
loop_outer:
CMP r2, #10
BGE done
MOV r3, r2 # j = i
loop_inner:
CMP r3, #10 # j < 10
BGE inner_done
ADD r1, r1, r2 # x+=i
ADD r1, r1, r3 # x+=j
ADD r3, r3, #1 # j++
inner_done:
ADD r2, r2, #2 # i+=2
b loop_outer
b done
done:
MOV r0, r1
bx lr
addrI: .word i
addrX: .word x
addrJ: .word j
This code currently outputs 50. I have tried debugging myself but I have been having a hard time with GDB.
You're missing the b loop_inner to repeat the inner loop.
And b done is not needed, since it's after the unconditional b loop_outer, so it will never be executed.
loop_outer:
CMP r2, #10
BGE done
MOV r3, r2 # j = i
loop_inner:
CMP r3, #10 # j < 10
BGE inner_done
ADD r1, r1, r2 # x+=i
ADD r1, r1, r3 # x+=j
ADD r3, r3, #1 # j++
b loop_inner
inner_done:
ADD r2, r2, #2 # i+=2
b loop_outer
done:
MOV r0, r1
bx lr

Execution does not go out of the loop in ARM

I want to print a given decimal number in hexadecimal from ARM assembly. I'm writing the function that does both the conversion and the printing. So far the conversion works, but the printing does not.
It only prints one char at a time, which is not what I want: I want a specific output format, 0x followed by 8 digits.
I wrote a _printf function on top of the function I was given, _writec, which works but only prints one char at a time. So I wrote a loop that runs until it reaches the end of the string, but it seems to ignore that check.
I've followed the execution step by step using gdb and it suddenly crashes for no apparent reason. When r0 contains 0 it should go to .end according to my beq, but it does not.
ARM Code:
.global _print_hex
_print_hex:
push {lr}
#According to .c algorithm : r0 = dec; r1 = quotient;
# r2 = temp; r3 = i ; r4 = j
mov fp, sp
sub sp, sp, #100 # 100 times size of char
mov r1, r0
mov r3, #0
_while:
cmp r1, #0
bne _computing
ldr r0, =.hex_0x
bl _printf
mov r4, #8
_for:
cmp r4, #0
bge _printing
ldr r0, =.endline
bl _printf
mov sp, fp
pop {pc}
_computing:
and r2, r1, #0xF
cmp r2, #10
blt .temp_less_10
add r2, #7
.temp_less_10:
add r2, #48
strb r2, [sp, r3]
add r3, #1
lsr r1, #4
b _while
_printing:
ldrb r0, [sp,r4]
bl _writec
sub r4, #1
b _for
_printf:
push {r0, r1, r2, r3, lr}
mov r1, r0
mov r2, #0
.loop:
ldrb r0, [r1,r2]
cmp r0, #0
beq .end
bl _writec
add r2, #1
b .loop
.end:
pop {r0, r1, r2, r3, lr}
bx lr
.hex_0x:
.asciz "0x"
.align 4
.endline:
.asciz "\n"
.align 4
C code (that I tried to translate):
void dec_to_hex(int dec){
int quotient, i, temp;
char hex[100];
quotient = dec;
i = 0;
while (quotient != 0){
temp = quotient % 16;
if (temp < 10){
temp += 48; // it goes in the ascii table between 48 and 57 that correspond to [0..9]
} else {
temp += 55; //it goes in the first cap letters from 65 to 70 [A..F]
}
hex[i]=(char)temp;
i++;
quotient /= 16;
}
printf("0x");
for(int j=i; j>=0; j--){
printf("%c", hex[j]);
}
printf("\n");
}
Here is the code of _writec :
/*
* Sends a character to the terminal through UART0
* The character is given in r0.
* IF the TX FIFO is full, this function awaits
* until there is room to send the given character.
*/
.align 2
.global _writec
.type _writec,%function
.func _writec,_writec
_writec:
push {r0,r1,r2,r3,lr}
mov r1, r0
mov r3, #1
lsl r3, #5 // TXFF = (1<<5)
ldr r0,[pc]
b .TXWAIT
.word UART0
.TXWAIT:
ldr r2, [r0,#0x18] // flags at offset 0x18
and r2, r2, r3 // TX FIFO Full set, so wait
cmp r2,#0
bne .TXWAIT
strb r1, [r0,#0x00] // TX at offset 0x00
pop {r0,r1,r2,r3,pc}
.size _writec, .-_writec
.endfunc
When debugging the ARM code, it crashed at my first call to _printf. When I comment out all the calls to _printf, it does print the result, but not in the desired format: I only get the hex value.

Why is my min and max value for the array incorrect? ARM Assembly

I am writing a program that creates an array of 10 random integers between 0 and 999. My code for creating and occupying the array is functioning properly but when I attempt to print out the minimum and maximum value of the array, it prints the wrong value.
This is the code where I read through my array and print out each value. The spaced section is where I look for the min and max:
main:
BL _seedrand # seed random number generator with current time
MOV R0, #0 # initialize index variable
MOV R3, #1000
MOV R5, #-1
PUSH {R3}
PUSH {R5}
readloop:
CMP R0, #10 # check to see if we are done iterating
BEQ readdone # exit loop if done
LDR R1, =a # get address of a
LSL R2, R0, #2 # multiply index*4 to get array offset
ADD R2, R1, R2 # R2 now has the element address
LDR R1, [R2] # read the array at address
POP {R5}
CMP R5, R1
MOVGT R5, R1
POP {R3}
CMP R3, R1
MOVLT R3, R1
PUSH {R3}
PUSH {R5}
PUSH {R0} # backup register before printf
PUSH {R1} # backup register before printf
PUSH {R2} # backup register before printf
MOV R2, R1 # move array value to R2 for printf
MOV R1, R0 # move array index to R1 for printf
BL _printf # branch to print procedure with return
POP {R2} # restore register
POP {R1} # restore register
POP {R0} # restore register
ADD R0, R0, #1 # increment index
B readloop # branch to next loop iteration
readdone:
MOV R1, R3
LDR R0, =min_str # R0 contains formatted string address
BL printf
MOV R1, R5
LDR R0, =max_str
BL printf
B _exit
If you would like to see the rest of the code where the array is created and populated, I can do so. I have tried different tags and backing up R3 and R5 on the stack and it still prints the wrong values. Specifically the min value will always print the last integer in the array and the max is always printed as 0.

ARM Assembly Arrays

I am trying to figure out how arrays work in ARM assembly, but I am just overwhelmed. I want to initialize an array of size 20 to 0, 1, 2 and so on.
A[0] = 0
A[1] = 1
I can't even figure out how to print what I have to see if I did it correctly. This is what I have so far:
.data
.balign 4 # Memory location divisible by 4
string: .asciz "a[%d] = %d\n"
a: .skip 80 # allocates 20
.text
.global main
.extern printf
main:
push {ip, lr} # return address + dummy register
ldr r1, =a # set r1 to index point of array
mov r2, #0 # index r2 = 0
loop:
cmp r2, #20 # 20 elements?
beq end # Leave loop if 20 elements
add r3, r1, r2, LSL #2 # r3 = r1 + (r2*4)
str r2, [r3] # r3 = r2
add r2, r2, #1 # r2 = r2 + 1
b loop # branch to next loop iteration
print:
push {lr} # store return address
ldr r0, =string # format
bl printf # c printf
pop {pc} # return address
ARM confuses me enough as it is, and I don't know what I'm doing wrong. If anyone could help me better understand how this works, that would be much appreciated.
This might help others down the line who want to know how to allocate memory for an array in ARM assembly language.
Here is a simple example that adds corresponding array elements and stores the sums in a third array.
.global _start
_start:
MOV R0, #5
LDR R1,=first_array # loading the address of first_array[0]
LDR R2,=second_array # loading the address of second_array[0]
LDR R7,=final_array # loading the address of final_array[0]
MOV R3,#5 # len of array
MOV R4,#0 # to store sum
check:
cmp R3,#1 # like condition in for loop for i>1
BNE loop # if R3 is not equal to 1 jump to the loop label
B _exit # else exit
loop:
LDR R5,[R1],#4 # loading the values and storing in registers and base register gets updated automatically R1 = R1 + 4
LDR R6,[R2],#4 # similarly
add R4,R5,R6
STR R4,[R7],#4 # storing the values back to the final array
SUB R3,R3,#1 # decrement value just like i-- in for loop
B check
_exit:
LDR R7,=final_array # before exiting checking the values stored
LDR R1, [R7] # R1 = 60
LDR R2, [R7,#4] # R2 = 80
LDR R3, [R7,#8] # R3 = 100
LDR R4, [R7,#12] # R4 = 120
MOV R7, #1 # terminate syscall, 1
SWI 0 # execute syscall
.data
first_array: .word 10,20,30,40
second_array: .word 50,60,70,80
final_array: .word 0,0,0,0,0
As mentioned, your printf has problems. You can use the toolchain itself to see what the calling convention is, and then conform to that.
#include <stdio.h>
unsigned int a,b;
void notmain ( void )
{
printf("a[%d] = %d\n",a,b);
}
giving
00001008 <notmain>:
1008: e59f2010 ldr r2, [pc, #16] ; 1020 <notmain+0x18>
100c: e59f3010 ldr r3, [pc, #16] ; 1024 <notmain+0x1c>
1010: e5921000 ldr r1, [r2]
1014: e59f000c ldr r0, [pc, #12] ; 1028 <notmain+0x20>
1018: e5932000 ldr r2, [r3]
101c: eafffff8 b 1004 <printf>
1020: 0000903c andeq r9, r0, ip, lsr r0
1024: 00009038 andeq r9, r0, r8, lsr r0
1028: 0000102c andeq r1, r0, ip, lsr #32
Disassembly of section .rodata:
0000102c <.rodata>:
102c: 64255b61 strtvs r5, [r5], #-2913 ; 0xb61
1030: 203d205d eorscs r2, sp, sp, asr r0
1034: 000a6425 andeq r6, sl, r5, lsr #8
Disassembly of section .bss:
00009038 <b>:
9038: 00000000 andeq r0, r0, r0
0000903c <a>:
903c:
The calling convention is generally: first parameter in r0, second in r1, third in r2, and so on up to r3, then the stack. There are many exceptions to this, but we can see here that the compiler, which normally works fine with a printf call, wants the address of the format string in r0, then the value of a and the value of b in r1 and r2 respectively.
Your printf has the string in r0, but a printf call with that format string needs three parameters.
The code above used a tail-call optimization and branched to printf rather than calling it and returning. The ARM convention these days prefers the stack to be aligned on 64-bit boundaries, so you can add some register you don't necessarily care to preserve to the push/pop in order to keep that alignment:
push {r3,lr}
...
pop {r3,pc}
It certainly won't hurt you to do this; it may or may not hurt to not do it, depending on what downstream code assumes.
Your setup and loop should function just fine, assuming that r1 (label a) is a word-aligned address. It may or may not be if you change your string, so you should either put a first and then the string, or put another alignment statement before a to ensure the array is aligned. There are instruction set features that can simplify the code, but it appears functional as is.

Assembly, can't tell what is wrong with my code ARM processor

.global reverse
.data
start: .word 1
end: .word 1
loopcount: .word 0
reverse:
ldr r3, =end
str r1, [r3]
next:
ldr r3,=end
ldr r2,=start
ldr r3, [r3]
ldr r2, [r2]
cmp r2,r3
bgt done
cmp r2,r3
beq done
sub r3,r2
mov r2,#0
mov r1,#0
loop:
cmp r2,r3
beq next2
add r2, r2, #1
add r1, r1, #4
b loop
next2:
ldr r3, [r0]
add r0, r0, r1
ldr r2, [r0]
str r3, [r0]
sub r0, r0, r1
str r2, [r0]
add r0, r0, #4
ldr r3,=end
ldr r1, [r3]
sub r1, r1, #1
str r1, [r3]
ldr r3,=start
ldr r1, [r3]
add r1, r1, #1
str r1, [r3]
b next
done:
bx lr
I am trying to reverse an array, and this is my reverse function in assembly. The function is reverse(int data*,size); data is an array and size is its size. It works for any array of size 5, but for longer arrays, for example of size 10, it ignores the last 2 elements and swaps everything as if the array were of size 8. It returns:
Array:
1 2 3 4 5 6 7 8 9 10
My Return:
8 7 6 5 4 3 2 1 9 10
I can't seem to find out what my issue is.
Below is a crude and likely inefficient example of a reverse copy of one array to another; which may not be necessarily what you want but may foster some ideas.
Hardware : Marvell Armada 370/XP
model name : ARMv7 Processor rev 2 (v7l)
.bss
rArray: .zero
.data
Array: .word 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,32, 64, 128, 256, 512
.equ len.Array,.-Array
.text
.global main
main:
ldr r1,=Array // Array
mov r2, #len.Array // length of array
mov r3, #0 // zero init counter Array
ldr r4,=rArray // rArray
sub r5,r2, #4 // rArray counter - 1 element
1:
ldr r10, [r1,r3] // load word size element position x from Array
str r10, [r4,r5] // store word size element position x from Array into word size position y in rArray
add r3, r3, #4 // inc Array counter by 4 since word size is 4 bytes
subs r5, r5, #4 // decrement rArray counter by 4 & get status (s)
bpl 1b // branch back to loop if positive or zero; i.e., N condition flag is clear
Results using GDB:
(gdb) x/21d $r1
0x1102d: 1 2 3 4
0x1103d: 5 6 7 8
0x1104d: 9 10 11 12
0x1105d: 13 14 15 16
0x1106d: 32 64 128 256
0x1107d: 512
(gdb) x/21d $r4
0x11082: 512 256 128 64
0x11092: 32 16 15 14
0x110a2: 13 12 11 10
0x110b2: 9 8 7 6
0x110c2: 5 4 3 2
0x110d2: 1
