How to produce double digits numbers LC3 - loops

Having trouble printing any number past 10 in my LC3 program to multiply 2 input numbers to get a area. It works for any numbers below 10 so I know my multiplication is right. The problem is it produces weird symbols or nonsense anywhere higher then 10. So confused why. Here is my code:
.ORIG x3000
AND R3, R3, #0
AND R4, R4, #0
LD R5, INVERSE_ASCII_OFFSET
LD R6, DECIMAL_OFFSET
;---------------------
;receiving length
LEA R0, PROMPT1 ;load the address of the 'PROMPT1' message string
PUTS ;Prints PROMPT1
GETC ;get length
ADD R1, R0, #0
ADD R1, R1, R5
;receving width
LEA R0, PROMPT2 ;load the address of the 'PROMPT2' message string
PUTS ;Prints PROMPT2
GETC ;get width
ADD R2, R0, #0
ADD R2, R2, R5
;----------------------
;MULTIPLICATION to find AREA
ADD R4, R2, #0
FINDAREA:
ADD R3, R3, R1
ADD R4, R4, #-1
BRp FINDAREA
LEA R0, PROMPT3
PUTS
ADD R0, R3, R6
OUT
HALT
PROMPT1 .STRINGZ "ENTER LENGTH OF THE RECTANGLE:"
PROMPT2 .STRINGZ "ENTER WIDTH OF THE RECTANGLE:"
PROMPT3 .STRINGZ "AREA OF THE RECTANGLE:"
INVERSE_ASCII_OFFSET .fill xFFD0
DECIMAL_OFFSET .fill #48
.END```

Related

ARM assembly output isn't functioning correctly

This is supposed to output the contents of each line in arm assembly. Though line 18 add r4, r5, r4, lsl #1 isn't being outputted correctly and I am not sure why.
.data
str1: .asciz "%d and %d are the results \n"
n: .word word 1
.text
.global main
main: stmfd sp!, {lr}
ldr r4,=n
ldr r4, [r4]
add r4,r4, #1
mov r1, r4
ldr r0, =str1
bl printf
mov r5, r4
mov r1, r4
mov r2, r5
ldr r0, =str1
bl printf
add r4, r5, r4, lsl #1
mov r1, r4
ldr r0, = str1
bl printf
ldmfd sp!, {lr}
mov r0, #0
mov PC, or
.end

Self written simple memset not working with -03 eabi gcc on ARMv7

I wrote a very simple memset in c that works fine up to -O2 but not with -O3...
memset:
void * memset(void * blk, int c, size_t n)
{
unsigned char * dst = blk;
while (n-- > 0)
*dst++ = (unsigned char)c;
return blk;
}
...which compiles to this assembly when using -O2:
20000430 <memset>:
20000430: e3520000 cmp r2, #0 # compare param 'n' with zero
20000434: 012fff1e bxeq lr # if equal return to caller
20000438: e6ef1071 uxtb r1, r1 # else zero extend (extract byte from) param 'c'
2000043c: e0802002 add r2, r0, r2 # add pointer 'blk' to 'n'
20000440: e1a03000 mov r3, r0 # move pointer 'blk' to r3
20000444: e4c31001 strb r1, [r3], #1 # store value of 'c' to address of r3, increment r3 for next pass
20000448: e1530002 cmp r3, r2 # compare current store address to calculated max address
2000044c: 1afffffc bne 20000444 <memset+0x14> # if not equal store next byte
20000450: e12fff1e bx lr # else back to caller
This makes sense to me. I annotated what happens here.
When I compile it with -O3 the program crashes. My memset calls itself repeatedly until it ate the whole stack:
200005e4 <memset>:
200005e4: e3520000 cmp r2, #0 # compare param 'n' with zero
200005e8: e92d4010 push {r4, lr} # ? (1)
200005ec: e1a04000 mov r4, r0 # move pointer 'blk' to r4 (temp to hold return value)
200005f0: 0a000001 beq 200005fc <memset+0x18> # if equal (first line compare) jump to epilogue
200005f4: e6ef1071 uxtb r1, r1 # zero extend (extract byte from) param 'c'
200005f8: ebfffff9 bl 200005e4 <memset> # call myself ? (2)
200005fc: e1a00004 mov r0, r4 # epilogue start. move return value to r0
20000600: e8bd8010 pop {r4, pc} # restore r4 and back to caller
I can't figure out how this optimised version is supposed to work without any strb or similar. It doesn't matter if I try to set the memory to '0' or something else so the function is not only called on .bss (zero initialised) variables.
(1) This is a problem. This push gets endlessly repeated without a matching pop as it's called by (2) when the function doesn't early-exit because of 'n' being zero. I verified this with uart prints. Also r2 is never touched so why should the compare to zero ever become true?
Please help me understand what's happening here. Is the compiler assuming prerequisites that I may not fulfill?
Background: I'm using external code that requires memset in my baremetal project so I rolled my own. It's only used once on startup and not performance critical.
/edit: The compiler is called with these options:
arm-none-eabi-gcc -O3 -Wall -Wextra -fPIC -nostdlib -nostartfiles -marm -fstrict-volatile-bitfields -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon-vfpv3
Your first question (1). That is per the calling convention if you are going to make a nested function call you need to preserve the link register, and you need to be 64 bit aligned. The code uses r4 so that is the extra register saved. No magic there.
Your second question (2) it is not calling your memset it is optimizing your code because it sees it as an inefficient memset. Fuz has provided the answers to your question.
Rename the function
00000000 <xmemset>:
0: e3520000 cmp r2, #0
4: e92d4010 push {r4, lr}
8: e1a04000 mov r4, r0
c: 0a000001 beq 18 <xmemset+0x18>
10: e6ef1071 uxtb r1, r1
14: ebfffffe bl 0 <memset>
18: e1a00004 mov r0, r4
1c: e8bd8010 pop {r4, pc}
and you can see this.
If you were to use -ffreestanding as Fuz recommended then you see this or something like it
00000000 <xmemset>:
0: e3520000 cmp r2, #0
4: 012fff1e bxeq lr
8: e92d41f0 push {r4, r5, r6, r7, r8, lr}
c: e2426001 sub r6, r2, #1
10: e3560002 cmp r6, #2
14: e6efe071 uxtb lr, r1
18: 9a00002a bls c8 <xmemset+0xc8>
1c: e3a0c000 mov r12, #0
20: e3520023 cmp r2, #35 ; 0x23
24: e7c7c01e bfi r12, lr, #0, #8
28: e1a04122 lsr r4, r2, #2
2c: e7cfc41e bfi r12, lr, #8, #8
30: e7d7c81e bfi r12, lr, #16, #8
34: e7dfcc1e bfi r12, lr, #24, #8
38: 9a000024 bls d0 <xmemset+0xd0>
3c: e2445009 sub r5, r4, #9
40: e1a03000 mov r3, r0
44: e3c55007 bic r5, r5, #7
48: e3a07000 mov r7, #0
4c: e2851008 add r1, r5, #8
50: e1570005 cmp r7, r5
54: f5d3f0a0 pld [r3, #160] ; 0xa0
58: e1a08007 mov r8, r7
5c: e583c000 str r12, [r3]
60: e583c004 str r12, [r3, #4]
64: e2877008 add r7, r7, #8
68: e583c008 str r12, [r3, #8]
6c: e2833020 add r3, r3, #32
70: e503c014 str r12, [r3, #-20] ; 0xffffffec
74: e503c010 str r12, [r3, #-16]
78: e503c00c str r12, [r3, #-12]
7c: e503c008 str r12, [r3, #-8]
80: e503c004 str r12, [r3, #-4]
84: 1afffff1 bne 50 <xmemset+0x50>
88: e2811001 add r1, r1, #1
8c: e483c004 str r12, [r3], #4
90: e1540001 cmp r4, r1
94: 8afffffb bhi 88 <xmemset+0x88>
98: e3c23003 bic r3, r2, #3
9c: e1520003 cmp r2, r3
a0: e0466003 sub r6, r6, r3
a4: e0803003 add r3, r0, r3
a8: 08bd81f0 popeq {r4, r5, r6, r7, r8, pc}
ac: e3560000 cmp r6, #0
b0: e5c3e000 strb lr, [r3]
b4: 08bd81f0 popeq {r4, r5, r6, r7, r8, pc}
b8: e3560001 cmp r6, #1
bc: e5c3e001 strb lr, [r3, #1]
c0: 15c3e002 strbne lr, [r3, #2]
c4: e8bd81f0 pop {r4, r5, r6, r7, r8, pc}
c8: e1a03000 mov r3, r0
cc: eafffff6 b ac <xmemset+0xac>
d0: e1a03000 mov r3, r0
d4: e3a01000 mov r1, #0
d8: eaffffea b 88 <xmemset+0x88>
which appears like it simply inlined memset, the one it knows not your code (the faster one).
So if you want it to use your code then stick with -O2. Yours is pretty inefficient so not sure why you need to push it any further than it was.
20000444: e4c31001 strb r1, [r3], #1 # store value of 'c' to address of r3, increment r3 for next pass
20000448: e1530002 cmp r3, r2 # compare current store address to calculated max address
2000044c: 1afffffc bne 20000444 <memset+0x14> # if not equal store next byte
It isn't going to get any better than that without replacing your code with something else.
Fuz already answered the question:
Compile with -fno-builtin-memset. The compiler recognises that the function implements memset and thus replaces it with a call to memset. You should in general compile with -ffreestanding when writing bare-metal code. I believe this fixes this sort of problem, too
It is replacing your code with memset, if you want it not to do that use -ffreestanding.
If you wish to go beyond that and wonder why -fno-builtin-memset didn't work that is a question for the gcc folks, file a ticket, let us know what they say (or just look at the compiler source code).

Printing double digit numbers in LC-3

Been learning about LC-3 lately and was wondering how do I go about printing a number that's bigger then 9? In this program I made it asks for width and length and multiplies the two to get the area of the shape. My problem is any output bigger then 9 it starts printing a letter or a number not close to what I wanted. How should I go about doing this? my code:
.ORIG x3000
; Reset Registers
AND R0, R0, #0
AND R1, R1, #0
AND R2, R2, #0
AND R3, R3, #0
AND R4, R4, #0
AND R5, R5, #0
AND R6, R6, #0
AND R7, R7, #0
LEA R0, numberone
PUTS
GETC
OUT
LD R3, HEXN30
ADD R0, R0, R3
ADD R1, R0, #0
LEA R0, numbertwo
PUTS
GETC
OUT
ADD R0, R0, R3
ADD R6, R0, #0
LOOP
ADD R2, R2, R1
ADD R6, R6, #-1
BRp LOOP
LEA R0, MESG
PUTS
ADD R0, R2, x0
LD R2, NEG_TEN
ADD R2, R2, R0
BRn JUMP
AND R4, R4, #0
ADD R4, R4, R2
LD R0, ASCII_1
OUT
AND R0, R0, #0
ADD R0, R0, R4
JUMP
LD R3, HEX30 ;add 30 to integer to get integer character
ADD R0, R0, R3
OUT
HALT ;{TRAP 25}
numberone .stringz "\nPlease enter the length: "
numbertwo .stringz "\nPlease enter the width: "
MESG .STRINGZ "\n\nThe Area of the Rectangle is: "
HEXN30 .FILL xFFD0 ; -30 HEX
HEX30 .FILL x0030 ; 30 HEX
NEG_TEN .FILL #-10
ASCII_1 .FILL x0031 ; ASCII char '1'
.END
Example Output:
Please enter the length: 4
Please enter the width: 5
The area of the object is: 20
This code will always print 1:
LD R0, ASCII_1
OUT
There's no chance for it to print a 2 like you'd want from 4 times 5.
The next character prints as : because you have 10 in R4, as you haven't subtracted 10 enough times in the division by repetitive subtraction. You've only subtracted 10 once and 20 needs to have 10 subtracted twice (to get 0 as the remainder).
You should be able to see the first problem by simply reading the code there. How could loading an ascii 1 and printing it do anything other than print a 1? So there is some missing code there.
The latter problem is a form of off-by-one error (looping one to few iterations), which is something you should be looking for during single-step debugging. You want your division by repetitive subtraction to leave you with a remainder between 0 and 9 — definitely not 10!
Boundary conditions are very error prone. Using < when it should be <= (or vice vesa), for example, can result in an off by one error. What we do is try to use idioms to avoid these problems (like for(int i = 0; i < n; i++)), but when an idiom isn't applicable, then you want to be suspicious of and test those boundary conditions: at the boundary, at one less, at one more.

ARM Assembly Arrays

I am trying to figure out how arrays work in ARM assembly, but I am just overwhelmed. I want to initialize an array of size 20 to 0, 1, 2 and so on.
A[0] = 0
A[1] = 1
I can't even figure out how to print what I have to see if I did it correctly. This is what I have so far:
.data
.balign 4 # Memory location divisible by 4
string: .asciz "a[%d] = %d\n"
a: .skip 80 # allocates 20
.text
.global main
.extern printf
main:
push {ip, lr} # return address + dummy register
ldr r1, =a # set r1 to index point of array
mov r2, #0 # index r2 = 0
loop:
cmp r2, #20 # 20 elements?
beq end # Leave loop if 20 elements
add r3, r1, r2, LSL #2 # r3 = r1 + (r2*4)
str r2, [r3] # r3 = r2
add r2, r2, #1 # r2 = r2 + 1
b loop # branch to next loop iteration
print:
push {lr} # store return address
ldr r0, =string # format
bl printf # c printf
pop {pc} # return address
ARM confuses me enough as it is, I don't know what i'm doing wrong. If anyone could help me better understand how this works that would be much appreciated.
This might help down the line for others who want to know about how to allocate memory for array in arm assembly language
here is a simple example to add corresponding array elements and store in the third array.
.global _start
_start:
MOV R0, #5
LDR R1,=first_array # loading the address of first_array[0]
LDR R2,=second_array # loading the address of second_array[0]
LDR R7,=final_array # loading the address of final_array[0]
MOV R3,#5 # len of array
MOV R4,#0 # to store sum
check:
cmp R3,#1 # like condition in for loop for i>1
BNE loop # if R3 is not equal to 1 jump to the loop label
B _exit # else exit
loop:
LDR R5,[R1],#4 # loading the values and storing in registers and base register gets updated automatically R1 = R1 + 4
LDR R6,[R2],#4 # similarly
add R4,R5,R6
STR R4,[R7],#4 # storing the values back to the final array
SUB R3,R3,#1 # decrment value just like i-- in for loop
B check
_exit:
LDR R7,=final_array # before exiting checking the values stored
LDR R1, [R7] # R1 = 60
LDR R2, [R7,#4] # R2 = 80
LDR R3, [R7,#8] # R3 = 100
LDR R4, [R7,#12] # R4 = 120
MOV R7, #1 # terminate syscall, 1
SWI 0 # execute syscall
.data
first_array: .word 10,20,30,40
second_array: .word 50,60,70,80
final_array: .word 0,0,0,0,0
as mentioned your printf has problems, you can use the toolchain itself to see what the calling convention is, and then conform to that.
#include <stdio.h>
unsigned int a,b;
void notmain ( void )
{
printf("a[%d] = %d\n",a,b);
}
giving
00001008 <notmain>:
1008: e59f2010 ldr r2, [pc, #16] ; 1020 <notmain+0x18>
100c: e59f3010 ldr r3, [pc, #16] ; 1024 <notmain+0x1c>
1010: e5921000 ldr r1, [r2]
1014: e59f000c ldr r0, [pc, #12] ; 1028 <notmain+0x20>
1018: e5932000 ldr r2, [r3]
101c: eafffff8 b 1004 <printf>
1020: 0000903c andeq r9, r0, ip, lsr r0
1024: 00009038 andeq r9, r0, r8, lsr r0
1028: 0000102c andeq r1, r0, ip, lsr #32
Disassembly of section .rodata:
0000102c <.rodata>:
102c: 64255b61 strtvs r5, [r5], #-2913 ; 0xb61
1030: 203d205d eorscs r2, sp, sp, asr r0
1034: 000a6425 andeq r6, sl, r5, lsr #8
Disassembly of section .bss:
00009038 <b>:
9038: 00000000 andeq r0, r0, r0
0000903c <a>:
903c:
the calling convention is generally first parameter in r0, second in r1, third in r2 up to r3 then use the stack. There are many exceptions to this, but we can see here that the compiler which normally works fine with a printf call, wants the address of the format string in r0. the value of a then the value of b in r1 and r2 respectively.
Your printf has the string in r0, but a printf call with that format string needs three parameters.
The code above used a tail optimization and branch to printf rather than called it and returned from. The arm convention these days prefers the stack to be aligned on 64 bit boundaries, so you can put some register, you dont necessarily care to preserve on the push/pop in order to keep that alignment
push {r3,lr}
...
pop {r3,pc}
It certainly wont hurt you to do this, it may or may not hurt to not do it depending on what downstream assumes.
Your setup and loop should function just fine assuming that r1 (label a) is a word aligned address. Which it may or may not be if you mess with your string, should put a first then the string or put another alignment statement before a to insure the array is aligned. There are instruction set features that can simply the code, but it appears functional as is.

User controlled grid size for triangle made of *'s in Assembly Language

Enter grid size (1-9):3
*
**
***
Here's the code i have so far:
.ORIG x3000
LEA R0, PRINT
LEA R1, MEMSPAC
PUTS
GETC
PUTC
M STR R0, R1, #0
N .FILL M
LD R1, N
NOT R1, R1
ADD R1, R1, #1 ; R1 = -N
AND R2, R2, #0 ; R2 = holds number of *'s to be printed
LOOPA LEA R0, NEWLN
PUTS
ADD R3, R2, R1 ; while (R2 < N)
BRzp LOOPB
ADD R5, R5, #1 ;
ADD R4, R4, #1
LOOPC LD R0, STAR ; R0 = *
OUT ; Write *
ADD R5, R5, #-1
BRp LOOPC
ADD R5, R4, #0
ADD R2, R2, #1 ;
BRnzp LOOPA
LOOPB
LEA R0, NEWLN
PUTS
STOP HALT
STAR .FILL x2A
RIP TRAP x25
PRINT .STRINGZ "Enter grid size (1-9):"
NEWLN .STRINGZ "\n"
MEMSPAC .blkw 100
.END
This code works and it prints the triangle. However it goes into an infinite loop rather than printing grid size of 3. If i change the code "N .FILL M" to "N .FILL 3" it will print out grid size of 3. Any help would be greatly appreciated. Thanks

Resources