Instruction would span across PC

I use arm-gcc to compile code for a Cortex-M4, but the code crashes when it calls sprintf:
char buf[20];
sprintf(buf, "%d",1);
I use Lauterbach TRACE32 to debug the code. Before execution enters the sprintf function, the disassembly is:
E92D000E siprintf: push {r1-r3}
E3E03102 mvn r3,#-0x80000000 ; r3,#-2147483648
E52DE004 push {r14}
E24DD070 sub r13,r13,#0x70 ; r13,r13,#112
E58D0008 str r0,[r13,#0x8]
E58D0018 str r0,[r13,#0x18]
E58D301C str r3,[r13,#0x1C]
E58D3010 str r3,[r13,#0x10]
E59F0038 ldr r0,0xE628
E59F3038 ldr r3,0xE62C
E59D2074 ldr r2,[r13,#0x74]
E58D3014 str r3,[r13,#0x14]
E28D1008 add r1,r13,#0x8 ; r1,r13,#8
E28D3078 add r3,r13,#0x78 ; r3,r13,#120
E5900000 ldr r0,[r0]
E58D3004 str r3,[r13,#0x4]
EB0000B2 bl 0xE8D8 ; _svfiprintf_r
E3A02000 mov r2,#0x0 ; r2,#0
E59D3008 ldr r3,[r13,#0x8]
But once execution enters the function, it changes to:
000E siprintf: //////// ; **instruction would span across PC**
E92D align 0xE92D
E3E03102 mvn r3,#-0x80000000 ; r3,#-2147483648
E52DE004 push {r14}
E24DD070 sub r13,r13,#0x70 ; r13,r13,#112
E58D0008 str r0,[r13,#0x8]
E58D0018 str r0,[r13,#0x18]
E58D301C str r3,[r13,#0x1C]
E58D3010 str r3,[r13,#0x10]
E59F0038 ldr r0,0xE628
E59F3038 ldr r3,0xE62C
E59D2074 ldr r2,[r13,#0x74]
E58D3014 str r3,[r13,#0x14]
E28D1008 add r1,r13,#0x8 ; r1,r13,#8
E28D3078 add r3,r13,#0x78 ; r3,r13,#120
E5900000 ldr r0,[r0]
E58D3004 str r3,[r13,#0x4]
EB0000B2 bl 0xE8D8 ; _svfiprintf_r
E3A02000 mov r2,#0x0 ; r2,#0
E59D3008 ldr r3,[r13,#0x8]
Which compiler or linker option should I use to avoid this problem?

How is _Bool implemented?

What makes the value of a variable of type _Bool equal to 1, even when we assign it a value greater than 1?
For example:
_Bool tmp = 10;
printf("%x , %lu", tmp, sizeof(tmp));
This prints 1, 1. I am trying to understand what makes a variable that occupies a whole byte act like a single bit, so that an assigned value greater than 1 whose LSB is 0 is still converted to 1.

What makes the value of a variable of type _Bool equal to 1, even when we assign it a value greater than 1?
The compiler does.
For example, on ARM (arm-none-eabi-gcc):
#include "stdio.h"
#include "stdbool.h"
int main()
{
_Bool tmp = 10;
printf("%x , %lu", tmp, sizeof(tmp));
return 0;
}
compiles to:
.LC0:
.ascii "%x , %lu\000"
main:
stmfd sp!, {fp, lr}
add fp, sp, #4
sub sp, sp, #8
mov r3, #1
strb r3, [fp, #-5]
ldrb r3, [fp, #-5] # zero_extendqisi2
mov r2, #1
mov r1, r3
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
sub sp, fp, #4
ldmfd sp!, {fp, lr}
bx lr
.L3:
.word .LC0
You can see from the instruction mov r3, #1 that the compiler converts the initialisation value 10 to 1 at compile time, as the C standard requires: any scalar value that does not compare equal to 0 becomes 1 when converted to _Bool.
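The conversion rule can also be checked at run time. The following is a minimal sketch; the helper names to_bool and to_uchar are made up for this illustration, and the unsigned char comparison assumes a platform where a char has 8 bits.

```c
/* Conversion to _Bool: 0 if the value compares equal to 0, otherwise 1. */
int to_bool(int value)
{
    _Bool b = value;    /* implicit conversion, e.g. 10 -> 1 */
    return b;
}

/* Contrast (assumes 8-bit char): unsigned char keeps only the low 8 bits. */
int to_uchar(int value)
{
    unsigned char c = (unsigned char)value;    /* e.g. 256 -> 0 */
    return c;
}
```

Calling to_bool(256) yields 1 even though the low byte of 256 is 0, while to_uchar(256) yields 0. So _Bool is not a truncating one-byte integer: the compiler emits a "compare with zero" conversion.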

Execution does not go out of the loop in ARM

I want to print a given number in hexadecimal from ARM assembly. I'm writing the function that does the conversion and the printing. So far the conversion works, but the printing does not work at all.
It only prints one char at a time, which is not what I want: I want the output in a fixed format, 0x followed by 8 digits.
I wrote a _printf function on top of the function I was given, _writec, which works but only prints one char at a time. So I wrote a loop that runs until it reaches the end of the string, but the end-of-string check seems to be ignored.
I've followed the execution step by step in gdb and it suddenly crashes for no apparent reason. When r0 contains 0 it should branch to .end according to my beq, but it does not.
ARM Code:
.global _print_hex
_print_hex:
push {lr}
#According to .c algorithm : r0 = dec; r1 = quotient;
# r2 = temp; r3 = i ; r4 = j
mov fp, sp
sub sp, sp, #100 # 100 times size of char
mov r1, r0
mov r3, #0
_while:
cmp r1, #0
bne _computing
ldr r0, =.hex_0x
bl _printf
mov r4, #8
_for:
cmp r4, #0
bge _printing
ldr r0, =.endline
bl _printf
mov sp, fp
pop {pc}
_computing:
and r2, r1, #0xF
cmp r2, #10
blt .temp_less_10
add r2, #7
.temp_less_10:
add r2, #48
strb r2, [sp, r3]
add r3, #1
lsr r1, #4
b _while
_printing:
ldrb r0, [sp,r4]
bl _writec
sub r4, #1
b _for
_printf:
push {r0, r1, r2, r3, lr}
mov r1, r0
mov r2, #0
.loop:
ldrb r0, [r1,r2]
cmp r0, #0
beq .end
bl _writec
add r2, #1
b .loop
.end:
pop {r0, r1, r2, r3, lr}
bx lr
.hex_0x:
.asciz "0x"
.align 4
.endline:
.asciz "\n"
.align 4
C code (that I tried to translate):
void dec_to_hex(int dec){
int quotient, i, temp;
char hex[100];
quotient = dec;
i = 0;
while (quotient != 0){
temp = quotient % 16;
if (temp < 10){
temp += 48; // it goes in the ascii table between 48 and 57 that correspond to [0..9]
} else {
temp += 55; //it goes in the first cap letters from 65 to 70 [A..F]
}
hex[i]=(char)temp;
i++;
quotient /= 16;
}
printf("0x");
for(int j=i; j>=0; j--){
printf("%c", hex[j]);
}
printf("\n");
}
Here is the code of _writec :
/*
* Sends a character to the terminal through UART0
* The character is given in r0.
* IF the TX FIFO is full, this function awaits
* until there is room to send the given character.
*/
.align 2
.global _writec
.type _writec,%function
.func _writec,_writec
_writec:
push {r0,r1,r2,r3,lr}
mov r1, r0
mov r3, #1
lsl r3, #5 // TXFF = (1<<5)
ldr r0,[pc]
b .TXWAIT
.word UART0
.TXWAIT:
ldr r2, [r0,#0x18] // flags at offset 0x18
and r2, r2, r3 // TX FIFO Full set, so wait
cmp r2,#0
bne .TXWAIT
strb r1, [r0,#0x00] // TX at offset 0x00
pop {r0,r1,r2,r3,pc}
.size _writec, .-_writec
.endfunc
So in ARM, when debugging, it crashes at my first call to _printf. When I comment out all the calls to _printf it does print the result, but not in the desired format: I only get the hex value.
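For comparison, here is a minimal C sketch of the fixed-width output described above (0x followed by exactly 8 digits). The function name format_hex32 and the 11-byte output buffer are assumptions made for this illustration, not part of the original code. It extracts nibbles starting from the most significant one, so no reversal loop is needed and leading zeros appear automatically.

```c
#include <stdint.h>

/* Writes "0x", exactly 8 uppercase hex digits and a NUL into out. */
void format_hex32(uint32_t value, char out[11])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 8; i++) {
        /* i = 0 selects bits 31..28, i = 7 selects bits 3..0 */
        unsigned shift = 28u - 4u * (unsigned)i;
        out[2 + i] = digits[(value >> shift) & 0xFu];
    }
    out[10] = '\0';
}
```

With this layout, format_hex32(255, buf) leaves "0x000000FF" in buf. The same idea carries over to assembly: shift right by 28, 24, ..., 0, mask with 0xF, and send each digit to the character output routine.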

Why can't I copy one array to another?

int main(int argc, char *argv[])
{
char string[100];
string = *argv[1];
}
Why doesn't this work? Do I actually need to use loops to iterate through each element and do everything the long way?

Why doesn't this work?
Because that's simply how it works in C. Trying with string = argv[1] (without *) would be a better guess, but you cannot copy arrays with simple assignments.
Do I actually need to use loops to iterate through each element and do everything the long way?
Unless you are prepared to use functions like strcpy, strncpy or strdup or something similar, then yes. Using strncpy in your code would look like this:
char string[100];
strncpy(string, argv[1], sizeof(string));
string[sizeof(string) - 1] = 0;
The last line makes sure that string is terminated, because strncpy does not write a terminating NUL when argv[1] has sizeof(string) or more characters. Clunky? Yes, it is. Some C libraries offer better functions such as strlcpy, which originated on the BSDs and is widely available, but it's not a part of the C standard. If you use strlcpy instead of strncpy you can skip the last line.
If you're planning to do a lot of string copying and your C library does not provide strlcpy, it might be a good idea to write your own implementation (good practice) or just copy an existing one. Here is one I found:
size_t
strlcpy(char *dst, const char *src, size_t siz)
{
char *d = dst;
const char *s = src;
size_t n = siz;
/* Copy as many bytes as will fit */
if (n != 0) {
while (--n != 0) {
if ((*d++ = *s++) == '\0')
break;
}
}
/* Not enough room in dst, add NUL and traverse rest of src */
if (n == 0) {
if (siz != 0)
*d = '\0'; /* NUL-terminate dst */
while (*s++)
;
}
return(s - src - 1); /* count does not include NUL */
}
Source: https://android.googlesource.com/platform/system/core.git/+/brillo-m7-dev/libcutils/strlcpy.c
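If neither strlcpy nor a hand-written copy is wanted, snprintf from standard C (C99 and later) gives a similar bounded, always-terminated copy. This is only a minimal sketch: the helper name copy_string and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copies src into dst, writing at most dst_size bytes including the NUL.
 * Returns 1 if the whole string fit, 0 if it was truncated or on error.
 */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full output would have had */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

In the question's code this would be called as copy_string(string, sizeof(string), argv[1]), after checking that argc is at least 2.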

In the main function in C, argv is a vector of strings, which are themselves arrays of characters. So argv is a pointer to a pointer (char **).
Your code tries to assign *argv[1], which is the first character of the first argument, to an array.
char* string = argv[1]; would do it. To copy the whole string (array of characters) use strcpy. To copy the whole vector of argument pointers use memcpy.
But usually in a C program you do not copy arguments, just use references to them.

The short answer is: because it is. In the C language only structures and unions are copied by value, with one exception:
the initialization of an array.
void foo(void)
{
char x[] = "This string literal will be copied! Test it yourself";
char z[] = "This string literal will be copied as well But because it is much loger memcpy will be used! Test it yourself";
float y[] = {1.0,2.0, 3,0,4.0,5.0,1.0,2.0, 3,0,4.0,5.0,1.0,2.0, 3,0,4.0,5.0};
long long w[] = {1,2,3,4,5,6,7,8,9,0};
foo1(x,z); // these calls only prevent the variables from being optimized away
foo2(y,w);
}
and the compiled code:
foo:
push {r4, lr}
sub sp, sp, #320
mov ip, sp
ldr lr, .L4
ldr r4, .L4+4
ldmia lr!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldmia lr!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldmia lr!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldm lr, {r0, r1}
str r0, [ip], #4
strb r1, [ip]
add r0, sp, #208
mov r2, #110
ldr r1, .L4+8
bl memcpy
mov r1, r4
add r0, sp, #56
mov r2, #72
bl memcpy
mov r2, #80
add r1, r4, #72
add r0, sp, #128
bl memcpy
add r1, sp, #208
mov r0, sp
bl foo1
add r1, sp, #128
add r0, sp, #56
bl foo2
add sp, sp, #320
pop {r4, pc}
.L4:
.word .LC2
.word .LANCHOR0
.word .LC3
.LC2:
.ascii "This string literal will be copied! Test it yoursel"
.ascii "f\000"
.LC3:
.ascii "This string literal will be copied as well But beca"
.ascii "use it is much loger memcpy will be used! Test it y"
.ascii "ourself\000"
Structures and unions are copied by value, so an assignment copies the whole structure into another.
typedef struct
{
char str[100];
}string;
string a = {.str = "This string literal will be copied before main starts"},b;
void foo3(string c)
{
string g = a;
b = a;
foo4(g);
}
and the code:
foo3:
sub sp, sp, #16
push {r4, r5, r6, lr}
mov r6, #100
sub sp, sp, #104
ldr r5, .L4
add ip, sp, #116
add r4, sp, #4
stmib ip, {r0, r1, r2, r3}
mov r2, r6
mov r1, r5
mov r0, r4
bl memcpy
mov r2, r6
mov r1, r5
ldr r0, .L4+4
bl memcpy
add r1, sp, #20
mov r2, #84
add r0, sp, #136
bl memcpy
ldm r4, {r0, r1, r2, r3}
add sp, sp, #104
pop {r4, r5, r6, lr}
add sp, sp, #16
b foo4
.L4:
.word .LANCHOR0
.word b
a:
.ascii "This string literal will be copied before main star"
.ascii "ts\000"
you can play with it yourself:
https://godbolt.org/z/lag4uL

In C, an array name is not a modifiable lvalue, so you cannot use it on the left-hand side of an assignment. To copy a character array, you can either use a for loop or the strcpy function, which is declared in the string.h header file.
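The two options that do work can be put side by side in a short sketch. The type name name_buf, the size 16 and both helper names are assumptions chosen only for this illustration.

```c
#include <string.h>

struct name_buf {
    char text[16];
};

/* Arrays cannot be assigned, so the bytes are copied explicitly. */
void copy_chars(char dst[16], const char src[16])
{
    memcpy(dst, src, 16);
}

/* A struct that contains an array can be assigned as a whole. */
struct name_buf copy_struct(struct name_buf src)
{
    struct name_buf dst;
    dst = src;    /* copies all 16 bytes of src.text */
    return dst;
}
```

Wrapping the array in a struct is the only way to get array contents copied by plain assignment. For a buffer filled from argv[1], a bounded string copy is usually the more natural choice.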

ARM Assembly Arrays

I am trying to figure out how arrays work in ARM assembly, but I am just overwhelmed. I want to initialize an array of size 20 to 0, 1, 2 and so on.
A[0] = 0
A[1] = 1
I can't even figure out how to print what I have to see if I did it correctly. This is what I have so far:
.data
.balign 4 # Memory location divisible by 4
string: .asciz "a[%d] = %d\n"
a: .skip 80 # allocates 20
.text
.global main
.extern printf
main:
push {ip, lr} # return address + dummy register
ldr r1, =a # set r1 to index point of array
mov r2, #0 # index r2 = 0
loop:
cmp r2, #20 # 20 elements?
beq end # Leave loop if 20 elements
add r3, r1, r2, LSL #2 # r3 = r1 + (r2*4)
str r2, [r3] # r3 = r2
add r2, r2, #1 # r2 = r2 + 1
b loop # branch to next loop iteration
print:
push {lr} # store return address
ldr r0, =string # format
bl printf # c printf
pop {pc} # return address
ARM confuses me enough as it is, and I don't know what I'm doing wrong. If anyone could help me better understand how this works, that would be much appreciated.

This might help others down the line who want to know how to allocate memory for an array in ARM assembly language.
Here is a simple example that adds corresponding array elements and stores the sums in a third array.
.global _start
_start:
MOV R0, #5
LDR R1,=first_array # loading the address of first_array[0]
LDR R2,=second_array # loading the address of second_array[0]
LDR R7,=final_array # loading the address of final_array[0]
MOV R3,#5 # len of array
MOV R4,#0 # to store sum
check:
cmp R3,#1 # like condition in for loop for i>1
BNE loop # if R3 is not equal to 1 jump to the loop label
B _exit # else exit
loop:
LDR R5,[R1],#4 # loading the values and storing in registers and base register gets updated automatically R1 = R1 + 4
LDR R6,[R2],#4 # similarly
add R4,R5,R6
STR R4,[R7],#4 # storing the values back to the final array
SUB R3,R3,#1 # decrment value just like i-- in for loop
B check
_exit:
LDR R7,=final_array # before exiting checking the values stored
LDR R1, [R7] # R1 = 60
LDR R2, [R7,#4] # R2 = 80
LDR R3, [R7,#8] # R3 = 100
LDR R4, [R7,#12] # R4 = 120
MOV R7, #1 # terminate syscall, 1
SWI 0 # execute syscall
.data
first_array: .word 10,20,30,40
second_array: .word 50,60,70,80
final_array: .word 0,0,0,0,0
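For readers who find the loop easier to follow in C, this is a rough equivalent of the assembly above. It is a sketch, not a translation produced by a compiler, and the function name add_arrays is an assumption made for this example.

```c
#include <stddef.h>

/* out[i] = first[i] + second[i] for every i in [0, count) */
void add_arrays(const int *first, const int *second, int *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* matches: LDR R5,[R1],#4 / LDR R6,[R2],#4 / ADD / STR R4,[R7],#4 */
        out[i] = first[i] + second[i];
    }
}
```

Called with {10,20,30,40} and {50,60,70,80}, it stores 60, 80, 100 and 120, the same values the assembly comments list for R1 to R4.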

As mentioned, your printf call has problems. You can use the toolchain itself to see what the calling convention is, and then conform to that.
#include <stdio.h>
unsigned int a,b;
void notmain ( void )
{
printf("a[%d] = %d\n",a,b);
}
giving
00001008 <notmain>:
1008: e59f2010 ldr r2, [pc, #16] ; 1020 <notmain+0x18>
100c: e59f3010 ldr r3, [pc, #16] ; 1024 <notmain+0x1c>
1010: e5921000 ldr r1, [r2]
1014: e59f000c ldr r0, [pc, #12] ; 1028 <notmain+0x20>
1018: e5932000 ldr r2, [r3]
101c: eafffff8 b 1004 <printf>
1020: 0000903c andeq r9, r0, ip, lsr r0
1024: 00009038 andeq r9, r0, r8, lsr r0
1028: 0000102c andeq r1, r0, ip, lsr #32
Disassembly of section .rodata:
0000102c <.rodata>:
102c: 64255b61 strtvs r5, [r5], #-2913 ; 0xb61
1030: 203d205d eorscs r2, sp, sp, asr r0
1034: 000a6425 andeq r6, sl, r5, lsr #8
Disassembly of section .bss:
00009038 <b>:
9038: 00000000 andeq r0, r0, r0
0000903c <a>:
903c:
The calling convention is generally: first parameter in r0, second in r1, third in r2, and so on up to r3, after which the stack is used. There are many exceptions to this, but we can see here that the compiler, which normally works fine with a printf call, wants the address of the format string in r0 and the values of a and b in r1 and r2 respectively.
Your printf call has the string in r0, but a printf call with that format string needs three parameters.
The code above used a tail-call optimization and branched to printf rather than calling it and returning. The ARM convention these days expects the stack to be aligned on 64-bit (8-byte) boundaries, so you can add a register that you don't necessarily care to preserve to the push/pop in order to keep that alignment:
push {r3,lr}
...
pop {r3,pc}
It certainly won't hurt you to do this; it may or may not hurt not to, depending on what the code downstream assumes.
Your setup and loop should function just fine, assuming that r1 (label a) is a word-aligned address, which it may or may not be if you change your string. You should either put a first and then the string, or put another alignment directive before a to ensure the array is aligned. There are instruction set features that can simplify the code, but it appears functional as is.
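A C version of what the question is trying to do can serve as a reference to compile and disassemble, in the same way as the notmain example. This is a minimal sketch; the function names fill_indices and print_array are assumptions made for this illustration.

```c
#include <stdio.h>

/* Stores the index of each element in the element itself: a[i] = i. */
void fill_indices(int *a, int count)
{
    for (int i = 0; i < count; i++) {
        a[i] = i;
    }
}

/* Prints every element; each printf call passes three arguments. */
void print_array(const int *a, int count)
{
    for (int i = 0; i < count; i++) {
        printf("a[%d] = %d\n", i, a[i]);
    }
}
```

Compiling this for ARM and looking at print_array shows the format string address going into r0, the index into r1 and the element value into r2 before each call to printf.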

Need Help understanding ARM function

I'm still learning ARM and I couldn't understand what this function is supposed to do.
Can you guys help me out explaining how it works?
.text:0006379C EXPORT _nativeD2AB
.text:0006379C _nativeD2AB
.text:0006379C var_28 = -0x28
.text:0006379C
.text:0006379C STMFD SP!, {R4-R11,LR}
.text:000637A0 SUB SP, SP, #0x3A4
.text:000637A4 STMFA SP, {R0-R3}
.text:000637A8 LDR R0, =(_GLOBAL_OFFSET_ - 0x637B8)
.text:000637AC LDR R1, =(__stack_chk - 0x134EAC)
.text:000637B0 ADD R0, PC, R0 ; _GLOBAL_OFFSET_
.text:000637B4 LDR R0, [R1,R0] ; __stack_chk
.text:000637B8 LDR R0, [R0]
.text:000637BC STR R0, [SP,#0x3C8+var_28]
.text:000637C0 MOV R0, #1
.text:000637C4 ADR R1, sub_637D0
.text:000637C8 MUL R0, R1, R0
.text:000637CC MOV PC, R0
.text:000637CC ; End of function _nativeD2AB
.
.got:00134EAC _GLOBAL_OFFSET_TABLE_ DCD 0
.
.got:00134B0C AREA .got, DATA
.got:00134B0C __stack_chk DCD __stack_chkA
.
Found the rest of the function. If I understood some of it correctly, it seems to be scrambling the data, though that may be just a wild guess:
.text:000637D0 sub_637D0
.text:000637D0 MOV R0, #1
.text:000637D4 ADR R1, sub_637E0
.text:000637D8 MUL R0, R1, R0
.text:000637DC MOV PC, R0
.text:000637DC ; End of function sub_637D0
.text:000637E0 sub_637E0
.text:000637E0
.text:000637E0 arg_14 = 0x14
.text:000637E0
.text:000637E0 STR R2, [SP,#arg_14]
.text:000637E4 MOV R0, #1
.text:000637E8 ADR R1, loc_637F4
.text:000637EC MUL R0, R1, R0
.text:000637F0 MOV PC, R0
.text:000637F0 ; End of function sub_637E0
.text:000637F4 loc_637F4
.text:000637F4 STR R2, [SP,#0x28]
.text:000637F8 STR R0, [SP,#0x18]
.text:000637FC MOV R1, #2
.text:00063800 STR R2, [SP,#0x1C]
.text:00063804 STR R0, [SP,#0x20]
.text:00063808 STR R0, [SP,#0x24]

The function has several parts:
Store registers to the stack and reserve space (strangely, they are not restored).
Load into R0 the address of _GLOBAL_OFFSET_TABLE_ (once added to PC), in order to access __stack_chk (at an offset from _GLOBAL_OFFSET_TABLE_). This is done in a very strange way.
Load the data at __stack_chk and store it on the stack.
Load into R0 the address of sub_637D0, multiplied by 1, and move it to PC. This branches to sub_637D0 instead of returning.
So in my opinion, this does not seem to do anything useful...