Beginner to ARM here. I have a function called bubblesort(in, n), where in is an array of integers and n is the length of the array.
I am trying to do this if condition in ARM:
if (a[j+1] < a[j]) {
temp = a[j];
a[j] = a[j+1];
a[j+1] = temp;
}
Here is the ARM code:
23 swap:
24 add r8,r5,#1 // r8 is j + 1
25 mul r8,r8,#4 // r8 = r8 * 4
26 add r9,r5 // r9 is j
27 mul r9,r9,#4 // r9 = r9 * 4
28 cmp [r0+r8],[r0+r9]
29 blt switch
I get these errors. I think I kind of understand the mul one: it is expecting a register. What do I do if I am running out of registers? I only have these registers on the Pi, and I have used the others for other functions.
r0-r3 temp reg (I/O)
r4-r9 storage reg (Global)
r14 Link register
r15 Program counter
bubble.s:25: Error: ARM register expected -- `mul r8,r8,#4'
bubble.s:27: Error: ARM register expected -- `mul r9,r9,#4'
bubble.s:28: Error: ARM register expected -- `cmp [r0+r8],[r0+r9]'
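For reference, ARM is a load/store architecture: cmp only compares registers (or a register and an immediate), so both array elements have to be loaded into registers before they can be compared, and the multiplication by 4 can be written as a left shift by 2. Below is a minimal C sketch of that sequence. The helper name swap_if_less and the local variable names are assumptions made for illustration; they are not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares a[j+1] with a[j] and swaps them if needed.
   Each local variable stands in for a register in the assembly version. */
int swap_if_less(int32_t *a, size_t j)
{
    size_t off_lo = j << 2;          /* byte offset of a[j]:   j * 4       */
    size_t off_hi = (j + 1) << 2;    /* byte offset of a[j+1]: (j + 1) * 4 */
    int32_t *p_lo = (int32_t *)((char *)a + off_lo);
    int32_t *p_hi = (int32_t *)((char *)a + off_hi);

    int32_t lo = *p_lo;              /* load a[j] into a "register"   */
    int32_t hi = *p_hi;              /* load a[j+1] into a "register" */

    if (hi < lo) {                   /* compare the two loaded values */
        *p_lo = hi;                  /* store them back swapped       */
        *p_hi = lo;
        return 1;                    /* a swap happened               */
    }
    return 0;                        /* already in order              */
}
```

Only two temporaries are live at the comparison, so the scaled offsets do not need to occupy registers of their own for long.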
I have to write a program in ARM assembly that finds the smallest element in an array. Normally this is a pretty easy thing to do in any programming language, but I just can't get my head around what I'm doing wrong in ARM assembly. I'm a beginner in ARM, but I know my way around C. So I wrote the algorithm for finding the smallest number in an array in C like this:
int minarray = arr[0];
for (int i =0; i < len; i++){
if (arr[i] < minarray){
minarray = arr[i];
}
}
It's easy and nothing special, really.
Now I tried to carry the algorithm over to ARM almost unchanged. Two things were already set up for us from the beginning: the address of the first element is stored in register r0, and the length of the array is stored in register r1. In the end, the smallest element must be stored back in register r0. Here is what I did:
This is almost the same algorithm as the one in C. First I load the first element into a new register, r4. Now the first element is the smallest. Then I load the first element again, this time into r8. I compare those two; if r8 <= r4, I copy the content of r8 to r4. After that (because I'm working with 32-bit numbers) I add 4 bytes to r0 to move on to the next element of the array. Then I subtract 1 from the array length, so that the loop runs through the array until the length drops below 0 and the program stops.
The feedback I'm getting from the testing function that was given to us to check our program says that it works partly. It says that it works for short arrays and arrays of length 0, but not for long arrays. I'm honestly lost. I think I'm making a really dumb mistake, but I just cannot find it. I've been stuck on this easy problem for 3 days now, and everything I have tried either did not work or, as I said, only worked "partly". I would really appreciate it if someone could help me out.
This is the feedback that i get:
✗ min works with other numbers
✗ min works with a long array
✓ min works with a short array
✓ min tolerates size = 0
(x is for "it does not work", ✓ is for "it works")
So you see what I'm saying? I just do not understand how to make it work with a longer array.
I'm not very good at ARM assembly, but to my understanding R4 is expected to keep the value of the minimum. R8 is used to keep the most recently fetched value from the input array.
The minimum is updated with this instruction:
MOVLE r8, r4
But it actually updates R8, not R4.
Try:
MOVLE r4, r8
EDIT
Another issue is the use of an incorrect branch instruction:
SUBS r1, r1, #1
BPL loop1
works like:
r1 = r1 - 1
if (r1 >= 0) goto loop1;
For R1 equal to 1, the loop is executed twice.
r1 = 1
... do stuff
r1 = r1 - 1 // r1 is 0 now
if (r1 >= 0) goto loop1; // 0>=0 TRUE!
... do stuff, overflow the input by indexing at `[r0 + 4]`
r1 = r1 - 1 // r1 is -1
if (r1 >= 0) goto loop1; // -1 >= 0 FALSE
// exit function
To fix it, branch back only while the counter is non-zero:
BNE loop1
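The difference between the two branch conditions can be checked with a small C model of the countdown. This is only a sketch: the function names are hypothetical, and the BNE variant assumes a starting count of at least 1.

```c
#include <stdint.h>

/* Models "SUBS r1, r1, #1 ; BPL loop1": repeat while the result is >= 0.
   Assumes n >= 0. */
int iterations_with_bpl(int32_t n)
{
    int count = 0;
    int32_t r1 = n;
    do {
        count++;          /* the loop body runs once per pass       */
        r1 = r1 - 1;      /* SUBS r1, r1, #1                        */
    } while (r1 >= 0);    /* BPL: branch if the result is not negative */
    return count;
}

/* Models "SUBS r1, r1, #1 ; BNE loop1": repeat while the result is non-zero.
   Assumes n >= 1 (an assumption of this sketch). */
int iterations_with_bne(int32_t n)
{
    int count = 0;
    int32_t r1 = n;
    do {
        count++;          /* the loop body runs once per pass   */
        r1 = r1 - 1;      /* SUBS r1, r1, #1                    */
    } while (r1 != 0);    /* BNE: branch if the result is not zero */
    return count;
}
```

For a starting count of 1, the BPL model runs the body twice and the BNE model runs it once, which matches the trace above.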
When coding in C, use the correct types.
You also do not have to start iterating at index 0; starting at index 1 is enough.
int foo(const int *arr, size_t len)
{
int minarray = arr[0];
for (size_t i = 1; i < len; i++)
{
if (arr[i] < minarray)
{
minarray = arr[i];
}
}
return minarray;
}
And it generates this code:
foo:
mov r3, r0
subs r1, r1, #1
ldr r0, [r3], #4
beq .L1
.L3:
ldr r2, [r3], #4
cmp r0, r2
it ge
movge r0, r2
subs r1, r1, #1
bne .L3
.L1:
bx lr
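To make the generated code easier to follow, here is a hand-written C rendering of the same loop, with each statement annotated with the instruction it corresponds to. The function name min_by_pointer is hypothetical, and the sketch assumes len >= 1.

```c
#include <stddef.h>

/* Hypothetical C rendering of the generated loop; assumes len >= 1. */
int min_by_pointer(const int *arr, size_t len)
{
    const int *p = arr;          /* mov  r3, r0                     */
    int min = *p++;              /* ldr  r0, [r3], #4 (post-increment) */
    len--;                       /* subs r1, r1, #1                 */
    while (len != 0) {           /* beq .L1 on entry, bne .L3 after */
        int v = *p++;            /* ldr  r2, [r3], #4               */
        if (min >= v)            /* cmp  r0, r2 ; it ge             */
            min = v;             /* movge r0, r2                    */
        len--;                   /* subs r1, r1, #1                 */
    }
    return min;                  /* bx lr (result in r0)            */
}
```

Note how the pointer is advanced by the load itself, so no separate add is needed, and the counter reaching zero ends the loop.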
devs! Could you help me? The project's goal is to translate byte code from a fictional architecture, generating an array of real machine code and make it run with jit, but I get a segmentation fault when I try to save a certain part of the output on a file. Part of the code responsible for this:
uint32_t length = sysconf(4096);
void * memory = mmap(0 , length , PROT_NONE , MAP_PRIVATE | MAP_ANONYMOUS , -1 , 0);
//{machine array receives the translated machine code here...}
mprotect ( memory , length , PROT_WRITE ) ;
// copying the machine code array to the memory
memcpy ( memory , ( void *) ( machine ) , sizeof ( machine ) ) ;
mprotect ( memory , length , PROT_EXEC ) ;
const uint32_t (* jit ) (int32_t*, uint8_t*) = ( uint32_t (*) (int32_t*, uint8_t*) ) ( memory );
// running the machine code to produce the outputs
// &R is the array of registers to store the output and &mem contains the original byte
// code to receive inputs from a instruction that changes the original code
(*jit)((int *)&R, (unsigned char *)&mem);
munmap(memory,length);
// printf/fprintf that causes the segmentation fault if we try to print n and ic[n]
// n = 0; - does not work to print the correct starting value for n
// fflush(stdout); - works to print the correct starting value for n
for(n = 0; n < 16; n++) {
// fprintf(output,"%02x:\n",n);
// fprintf(output,":%d\n",ic[n]);
fprintf(output,"%02x:%d\n",n,ic[n]);
// printf("%02x:%d\n",n,ic[n]);
// fflush(stdout);
}
for (k = 0; k < 16; k++) {
fprintf(output,"R[%d]=0x%08x\n",k,R[k]);
}
The original byte code translates to the instructions in this pseudo-assembly code. In this code, the R's represent an array of registers that is passed to the real assembly code: R0 is %rdi, R1 = %rdi+0x4, ..., R15 = %rdi+0x3C.
Some of those pseudo-instructions translate to one or more actual assembly instructions, and [Rn] represents the memory location which contains the byte code for the original architecture. So when it accesses [Rn], it uses the current value of the register as the position from which to get the next 4 bytes (an instruction on the fantasy architecture is 4 bytes long).
mov R0, 0x006C
mov R1, 0x0001
mov R2, [R0]
cmp R15, R2
je 0x0030
mov R14, R2
jg 0x0000
jl 0x0000
add R13, R14
and R12, R13
or R11, R12
xor R10, R11
shl R10, 0x00
shr R10, 0x00
sub R2, R1
mov [R0], R2
jmp 0xFFC8
mov R1, 0x0004
add R0, R1
mov R2, [R0]
add R0, R1
mov R4, [R0]
add R0, R1
mov R8, [R0]
add R0, R1
mov R12, [R0]
jmp 0x0004
mov R3, R12
or R6, R13
add R10, R8
sub R15, R14
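The register layout and the [Rn] access described above can be modelled in a few lines of C. This is a sketch under stated assumptions: the names reg_offset and load_word are hypothetical, and the fetched word is read in host byte order, which the post does not specify.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset of register Rn inside the register array passed in %rdi. */
uint32_t reg_offset(unsigned n)
{
    return (uint32_t)(n * sizeof(uint32_t));   /* R0 -> 0x00, R15 -> 0x3C */
}

/* Hypothetical model of "mov Rd, [Rn]": read 4 bytes of the byte code
   starting at the position currently stored in Rn (host byte order assumed). */
uint32_t load_word(const uint8_t *mem, const uint32_t *R, unsigned n)
{
    uint32_t word;
    memcpy(&word, mem + R[n], sizeof word);    /* avoids unaligned access */
    return word;
}
```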
For the original architecture's instructions (00 to 0F) and 16 registers (R[0] to R[15]), the output should follow this model:
original instruction opcode: number of times executed
array of registers: value stored. Something like this:
00:2
01:1
...
0e:1
0f:1
R[0]=0x0000006c
R[1]=0x00000001
...
R[13]=0x03885533
R[14]=0x03885533
R[15]=0x00000000
The problem is that I keep getting a segmentation fault when I try to save the opcode: number of executions. If I try to print only the "opcode:" and register:value pairs, there's no segmentation fault, but instead of printing the first opcode value as "0:", it prints "6C:" which is the R[0] and the r12 (asm register) according to the gdb:
I have tried inserting push rbp, mov rbp, rsp before the assembly code and pop rbp, ret after it, but nothing works. Any ideas that could help? Is there any more info I could provide?
Thanks for the help and have a good day.
I just started learning assembly for a contest and I am using intel-syntax inline assembly.
Recently I've learned how to access global C variables in assembly using the <type> ptr[rip + <variable name>] syntax (for example, in the mov instruction).
I tried to do the same thing with arrays and by adding a + <n * size of one element> after the variable name, I can access the nth element of the C array.
But I decided to write a program that iterates over an array and calculate the sum.
Here is the code:
#include <stdio.h>
unsigned long long array[10];
int main()
{
array[0] = 1;
array[1] = 1;
array[2] = 1;
array[3] = 1;
array[4] = 1;
array[5] = 1;
array[6] = 1;
array[7] = 1;
array[8] = 1;
array[9] = 1;
unsigned long long element_len = sizeof(*array);
unsigned long long len = sizeof(array);
unsigned long long sum = 0;
asm(R"(
.intel_syntax noprefix;
mov rcx, 0;
mov r8, 0;
loop:
add rcx, qword ptr[rip + array + r8];
add r8, rax;
cmp r8, rbx;
jb loop;
.att_syntax noprefix;
)"
: "=c" (sum) // outputs
: "a" (element_len), "b" (len) // inputs
: "r8" // clobbers
);
printf("%llu\n", sum);
return 0;
}
But because of that + r8; the compiler gives me this error (works fine when there is some constant value instead of r8)
$ gcc sum.c
sum.c: Assembler messages:
sum.c:61: Error: `qword ptr[rip+array+r8]' is not a valid base/index expression
I am using Kubuntu (technically Ubuntu) 20.04 on a 64-bit Intel processor and compiling the program using gcc 9.3.0.
EDIT:
Fixed by @Jester's comment.
asm(R"(
.intel_syntax noprefix;
mov rcx, 0;
mov r8, 0;
lea rdx, [rip+array];
loop:
add rcx, [rdx+r8];
add r8, rax;
cmp r8, rbx;
jb loop;
.att_syntax noprefix;
)"
: "=c" (sum) // outputs
: "a" (element_len), "b" (len) // inputs
: "r8" , "rdx"// clobbers
);
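The fixed loop keeps the array's base address in one register and a byte index in another. A rough C model of the same arithmetic is shown below; the function name sum_by_byte_offset is hypothetical, and the sketch assumes element_len equals sizeof(unsigned long long) and len is greater than zero, since the assembly loop always runs at least once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the fixed loop: base address plus a byte index
   that advances by the element size. Assumes len > 0. */
unsigned long long sum_by_byte_offset(const void *base,
                                      unsigned long long element_len,
                                      unsigned long long len)
{
    const unsigned char *rdx = base;             /* lea rdx, [rip+array] */
    unsigned long long rcx = 0;                  /* mov rcx, 0           */
    unsigned long long r8 = 0;                   /* mov r8, 0            */
    do {
        unsigned long long value;
        memcpy(&value, rdx + r8, sizeof value);  /* load from [rdx+r8]   */
        rcx += value;                            /* add rcx, [rdx+r8]    */
        r8 += element_len;                       /* add r8, rax          */
    } while (r8 < len);                          /* cmp r8, rbx ; jb loop */
    return rcx;
}
```

Called as sum_by_byte_offset(array, sizeof(*array), sizeof(array)), it walks the array in the same byte steps as the assembly version.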
How exactly do I convert this C program into assembly code? I am having a hard time understanding this process or how to even start it. I am new to this. Any help would be appreciated!
while(a!=b){
if(a > b){
a = a - b;
}
else{
b = b - a;
}
}
return a;
}
Side Note: Assume two positive integers a and b are already given in register R0 and R1.
Can you leave comments explaining how you did it?
If you are using gcc, you can get the assembly with gcc -S -o a.s a.c, assuming your source file is a.c. If you are using Visual Studio, you can get it while debugging by opening the "Disassembly" window. Here is the output from Visual Studio (I named the function "common", which is why "common" appears in the listing):
while(a!=b){
003613DE mov eax,dword ptr [a]
003613E1 cmp eax,dword ptr [b]
003613E4 je common+44h (0361404h)
if(a > b){
003613E6 mov eax,dword ptr [a]
003613E9 cmp eax,dword ptr [b]
003613EC jle common+39h (03613F9h)
a = a - b;
003613EE mov eax,dword ptr [a]
003613F1 sub eax,dword ptr [b]
003613F4 mov dword ptr [a],eax
}
else{
003613F7 jmp common+42h (0361402h)
b = b - a;
003613F9 mov eax,dword ptr [b]
003613FC sub eax,dword ptr [a]
003613FF mov dword ptr [b],eax
}
}
00361402 jmp common+1Eh (03613DEh)
return a;
00361404 mov eax,dword ptr [a]
}
Here variable a is saved in memory initially and so is b (dword ptr [b]).
The professor that taught me system programming used what he called 'atomic-C' as a stepping stone between C and assembly. The rules for atomic-C are (to the best of my recollection):
only simple expressions are allowed, i.e. a = b + c; is allowed, but a = b + c + d; is not, because there are two operators there.
only simple boolean expressions are allowed in an if statement, i.e. if (a < b) is allowed, but if ((a < b) && (c < d)) is not.
only if statements, no else blocks.
no for, while, or do-while loops are allowed, only gotos and labels.
So, the above program would translate into;
label1:
if (a == b)
goto label2;
if (a < b)
goto label4;
a = a - b;
goto label3;
label4:
b = b - a;
label3:
goto label1;
label2:
return a;
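Wrapped in a function, the goto form above compiles as ordinary C, which makes it easy to check against the original while loop before translating it further. The function name gcd_atomic and the unsigned parameter type are assumptions of this sketch; both inputs must be positive.

```c
/* The atomic-C listing wrapped in a function so that it compiles.
   Hypothetical name; requires a > 0 and b > 0. */
unsigned gcd_atomic(unsigned a, unsigned b)
{
label1:
    if (a == b)
        goto label2;      /* loop exit: while (a != b) is false */
    if (a < b)
        goto label4;      /* take the "else" part               */
    a = a - b;            /* "then" part: a > b                 */
    goto label3;
label4:
    b = b - a;            /* "else" part: a < b                 */
label3:
    goto label1;          /* back to the loop test              */
label2:
    return a;
}
```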
I hope I got that correct... it has been almost twenty years since I last had to write atomic-C. Now, assuming the above is correct, let's start converting some of the atomic-C statements into MIPS (assuming that is what you are using) assembly. From the link provided by Elliott Frisch, we can almost immediately translate the subtraction steps:
a = a - b becomes R0 = R0 - R1 which is: SUBU R0, R0, R1
b = b - a becomes R1 = R1 - R0 which is: SUBU R1, R1, R0
I used unsigned subtraction due to both a and b being positive integers.
The comparisons can be done thusly:
if(a == b) goto label2 becomes if(R0 == R1) goto label2 which is: beq R0, R1, L2?
The problem here is that the third parameter of the beq op-code is the displacement that the PC moves. We will not know that value till we are done doing the hand assembly here.
The inequality is more work. If we leave out the pseudo-instructions, we first need to use the set-on-less-than op-code, which puts a one in the destination register if the first register is less than the second. Once we have done that, we can use the branch on equal as described above.
if(a < b) becomes slt R2, R0, R1
goto label4 beq R2, 1, L4?
Jumps are simple, they are just j and then the label to jump to. So,
goto label1 becomes j label1
The last thing we have to handle is the return. The return is done by moving the value we want to a special register V0 and then jumping to the next instruction after the call to this function. The issue is MIPS doesn't have a register-to-register move command (or if it does I've forgotten it), so we move from a register to RAM and then back again. Finally, we use the special register R31, which holds the return address.
return a becomes var = a which is SW R0, var
ret = var which is LW var, V0
jump RA which is JR R31
With this information, the program becomes. And we can also adjust the jumps that we didn't know before:
L1:
0x0100 BEQ R0, R1, 8
0x0104 SLT R2, R0, R1 ; temp = (a < b) temp = 1 if true, 0 otherwise
0x0108 LUI R3, 0x01 ; load immediate 1 into register R3
0x010C BEQ R2, 1, 2 ; goto label4
0x0110 SUBU R0, R0, R1 ; a = a - b
0x0114 J L3 ; goto label3
L4:
0x0118 SUBU R1, R1, R0 ; b = b - a;
L3:
0x011C J L1 ; goto lable1
L2:
0x0120 SW R0, ret ; move return value from register to a RAM location
0x0123 LW ret, V0 ; move return value from RAM to the return register.
0x0124 JR R31 ; return to caller
It has been almost twenty years since I've had to do stuff like this (nowadays, if I need assembly I just do what others have suggested and let the compiler do all the heavy lifting). I am sure that I've made a few errors along the way, and would be happy for any corrections or suggestions. I only went into this long-winded discussion because I interpreted the OP question as doing a hand translation -- something someone might do as they were learning assembly.
cheers.
I've translated that code to 16-bit NASM assembly:
gcd_loop:
cmp ax, bx
je .end; if A is not equal to B, then continue executing. Else, exit the loop
jl .less_than; if A is less than B, skip to the ELSE part
sub ax, bx; if A is greater than B, THEN subtract B from A...
jmp gcd_loop; ... and loop back to the beginning!
.less_than:
sub bx, ax; ... ELSE, subtract A from B...
jmp gcd_loop; ... and loop back to the beginning!
.end:
push ax; return A
I used ax in place of r0 and bx in place of r1
ORG 000H // origin
MOV DPTR,#LUT // moves starting address of LUT to DPTR
MOV P1,#00000000B // sets P1 as output port
MOV P0,#00000000B // sets P0 as output port
MAIN: MOV R6,#230D // loads register R6 with 230D
SETB P3.5 // sets P3.5 as input port
MOV TMOD,#01100001B // Sets Timer1 as Mode2 counter & Timer0 as Mode1 timer
MOV TL1,#00000000B // loads TL1 with initial value
MOV TH1,#00000000B // loads TH1 with initial value
SETB TR1 // starts timer(counter) 1
BACK: MOV TH0,#00000000B // loads initial value to TH0
MOV TL0,#00000000B // loads initial value to TL0
SETB TR0 // starts timer 0
HERE: JNB TF0,HERE // checks for Timer 0 roll over
CLR TR0 // stops Timer0
CLR TF0 // clears Timer Flag 0
DJNZ R6,BACK
CLR TR1 // stops Timer(counter)1
CLR TF0 // clears Timer Flag 0
CLR TF1 // clears Timer Flag 1
ACALL DLOOP // Calls subroutine DLOOP for displaying the count
SJMP MAIN // jumps back to the main loop
DLOOP: MOV R5,#252D
BACK1: MOV A,TL1 // loads the current count to the accumulator
MOV B,#4D // loads register B with 4D
MUL AB // Multiplies the TL1 count with 4
MOV B,#100D // loads register B with 100D
DIV AB // isolates first digit of the count
SETB P1.0 // display driver transistor Q1 ON
ACALL DISPLAY // converts 1st digit to 7seg pattern
MOV P0,A // puts the pattern to port 0
ACALL DELAY
ACALL DELAY
MOV A,B
MOV B,#10D
DIV AB // isolates the second digit of the count
CLR P1.0 // display driver transistor Q1 OFF
SETB P1.1 // display driver transistor Q2 ON
ACALL DISPLAY // converts the 2nd digit to 7seg pattern
MOV P0,A
ACALL DELAY
ACALL DELAY
MOV A,B // moves the last digit of the count to accumulator
CLR P1.1 // display driver transistor Q2 OFF
SETB P1.2 // display driver transistor Q3 ON
ACALL DISPLAY // converts 3rd digit to 7seg pattern
MOV P0,A // puts the pattern to port 0
ACALL DELAY // calls 1ms delay
ACALL DELAY
CLR P1.2
DJNZ R5,BACK1 // repeats the subroutine DLOOP 100 times
MOV P0,#11111111B
RET
DELAY: MOV R7,#250D // 1ms delay
DEL1: DJNZ R7,DEL1
RET
DISPLAY: MOVC A,@A+DPTR // gets 7seg digit drive pattern for current value in A
CPL A
RET
LUT: DB 3FH // LUT starts here
DB 06H
DB 5BH
DB 4FH
DB 66H
DB 6DH
DB 7DH
DB 07H
DB 7FH
DB 6FH
END
Although this is the compiler's task, if you want to get your hands dirty, look at godbolt.
This great Compiler Explorer tool lets you convert your C/C++ code into assembly line by line.
If you are a beginner and want to know "How does a C program get converted into assembly?", I have written a detailed post on it here.
http://ctoassembly.com
Try executing your code here. Just copy it inside the main function, define a and b variables before your while loop and you are good to go.
You can see how the code is compiled to assembly with a fair amount of explanation, and then you can execute the assembly code inside a hypothetical CPU.
This is the code:
section .data
v dw 4, 6, 8, 12
len equ 4
section .text
global main
main:
mov eax, 0 ;this is i
mov ebx, 0 ;this is j
cycle:
cmp eax, 2 ;i < len/2
jge exit
mov ebx, 0
jmp inner_cycle
continue:
inc eax
jmp cycle
inner_cycle:
cmp ebx, 2
jge continue
mov di, [v + eax * 2 * 2 + ebx * 2]
inc ebx
jmp inner_cycle
exit:
push dword 0
mov eax, 0
sub esp, 4
int 0x80
I'm using an array and scanning it as a matrix, this is the C translation of the above code
int m[4] = {1,2,3,4};
for(i = 0; i < 2; i++){
for(j = 0; j < 2; j++){
printf("%d\n", m[i*2 + j]);
}
}
When I try to compile the assembly code I get this error:
DoubleForMatrix.asm:20: error: beroset-p-592-invalid effective address
which refers to this line
mov di, [v + eax * 2 * 2 + ebx * 2]
can someone explain me what is wrong with this line? I think that it's because of the register dimensions, I tried with
mov edi, [v + eax * 2 * 2 + ebx * 2]
but I've got the same error.
This is assembly for Mac OS X; to make it work on another OS you have to change the exit syscall.
You can't use arbitrary expressions in assembler. Only a few addressing modes are allowed.
Basically, the most complex form is register/imm+register*scale, with scale 1, 2, 4 or 8.
Of course, constants (like 2*2) will probably be folded to 4, so that counts as a single scale of 4 (not as two multiplications).
Your example tries to do two multiplies at once.
Solution: insert an extra LEA instruction to calculate v+ebx*2 and use the result in the mov.
lea regx , [v+ebx*2]
mov edi, [eax*2*2+regx]
where regx is a free register.
The SIB (Scale-Index-Base) addressing mode takes only one scale factor (1, 2, 4 or 8), applied to exactly one register.
The proposed solution is to premultiply eax by 4 (the comparison also has to be modified accordingly). Then inc eax can be replaced with add eax,4 and the illegal instruction with mov di,[v+eax+ebx*2].
A higher level optimization would be just to for (i=0;i<4;i++) printf("%d\n",m[i]);
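The address arithmetic behind these answers can be checked in C: for 16-bit elements, the byte address v + i*4 + j*2 selects the same element as m[i*2 + j]. The sketch below uses a hypothetical helper name, element_at, and assumes a 2x2 matrix of 16-bit values as in the question's data section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads element (i, j) of a 2-column matrix of 16-bit
   values using explicit byte offsets, as the assembly has to. */
int16_t element_at(const int16_t *v, size_t i, size_t j)
{
    size_t row_offset = i * 4;   /* i * 2 columns * 2 bytes: one scale of 4 */
    size_t col_offset = j * 2;   /* j * 2 bytes: one scale of 2             */
    int16_t value;
    memcpy(&value,
           (const unsigned char *)v + row_offset + col_offset,
           sizeof value);
    return value;
}
```

Because each index carries its own scale factor, a single x86 addressing mode can only absorb one of them; the other has to be precomputed, either with lea or by stepping the row index in units of 4.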