How to use arm-int64? - arm

Right now MOV R0, #0x1 works fine for a 32-bit integer,
but it does not work for a 64-bit one.
What should I use instead of MOV R0?
Can you guys help?

You've got this fundamentally wrong.
ARM32 cannot hold a 64-bit integer in a single register; that is inherent to a 32-bit architecture. A 64-bit value gets split across a register pair (R0/R1, for example).
ARM32:
Rn = 32-bit integer
ARM64:
Wn = 32-bit integer (the low half of Xn)
Xn = 64-bit integer
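Since a 64-bit value on ARM32 lives in two 32-bit registers, a minimal C sketch of the split may help (lo32/hi32 are illustrative names, not a real API):

```c
#include <stdint.h>

/* On a 32-bit ARM target the compiler keeps a uint64_t in a pair of
 * 32-bit registers; these helpers mirror that split in portable C. */
uint32_t lo32(uint64_t v) { return (uint32_t)(v & 0xFFFFFFFFu); }
uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
```

For example, lo32(0x0000000100000001ULL) and hi32(0x0000000100000001ULL) both yield 0x00000001: the value simply does not fit in one 32-bit Rn.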

Related

Curious gcc compiler code for x |= 128 when x is uint8

I've recently stumbled upon some interesting compiler output which I don't understand.
Take the following code:
unsigned char x;
...
x |= 127;
x |= 128;
For the first statement, the compiler generates:
or eax, 0x7f
However, for the second statement, it becomes:
or eax, 0xffffff80
It seems that for values up to 127, one-byte immediates are used, whereas from 128 upward dwords are preferred.
Does anybody have any idea why this happens?
I reproduced this with gcc 6.2 (the latest, I think).
I tried to post to the gcc mailing lists (gcc-bugs@gcc.gnu.org and gcc-help@gcc.gnu.org) but only got delivery failures.
Both instructions are 3 bytes wide as is apparent from the disassembly output:
83 c8 7f or $0x7f,%eax
83 c8 80 or $0xffffff80,%eax
The 83 /1 opcode is OR on a 32-bit register or memory operand with an 8-bit sign-extended immediate:
83 /1 ib OR r/m32,imm8 r/m32 OR imm8 (sign-extended).
Thus it does in effect change the non-visible upper part of the 32-bit register, but that doesn't matter here, and the encoding is no less efficient than any other. There is also no form of OR that avoids sign-extending the 8-bit immediate, other than the forms that operate on the 8-bit register halves/quarters. Using this encoding also makes the code work the same way with registers that are addressable as r/m32 but cannot be accessed as individual bytes (edi and esi, for example).
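The sign extension that the 83 /1 encoding performs can be reproduced in C by widening a signed 8-bit value before the OR (a minimal sketch; or_imm8 is just an illustrative name):

```c
#include <stdint.h>

/* Mimic OR r/m32, imm8: the 8-bit immediate is sign-extended to
 * 32 bits before the OR, just as 83 c8 80 turns 0x80 into 0xffffff80. */
uint32_t or_imm8(uint32_t reg, int8_t imm) {
    return reg | (uint32_t)(int32_t)imm;
}
```

or_imm8(0, (int8_t)0x7f) gives 0x0000007f while or_imm8(0, (int8_t)0x80) gives 0xffffff80, matching the two disassembled instructions.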

How do I compute the 16-bit sum of the 8-bit values of an array in assembly?

Feel like I've been asking a lot of these questions lately lol, but assembly is still pretty foreign to me.
Using an Arduino, I have to write a function in Atmel AVR Assembly for my computer science class that calculates the sum of the 8-bit values in an array and returns it as a 16-bit integer. The function is supposed to take in an array of bytes and a byte representing the length of the array as arguments, with those arguments stored in r24 and r22, respectively, when the function is called. I am allowed to use branching instructions and such.
The code is in this format:
.global sumArray
sumArray:
//magic happens
ret
I know how to make loops and increment the counter and things like that, but I am really lost as to how to compute the sum itself. Does anyone know how to write this function in Atmel AVR assembly? Any help would be much appreciated!
Why don't you ask your compiler?
#include <stdint.h>

uint16_t sumArray(uint8_t *val, uint8_t count)
{
    uint16_t sum = 0;
    for (uint8_t i = 0; i < count; i++)
        sum += val[i];
    return sum;
}
Compiling with avr-gcc -std=c99 -mmcu=avr5 -Os -S sum8-16.c generates
the following assembly:
.global sumArray
sumArray:
    mov r19, r24           ; save low byte of the start address
    movw r30, r24          ; Z = pointer to the array
    ldi r24, 0             ; sum low byte = 0
    ldi r25, 0             ; sum high byte = 0
.L2:
    mov r18, r30           ; r18 = current low address byte (ZL)
    sub r18, r19           ; r18 = number of bytes consumed so far
    cp r18, r22            ; reached count?
    brsh .L5               ; yes: done
    ld r18, Z+             ; load next byte, post-increment Z
    add r24, r18           ; add to low byte of sum
    adc r25,__zero_reg__   ; propagate carry into high byte
    rjmp .L2
.L5:
    ret
This may not be the most straightforward solution, but if you study
this code, you can understand how it works and, hopefully, come up with
your own version.
If you want something quick and dirty, add the 8-bit values into an 8-bit register. If the sum comes out less than one of the inputs, it wrapped around, so set a second 8-bit register to 1, otherwise 0. That's how you can handle the carry.
The processor should already have a carry flag that you can use to this end.
With pencil and paper, how do you add two two-digit decimal numbers when you were only taught to add single digits at a time? Take 12 + 49: you add 2 + 9 = 11, then what? (Search for the word "carry".)
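The carry trick described above can be sketched in C; sum8to16 is an illustrative name, and the code deliberately mirrors the add/adc pattern instead of using a 16-bit add directly:

```c
#include <stdint.h>

/* Sum 8-bit values into a 16-bit total using only 8-bit additions plus
 * a manually propagated carry: if the low byte wraps past 255, it ends
 * up smaller than it was before the add, so bump the high byte. */
uint16_t sum8to16(const uint8_t *val, uint8_t count) {
    uint8_t lo = 0, hi = 0;
    for (uint8_t i = 0; i < count; i++) {
        uint8_t before = lo;
        lo = (uint8_t)(lo + val[i]);
        if (lo < before)   /* wrapped: carry out of the low byte */
            hi++;
    }
    return (uint16_t)((hi << 8) | lo);
}
```

For example, summing {200, 200, 200} yields 600, which no single 8-bit register could hold.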

Using Int (32 bits) over char (8 bits) to 'help' processor

In C we often use char to represent small numbers. However, the processor fetches 32-bit values from its registers, so every time our program uses a char (8 bits), the processor has to fetch 32 bits and extract the 8 bits it needs.
Hence, does it make sense to use int in place of char when memory is not a limitation?
Will it 'help' the processor?
There's the compiler part and the CPU part.
If you tell the compiler you're using a char instead of an int, static analysis knows the variable's bounds are 0-255 instead of 0-(2^32-1), which allows it to optimize your program better.
On the CPU side, your assumption isn't always correct. Take x86 as an example: it has eax and al for 32-bit and 8-bit access to the same register. If you only need chars, using al is sufficient; there is no performance loss.
I did some simple benchmarks in response to the comments below:
al:
format PE GUI 4.0
xor ecx, ecx
dec ecx
loop_start:
inc al
add al, al
dec al
dec al
loopd short loop_start
ret
eax:
format PE GUI 4.0
xor ecx, ecx
dec ecx
loop_start:
inc eax
add eax, eax
dec eax
dec eax
loopd short loop_start
ret
times:
$ time ./test_al.exe
./test_al.exe 0.01s user 0.00s system 0% cpu 7.102 total
$ time ./test_eax.exe
./test_eax.exe 0.01s user 0.01s system 0% cpu 7.120 total
So in this case al is slightly faster, though sometimes eax came out faster; the difference is really negligible. CPUs aren't so simple, though: there can be code-alignment issues, caches, and other effects, so it's best to benchmark your own code to see if there's any improvement. IMO, unless your code is very tight, it's best to trust the compiler to optimize things.
I'd stick with int if I were you, as that is probably the most natural integral type for your platform. Internally you can expect shorter types to be converted to int, which can actually degrade performance.
You should never use plain char and expect it to behave consistently across platforms. Although the C standard defines sizeof(char) to be 1, char itself may be signed or unsigned; the choice is down to the compiler.
If you believe you can squeeze some performance out of an 8-bit type, be explicit and use signed char or unsigned char.
From ARM system developers guide
"most ARM data processing operations are 32-bit only. For this reason, you should use
a 32-bit datatype, int or long, for local variables wherever possible. Avoid using char and
short as local variable types, even if you are manipulating an 8- or 16-bit value"
An example from the book proves the point; note the wrap-around handling for char, which is absent for unsigned int.
int checksum_v1(int *data)
{
    char i;
    int sum = 0;
    for (i = 0; i < 64; i++)
    {
        sum += data[i];
    }
    return sum;
}
ARM7 assembly when using i as a char
checksum_v1
MOV r2,r0 ; r2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0 ; i = 0
checksum_v1_loop
LDR r3,[r2,r1,LSL #2] ; r3 = data[i]
ADD r1,r1,#1 ; r1 = i+1
AND r1,r1,#0xff ; i = (char)r1
CMP r1,#0x40 ; compare i, 64
ADD r0,r3,r0 ; sum += r3
BCC checksum_v1_loop ; if (i<64) loop
MOV pc,r14 ; return sum
ARM7 assembly when i is an unsigned int.
checksum_v2
MOV r2,r0 ; r2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0 ; i = 0
checksum_v2_loop
LDR r3,[r2,r1,LSL #2] ; r3 = data[i]
ADD r1,r1,#1 ; r1++
CMP r1,#0x40 ; compare i, 64
ADD r0,r3,r0 ; sum += r3
BCC checksum_v2_loop ; if (i<64) goto loop
MOV pc,r14 ; return sum
If your program is simple enough, the optimizer can do the right thing without you having to worry about it. In this case, plain int would be the simplest (and forward-proof) solution.
However, if you really want to combine a specific bit width with speed, you can use the fastest minimum-width integer types from C99 (section 7.18.1.3 of the standard; requires a C99-compliant compiler).
For example:
int_fast8_t x;
uint_fast8_t y;
are the signed and unsigned types that are guaranteed to be able to store at least 8 bits of data and use the usually faster underlying type. Of course, it all depends on what you are doing with the data afterwards.
For example, on all systems I have tested (see: standard type sizes in C++), the fast types were 8 bits long.
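A small sketch of these types (clamp_add is a hypothetical helper, not from the question; the point is that uint_fast8_t may be wider than 8 bits, but arithmetic on 8-bit-range values behaves the same):

```c
#include <stdint.h>

/* uint_fast8_t holds at least 8 bits but may be wider if that is
 * faster on the target; a saturating add keeps results in 0..255. */
uint_fast8_t clamp_add(uint_fast8_t a, uint_fast8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint_fast8_t)(sum > 255 ? 255 : sum);
}
```

clamp_add(200, 100) saturates to 255 regardless of how wide uint_fast8_t actually is on the platform.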

Assembly: how to translate the IMUL opcode (with only one operand) to C code

Say I have
EDX = 0xA28
EAX = 0x0A280105
and I run this ASM code:
IMUL EDX
which, to my understanding, only uses EAX when one operand is specified.
So in C it should be like
EAX *= EDX;
correct?
After looking in the debugger, I found that EDX gets altered too.
0x0A280105 * 0xA28 = 0x67264A5AC8
In the debugger:
EAX = 264A5AC8
EDX = 00000067
If you take the answer 0x67264A5AC8 and split off the top hex pair, 0x67 264A5AC8, you can clearly see why EDX and EAX hold what they do.
OK, so an overflow happens: such a huge number cannot be stored in 32 bits, so the extra bits spill into EDX.
But my question is: how would I do this in C to get the same results? I'm guessing something like
EAX *= EDX;
EDX = 0xFFFFFFFF - EAX; // not good at math manipulation like this.
The IMUL instruction actually produces a result twice the size of the operand (unless you use one of the newer forms that specify a destination). So:
imul 8bit -> result = ax, 16 bits
imul 16bit -> result = dx:ax, 32 bits
imul 32bit -> result = edx:eax, 64 bits
How to do this in C is compiler-dependent, but with a 64-bit type it works like this:
    int64_t result = (int64_t)(int32_t)eax * (int32_t)edx;
    eax = result & 0xffffffff;
    edx = (uint64_t)result >> 32;
This uses int64_t from <stdint.h> rather than assuming long is 64 bits, since long is only 32 bits on many platforms. If no 64-bit type is available at all, calculating the result becomes much harder: you need to do long multiplication by hand.
You could always inline the imul instruction.
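The numbers from the question make a handy self-check; a minimal sketch using int64_t (imul32 is an illustrative name modeling the one-operand form):

```c
#include <stdint.h>

/* One-operand IMUL: EDX:EAX = EAX * src, a signed 64-bit product. */
void imul32(int32_t src, uint32_t *eax, uint32_t *edx) {
    int64_t result = (int64_t)(int32_t)*eax * src;
    *eax = (uint32_t)(result & 0xFFFFFFFF);
    *edx = (uint32_t)((uint64_t)result >> 32);
}
```

Starting from EAX = 0x0A280105 and multiplying by 0xA28 leaves EAX = 0x264A5AC8 and EDX = 0x00000067, exactly as seen in the debugger.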

Assembly Language: difference between ja and jg?

I am having trouble understanding the difference between ja and jg for assembly language. I have a section of code:
cmp dh, dl
j-- hit
and am asked which conditional jump to hit (that replaces j-- hit) will be taken with the hex value of DX = 0680.
This would make dl = 06 and dh = 80, so when comparing, 80 > 06. I know that jg fits this as we can directly compare results, but how should I approach solving if ja fits (or in this case, does not fit) this code?
If dx is 0x0680, then dh is 0x06 and dl is 0x80.
0x80 is interpreted as 128 in unsigned mode, and -128 in signed mode.
Thus, you have to use jg, since 6 > -128, but 6 < 128. jg does signed comparison; ja does unsigned comparison.
The difference between ja and jg is the fact that comparison is unsigned for ja and signed for jg (treating the registers as signed vs unsigned integers).
If the numbers are guaranteed to be positive (i.e. the sign bit is 0) then you should be fine. Otherwise you have to be careful.
You really can't tell from the comparison instruction alone whether ja is applicable. You have to look at the context and decide whether sign will be an issue.
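The two jumps can be modeled in C with the question's value DX = 0x0680 (jg_taken/ja_taken are illustrative names):

```c
#include <stdint.h>

/* After cmp dh, dl: jg tests a signed compare, ja an unsigned one. */
int jg_taken(uint8_t dh, uint8_t dl) { return (int8_t)dh > (int8_t)dl; }
int ja_taken(uint8_t dh, uint8_t dl) { return dh > dl; }
```

With dh = 0x06 and dl = 0x80, jg_taken gives 1 (6 > -128) while ja_taken gives 0 (6 < 128).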
