I couldn't find any tutorial on how to set the flags for the Carry on ARM to 1 or to 0. Can anybody help me out with that?
As in 'put ARM CPU in different modes' and Linux's irqflags.h, setting the mode, IRQ and carry flags can all be done in the same way.
Generally the macros look like this,
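.macro set_cflag, temp_reg
mrs \temp_reg, cpsr                 @ read the old value
orr \temp_reg, \temp_reg, #(1<<29)  @ set C (bit 29)
msr cpsr_f, \temp_reg               @ write back the flags field only
.endm
.macro clear_cflag, temp_reg
mrs \temp_reg, cpsr                 @ read the old value
bic \temp_reg, \temp_reg, #(1<<29)  @ clear C (bit 29)
msr cpsr_f, \temp_reg               @ write back the flags field only
.endm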
It is three steps,
Read the old value.
Update the flag in a working register.
Write the value back.
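The three steps can also be modelled in plain C on a copy of the status word. This is only a sketch: set_carry and clear_carry are hypothetical helper names, and on real hardware the read and the write-back are the mrs and msr instructions rather than function arguments and return values.

```c
#include <stdint.h>

#define CPSR_C_BIT (1u << 29)   /* position of the carry flag in the CPSR */

/* Return a copy of the status word with the carry flag set. */
uint32_t set_carry(uint32_t cpsr)
{
    return cpsr | CPSR_C_BIT;
}

/* Return a copy of the status word with the carry flag cleared. */
uint32_t clear_carry(uint32_t cpsr)
{
    return cpsr & ~CPSR_C_BIT;
}
```

All other bits of the word pass through unchanged, which is why the old value has to be read first.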
Some additional details concern 'atomic' behaviour, i.e. you might need to disable interrupts and memory faults, etc. For some user code or a simple 'polling mode' bare-metal program, the above is fine.
If you really want to be 'efficient', look at your surrounding context and known registers: you can execute some instruction that you know will set or clear the carry flag. For instance, if R0 is 0 then adds r0,r0,r0 will clear the carry flag. An instruction like eors R0,R0,R0 will not touch the carry bit. It may depend on whether you also need to know about the other NZV bits.
The notation 'cpsr_f' means only the flags field, where the NZCV bits live, is altered. You can use msr cpsr_f, #NZCV_bits if you want to set/clear all of these at once, i.e. when you don't care about the old value of any of them (subject to architecture restrictions). The other fields, like mode, IRQ, etc., remain untouched.
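To see why adds r0,r0,r0 with R0 equal to 0 clears the carry, here is a small C model of the flags an ADDS produces. It is a sketch, not a description of any particular core; adds_nzcv is a hypothetical name, and the result packs N, Z, C, V into bits 31..28 as in the CPSR.

```c
#include <stdint.h>

/* Model of the flags produced by ADDS a, b.
 * Returns N, Z, C, V in bits 31..28, the positions they have in the CPSR. */
uint32_t adds_nzcv(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                          /* wraps modulo 2^32 */
    uint32_t n = r >> 31;                        /* sign bit of the result */
    uint32_t z = (r == 0);                       /* result is zero */
    uint32_t c = (r < a);                        /* unsigned overflow: carry out */
    uint32_t v = (~(a ^ b) & (a ^ r)) >> 31;     /* same-sign inputs, other-sign result */
    return (n << 31) | (z << 30) | (c << 29) | (v << 28);
}
```

Note that 0 + 0 clears C but also sets Z, which is the reason the other NZV bits matter when choosing such a shortcut.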
See also:
Heyrick on status register.
ARM Q bit
My C compiler (GCC) is producing this code which I don't think is optimal
8000500: 2b1c cmp r3, #28
8000502: bfd4 ite le
8000504: f841 3c70 strle.w r3, [r1, #-112]
8000508: f841 0c70 strgt.w r0, [r1, #-112]
It seems to me that the compiler could happily omit the ITE LE instruction, as the two stores following it use the LE and GT conditions from the CMP instruction, so only one will actually be performed. The ITE instruction means that only one of the STRs will be tested and performed, so the time should be equal, but it is using an extra word of instruction memory.
Any opinions on this ?
In Thumb mode, the instruction opcodes (other than branch instructions) don't have any space for conditional execution. In Thumb1, this meant that one simply had to use branches to skip instructions if necessary.
In Thumb2 mode, the IT instruction was added, which adds the conditional execution capability, without embedding it into the instruction opcodes themselves. In your case, the le condition part of the strle.w instruction is not embedded in the opcode f841 3c70, but is actually inferred from the preceding ite le instruction by the disassembler. If you use a hex editor to change the ite le instruction to something else, the strle.w and strgt.w will both suddenly disassemble into plain str.w.
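As an illustration of how a disassembler recovers the conditions from the IT instruction alone, the following C sketch decodes a 16-bit IT encoding (0xBF00 plus a 4-bit first condition and a 4-bit mask) into its then/else pattern. decode_it is a hypothetical helper written for this example, not part of any real tool.

```c
#include <stdint.h>

/* Decode a 16-bit Thumb-2 IT instruction (1011 1111 firstcond mask).
 * Writes the then/else pattern ("T", "TE", "TTE", ...) into pattern and
 * returns the first condition code, or -1 if hw is not an IT instruction. */
int decode_it(uint16_t hw, char pattern[5])
{
    unsigned firstcond = (hw >> 4) & 0xF;
    unsigned mask = hw & 0xF;
    int tz = 0, n, k;

    if ((hw & 0xFF00) != 0xBF00 || mask == 0)
        return -1;                      /* mask 0 encodes hints such as NOP */

    while (((mask >> tz) & 1) == 0)     /* lowest set bit terminates the block */
        tz++;
    n = 4 - tz;                         /* number of instructions in the block */

    pattern[0] = 'T';                   /* first instruction is always "then" */
    for (k = 2; k <= n; k++) {
        unsigned bit = (mask >> (5 - k)) & 1;
        pattern[k - 1] = (bit == (firstcond & 1)) ? 'T' : 'E';
    }
    pattern[n] = '\0';
    return (int)firstcond;
}
```

For the halfword bfd4 from the question this yields condition 0xD (LE) and the pattern "TE", i.e. ite le; the two str.w opcodes themselves carry no condition.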
See the other linked answer, https://stackoverflow.com/a/26001101, for more details.
The unified assembler syntax, which supports A32 and T32 targets, has added some confusion here. What is being shown in the disassembly is more verbose than what is encoded in the opcodes.
Your ITE instruction is very much a thumb instruction set placeholder, it defines an IT block which spans the following two instructions (and being thumb, those two instructions are not individually conditional). From a micro-architecture/timing point of view, it is only necessary to execute one instruction (but you shouldn't assume that this folding always takes place).
The strle/strgt syntax could be used on its own for an A32 target, where the IT block is not necessary since the instruction set has a dedicated condition code field.
In order to write (or disassemble) code which can be used by both A32 and T32 assemblers, what you have here is both approaches to conditional execution written together. This has the advantage that the same assembly routine can be more portable (even if the resulting code is not identical - optimisations in the target cpu will also be different).
With T32, the combination of an it and a single 16-bit instruction matches the instruction density of the equivalent A32 instruction; if more than one conditional instruction can be combined under the same it, there is an overall win.
In the ARMv7 ISA, how do I determine that an undefined instruction exception has occurred due to one of the floating point exceptions? I also read that by default the VFP unit is disabled, and when a VFP instruction is used by an application for the first time, the kernel will use exception handling to enable the VFP unit and let the application continue. I suppose this exception will be the undefined instruction exception.
I understand undefined instruction could be due to other cases also. I did some reading on undef handler in document ARM DUI 0471C page 128 where it says
Examine the undefined instruction to see if it has to be emulated. This is similar to the way in which an SVC handler extracts the number of an SVC, but rather than extracting the bottom 24 bits, the emulator must extract bits [27:24]
If bits [27:24] = b1110 or b110x, the instruction is a coprocessor instruction
The bit field does not seem to give me an accurate indication that the instruction was a floating point instruction; for instance, bits [27:24] of VADD = 0010. So this does not seem like the best method to figure it out.
From what I read in the ARM ARM, I could use the FPEXC.DEX bit to figure out that it was a floating point instruction exception. But this only works after we enable the VFP unit, and I need to do this check first thing in the undef handler. What is the most appropriate method to detect an exception from a floating point instruction?
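For reference, the bits [27:24] test quoted from the document can be written as a small C predicate. This is a sketch of the quoted rule only; is_coprocessor_insn is a hypothetical name, and the function says nothing about whether a given instruction belongs to the VFP.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if bits [27:24] of a 32-bit ARM opcode are b1110 or b110x,
 * the pattern the quoted text gives for coprocessor instructions. */
bool is_coprocessor_insn(uint32_t insn)
{
    uint32_t top = (insn >> 24) & 0xF;          /* bits [27:24] */
    return top == 0xE || (top & 0xE) == 0xC;    /* b1110, or b1100 / b1101 */
}
```

An opcode whose bits [27:24] are b0010 is not matched by this test, which is the limitation described in the question.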
The fpexc.en bit could be used for this purpose. The idea is simple:
If VFP is disabled, assume that a VFP instruction is the reason for this fault -> enable VFP and try to execute the instruction again.
If VFP is enabled, then there is indeed something really bad.
The handler looks roughly like this:
ctrl .req r0
push {...}
vmrs ctrl, fpexc // Check vfp status
tst ctrl, #(1 << 30) // fpexc.en
bne .L.undefinedHandler.die // if vfp enabled -> there is another reason for this exception
// enable VFP and try again
ldr ctrl, =fpexc.en.mask // enable vfp
vmsr fpexc, ctrl
// Reloading vfp state & d0-d31 regs
// some code is skipped here
pop {...}
subs pc, lr, #4 // return & try faulty instructions again
.L.undefinedHandler.die:
// F... really unknown instruction
Simply speaking, the instruction is executed twice: once with VFP disabled and once with VFP enabled. A valid VFP instruction will generate an exception only once. An unknown instruction will generate an exception twice. Pros: no instruction parsing is needed.
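The decision the handler makes can be summarised in a few lines of C. This is a model of the logic only, assuming FPEXC.EN is bit 30 as tested in the handler above; the enum and function names are hypothetical, and the real handler still has to save registers, restore the VFP state and return with subs pc, lr, #4.

```c
#include <stdint.h>

#define FPEXC_EN (1u << 30)   /* FPEXC.EN, the bit tested by the handler */

enum undef_action {
    UNDEF_ENABLE_VFP_AND_RETRY,   /* first fault: VFP was simply switched off */
    UNDEF_FATAL                   /* VFP already on: a genuinely undefined instruction */
};

/* Decide what the undefined-instruction handler should do,
 * given the FPEXC value read at the time of the fault. */
enum undef_action classify_undef(uint32_t fpexc)
{
    return (fpexc & FPEXC_EN) ? UNDEF_FATAL : UNDEF_ENABLE_VFP_AND_RETRY;
}
```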
PS: A bit late, but might be useful for somebody :)
I was wondering why ARM instructions do not set the CPSR by default (like x86), and the S bit must be used instead. Do instructions that don't change the CPSR offer better performance? For example, does an ADD instruction perform better than ADDS? Or what is the real deal?
It is for performance, or perhaps was. If you always change flags then you have a hard time using one flag on multiple instructions without a branch, which messes with your pipeline.
if(a==0)
{
b=b+1;
c=0;
}
else
{
b=0;
c=c+1;
}
traditionally you have to literally implement that with branches (pseudocode not real asm)
cmp a,0
bne notzero
add b,b,1
mov c,0
b waszero
notzero:
mov b,0
add c,c,1
waszero:
so you suffer a branch no matter what
but with conditional execution
cmp a,0
addeq b,b,1
moveq c,0
addne c,c,1
movne b,0
no branches, you simply rip through the code. now the only way this can work is 1) you have an option per instruction to conditionally execute based on flags and 2) instructions that modify the flags have an option not to modify the flags
Depending on the processor family/architecture the add and maybe even mov will modify the flags, so you have to have both the conditional execution AND the option not to set flags. That is why arm has an adds and an add.
I think they got rid of all that with the 64 bit architecture so perhaps as interesting and cool as it was maybe it wasnt used enough or worth it or they just needed those four bits to keep all/some instructions to 32 bits.
I was wondering why ARM instructions do not set the CPSR by default (like x86), and the S bit must be used instead.
It is a choice and it depends on context. The extra flexibility is only limited by a programmer's imagination.
Do instructions that don't change the CPSR offer better performance? For example, does an ADD instruction perform better than ADDS?
Most likely never (see Note 1). I.e., an instruction that doesn't set the CPSR does not execute faster (in fewer clocks) on the majority of ARM CPUs and instructions.
Or what is the real deal?
Consider some 'C' code,
int i, sum;
char *p = array; /* passed in */
for(i = 0, sum = 0; i < 10 ; i++)
sum += p[i];
return sum;
This can translate to,
mov r2, r0 @ get "array" to R2
mov r1, #10 @ counter (reverse direction)
mov r0, #0 @ sum = 0
1:
subs r1, r1, #1 @ set conditions
ldrb r3, [r2], #1 @ load *p++; does not affect conditions
add r0, r0, r3 @ sum += byte; does not affect conditions
bne 1b
bx lr
In this case, the loop body is simple. However, as long as no other flag-setting instruction sits between the decrement and the branch, a compiler (or assembler programmer) may schedule the loop decrement wherever they like and still have the conditions tested much later. This can be more important with more complex logic and where the CPU may have stalls due to data dependencies. It can also be important with conditional execution.
The optional 'S' is more a feature of many instructions than a single instruction.
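In the A32 encoding of data-processing instructions the optional 'S' is a single bit, bit 20. A minimal C sketch of reading it follows; a32_dp_sets_flags is a hypothetical name and the test is only meaningful for data-processing opcodes.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if an A32 data-processing opcode has its S bit (bit 20) set,
 * i.e. the instruction updates the condition flags. */
bool a32_dp_sets_flags(uint32_t insn)
{
    return (insn >> 20) & 1u;
}
```

For example, 0xE0800001 (add r0, r0, r1) and 0xE0900001 (adds r0, r0, r1) differ only in that bit.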
Note 1: Someone can always make an ARM CPU where this is the case. You would have to look at data sheets. I don't know of any CPU that takes more time to set conditions.
How do I determine the endian mode the ARM processor is running in, using only assembly language?
I can easily see the Thumb/ARM state by reading bit 5 of the CPSR, but I don't know if there is a corresponding bit in the CPSR or elsewhere for endianness.
;silly example trying to execute ARM code when I may be in Thumb mode....
MRS R0,CPSR
ANDS R0,#0x20
BNE ThumbModeIsActive
B ARMModeIsActive
I've got access to the ARM7TDMI data sheet, but this document does not tell me how to read the current state.
What assembly code do I use to determine the endianness?
Let's assume I'm using an ARM9 processor.
There is no CPSR bit for endianness in ARMv4 (ARM7TDMI) or ARMv5 (ARM9), so you need to use other means.
If your core implements system coprocessor 15, then you can check the bit 7 of the register 1:
MRC p15, 0, r0, c1, c0 ; CP15 register 1
TST r0, #0x80 ; check bit 7 (B)
BNE big_endian
B little_endian
However, the doc (ARM DDI 0100E) seems to hint that this bit is only valid for systems where the endianness is configurable at runtime. If it's set by the pin, the bit may be wrong. And, of course, on most(all?) ARM7 cores, the CP15 is not present.
There is a platform-independent way of checking the endianness which does not require any hardware bits. It goes something like this:
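LDR R0, checkbytes ; load the four bytes as one word
LDR R1, =0x12345678 ; the constant does not fit a CMP immediate
CMP R0, R1
BEQ big_endian
B little_endian
ALIGN ; keep the data word-aligned
checkbytes
DCB 0x12, 0x34, 0x56, 0x78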
Depending on the current endianness, the load will produce either 0x12345678 or 0x78563412.
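The same byte-sequence trick can be sketched in C. load_word models what a word load returns for each endianness, and host_is_big_endian applies the test to whatever machine runs the code; both names are hypothetical and chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Assemble four bytes, given in memory order, the way a word load
 * would see them under the chosen endianness. */
uint32_t load_word(const uint8_t b[4], int big_endian)
{
    return big_endian
        ? ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) | ((uint32_t)b[2] << 8) | b[3]
        : ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) | ((uint32_t)b[1] << 8) | b[0];
}

/* Run the same check on the machine executing this code. */
int host_is_big_endian(void)
{
    const uint8_t checkbytes[4] = { 0x12, 0x34, 0x56, 0x78 };
    uint32_t w;
    memcpy(&w, checkbytes, sizeof w);   /* a real word load of the byte sequence */
    return w == 0x12345678u;
}
```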
ARMv6 and later versions let you check CPSR bit E (9) for endianness.
Before ARMv6, bit 7 of co-processor 15 register c1 should tell which endianness the core is using.
In both cases 1 is big-endian while 0 is little-endian.
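Once the register value has been read (with MRS for the CPSR, or MRC for CP15 c1), the check is a single bit test. A short C sketch, with hypothetical function names:

```c
#include <stdint.h>
#include <stdbool.h>

/* ARMv6 and later: the E bit is bit 9 of the CPSR. */
bool cpsr_big_endian(uint32_t cpsr)
{
    return (cpsr >> 9) & 1u;
}

/* Before ARMv6: the B bit is bit 7 of CP15 register c1. */
bool cp15_c1_big_endian(uint32_t control)
{
    return (control >> 7) & 1u;
}
```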
Not able to find any documentation on this instruction
Is this a macro or an instruction? It is used mainly in context switches, but I am not able to understand its purpose.
This is an MSR instruction, conditionally executed as Not Equal (NE).
MSR is used to move a value from a general-purpose register (or an immediate) to a program status register, i.e. the CPSR or an SPSR. In a context switch it is commonly used to load the status register for the code being resumed. (On 32-bit ARM, system co-processor registers, which control things such as cache invalidation/flushing, are written with MCR instead.)
The NE part makes the instruction dependent on the Zero status flag being zero; the flag gets its value from a previous flag-setting operation.
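How a condition suffix such as NE is evaluated against the flags can be sketched in C. condition_passed is a hypothetical name; the function takes the 4-bit condition code and a status word with N, Z, C, V in bits 31..28.

```c
#include <stdint.h>
#include <stdbool.h>

/* Evaluate an ARM condition code (0x0 = EQ, 0x1 = NE, ..., 0xE = AL)
 * against a status word holding N, Z, C, V in bits 31..28. */
bool condition_passed(unsigned cond, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1u, z = (cpsr >> 30) & 1u;
    bool c = (cpsr >> 29) & 1u, v = (cpsr >> 28) & 1u;
    bool r;

    switch ((cond >> 1) & 7u) {
    case 0: r = z; break;               /* EQ / NE */
    case 1: r = c; break;               /* CS / CC */
    case 2: r = n; break;               /* MI / PL */
    case 3: r = v; break;               /* VS / VC */
    case 4: r = c && !z; break;         /* HI / LS */
    case 5: r = (n == v); break;        /* GE / LT */
    case 6: r = !z && (n == v); break;  /* GT / LE */
    default: return true;               /* AL: always executed */
    }
    return (cond & 1u) ? !r : r;        /* odd codes are the inverse test */
}
```

With cond = 0x1 (NE) the instruction executes only while Z is 0.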