Checking for overflow - ARM

I'm currently attempting to check for signed overflow when a multiplication is performed, using the following method...
muls r0, r1, r0
blvs overflow
Which would branch off and print an overflow error message if overflow occurred. For example, if the input is r1 = 1,000,000,000 and r0 = 3, the output is -64,771,072, but the branch to overflow is never taken. Why is that?

It's simply that multiply instructions will never set the overflow flag. MULS will only set the N and Z flags appropriately, and won't touch C or V (unless you're on something truly ancient where they get overwritten with meaningless nonsense).
If the upper bits of the full result matter, you might want to consider using UMULL/SMULL instead, which produce the full 64-bit product.
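For illustration, here is a minimal C sketch of that idea (the function name is mine): compute the full 64-bit product, which is what SMULL produces, and flag overflow when the high half is not just the sign-extension of the low half.

#include <stdint.h>
#include <stdbool.h>

/* Detect signed 32-bit multiply overflow via the full 64-bit product
   (the same information SMULL leaves in a register pair). */
static bool smul_overflows(int32_t a, int32_t b, int32_t *lo)
{
    int64_t p = (int64_t)a * (int64_t)b;   /* full product; compilers typically emit SMULL here */
    *lo = (int32_t)p;                      /* truncated 32-bit result */
    return p != (int64_t)*lo;              /* true if truncation lost significant bits */
}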

There is only one bit to record overflow (and another for carry) in the ALU / APSR, and those get set as a side effect of addition or subtraction arithmetic, not because the CPU realizes the result will overflow. So in the multiplication case they don't get set.
There is a nice blog post at the ARM Connected Community titled Detecting Overflow from MUL which proposes how you can detect the overflow by other means.

Related

What is the difference between the Q flag and the Overflow flag? [duplicate]

The Q flag, also known as the saturation flag, is set when a result causes an overflow or saturation. Similarly, the overflow flag is also set when a result causes an overflow. What is the major difference between these two flags?
The Q flag is "sticky": it is not cleared by subsequent operations. It can therefore be used to determine whether saturation or overflow occurred at any point since it was last explicitly cleared.
The V (overflow) and C (carry/borrow) flags, on the other hand, are set or cleared by each flag-setting arithmetic instruction, so they must be tested immediately after the instruction that might set them. They can be tested in conditional instructions, and the carry can be used to extend arithmetic, allowing for example 64-bit operations from 32-bit instructions.
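As a rough illustration (my own sketch, assuming a GCC-style toolchain targeting an ARM core with the DSP extension, e.g. Cortex-M4; the helper names are mine), QADD sets the sticky Q flag and a later flag-setting add leaves it alone, whereas V would already have been overwritten:

#include <stdint.h>
#include <stdio.h>

/* Read the APSR so the sticky Q bit (bit 27) can be inspected. */
static inline uint32_t read_apsr(void)
{
    uint32_t apsr;
    __asm volatile ("mrs %0, apsr" : "=r"(apsr));
    return apsr;
}

int main(void)
{
    int32_t a = INT32_MAX, b = 1, r;

    /* QADD saturates the result and sets the sticky Q flag. */
    __asm volatile ("qadd %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));

    /* An unrelated flag-setting add updates N/Z/C/V but does not clear Q. */
    __asm volatile ("adds %0, %0, #1" : "+r"(b) : : "cc");

    printf("saturated result = %ld, Q = %lu\n",
           (long)r, (unsigned long)((read_apsr() >> 27) & 1u));
    return 0;
}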

32 bit operations on 8 bit architecture [closed]

I just want to ask: is it possible to get 32-bit operations on an 8-bit architecture, and if yes, how?
I have thought about this for some time, and the best idea I have is to typedef char[N] to get N-byte types and then implement functions such as add(char *, char *).
Thanks in advance!
(I'm asking about the 6502 processor.)
You have tagged your question as "C" so this answer takes this into consideration.
Most C compilers for 8-bit systems I know have long types. You can simply use these.
Having said this, how does it work?
All common 8-bit processors have a special 1-bit flag that receives the carry/borrow from 8-bit operations. And they have addition and subtraction instructions that take this flag into account. So a 32-bit add will be translated into this sequence:
; 1st operand in R0 to R3
; 2nd operand in R4 to R7
; addition works only with A(ccumulator)
; result goes into R0 to R3
MOV A,R0
ADD A,R4
MOV R0,A
MOV A,R1
ADDC A,R5
MOV R1,A
MOV A,R2
ADDC A,R6
MOV R2,A
MOV A,R3
ADDC A,R7
MOV R3,A
Think about how you do sums on paper. There is no need to add a carry at the rightmost digit, the least significant one: since there is "nothing" to its right, there is no carry coming in. We can interpret each 8-bit step as a one-digit operation in a number system of base 256.
For bit operations there is no need for a carry or borrow.
Another thought: what do you call an 8-bit system? One where the instructions can only handle 8 bits in parallel, or one where the data bus is just 8 bits wide?
For the latter case we can look at, for example, the 68008 processor. Internally a 32-bit processor, it has a data bus that is only 8 bits wide. Here you simply use the 32-bit instructions; if the processor reads or writes a 32-bit value from/to memory, it automatically generates 4 consecutive access cycles.
Many (all that I know of...) CPUs have a so-called "carry flag" (1 bit), which is set when an addition or subtraction causes a wrap-around. It is basically an extra bit for calculations. They also have versions of addition and subtraction which include this carry flag. So you can do (for example) a 32-bit addition by doing four 8-bit additions with carry.
Pseudocode example for a little-endian machine (so byte 0 of the 4-byte result is the least significant byte):
carry,result[0] = opA[0] + opB[0]
carry,result[1] = opA[1] + opB[1] + carry
carry,result[2] = opA[2] + opB[2] + carry
carry,result[3] = opA[3] + opB[3] + carry
if carry == 1, the 32-bit result overflowed
The first addition instruction might be called ADD (it does not include the carry, it just sets it), while the following additions might be called ADC (they include the carry and set it). Some CPUs might have only an ADC instruction and require clearing the carry flag first.
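The same chain can be written portably in C. A minimal sketch (the function and parameter names are mine, not from any particular library), with the least significant limb first, mirroring the ADD/ADC sequence above:

#include <stdint.h>

/* 32-bit addition built from four 8-bit limb additions with manual carry. */
static uint8_t add32_bytes(const uint8_t opA[4], const uint8_t opB[4], uint8_t result[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {              /* limb 0 is the least significant byte */
        unsigned sum = (unsigned)opA[i] + opB[i] + carry;
        result[i] = (uint8_t)sum;              /* keep the low 8 bits */
        carry = sum >> 8;                      /* 1 if this limb wrapped, else 0 */
    }
    return (uint8_t)carry;                     /* final carry: the 32-bit result wrapped */
}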
If you use the standard int / long types, the compiler will automatically do the right thing. long has (at least) 32 bits, so there is no need to work with carry bits manually; the compiler is already capable of that. If possible, use the standard uint32_t/int32_t types for readability and portability. Examine the disassembled code to see how the compiler deals with 32-bit arithmetic.
In general, the answer to "Can I do M-bit arithmetic on a processor which has only N bits?" is "Certainly yes!"
To see why: back in school, you probably learned your addition and multiplication tables for only up to 10+10 and 10×10. Yet you have no trouble adding, subtracting, or multiplying numbers which are any number of digits long.
And, simply stated, that's how a computer can operate on numbers bigger than its bit width. If you have two 32-bit numbers, and you can only add them 8 bits at a time, the situation is almost exactly like having two 4-digit numbers which you can only add one digit at a time. In school, you learned how to add individual pairs of digits, and process the carry -- and similarly, the computer simply adds pairs of 8-bit numbers, and processes the carry. Subtraction and multiplication follow the same sorts of rules you learned in school, too. (Division, as always, can be trickier, although the long division algorithm you learned in school is often a good start for doing long computer division, too.)
It helps to have a very clear understanding of number systems with bases other than 10. I said, "If you have two 32-bit numbers, and you can only add them 8 bits at a time, the situation is almost exactly like having two 4-digit numbers which you can only add one digit at a time." Now, when you take two 32-bit numbers and add them 8 bits at a time, it turns out that you're doing arithmetic in base 256. That sounds crazy, at first: most people have never heard of base 256, and it seems like working in a base that big might be impossibly difficult. But it's actually perfectly straightforward, when you think about it.
(Just for fun, I once wrote some code to do arithmetic on arbitrarily big numbers, and it works in base 2147483648. That sounds really crazy at first -- but it's just as reasonable, and in fact it's how most arbitrary-precision libraries work. Although actually the "real" libraries probably use base 4294967296, because they're cleverer than me about processing carries, and they don't want to waste even a single bit.)

How does a processor (esp. ARM) interpret an overflowed result at a later stage in execution, when the result has been written back to memory?

Since processors follow the convention of representing numbers in 2's complement, how do they know whether a number that resulted from adding two positive numbers is still positive and not negative?
For example, if I add two 32-bit numbers:
Let r2 contain the value 0x50192E32.
Sample Code:
add r1, r2, #0x6F06410C
str r1, [r3]
Here the overflow flag is set.
Now suppose I want to use the stored result from memory in later instructions (somewhere in the code; by now, due to other instructions, the processor's CPSR has changed), as shown below:
ldr r5, [r3]
add r7, r5
Since the result of the first add instruction has a 1 in its MSB, i.e. r5 now has a 1 in its MSB, how does the processor interpret the value? The correct result of adding two positive numbers should be positive. Does it treat the value as a negative number just because the MSB is 1? In that case we get a different result from the expected one.
For example, on a 4-bit machine, in 2's complement: 4 = 0100 and 5 = 0101; -4 = 1100 and -5 = 1011.
Now 4 + 5 = 9, which is stored in a register/memory as 1001. If it is later accessed by another instruction, the processor, which stores numbers in 2's complement format, checks the MSB and thinks that it is -7.
If it all depends on the programmer, then how does one store the correct result in a register/memory? Is there anything we can do in our code to store the correct results?
If you care about overflow conditions, then you'd need to check the overflow flag before the status register is overwritten by some other operation - depending on the language involved, this may result in an exception being generated, or the operation being retried using a longer integer type. However, many languages (C, for example) DON'T care about overflow conditions - if the result is out of range of the type, you simply get an incorrect result. If a program written in such a language needs to detect overflow, it would have to implement the check itself - for example, in the case of addition, if the operands have the same sign, but the result is different, there was an overflow.
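A minimal C sketch of that same-sign check (the helper name is mine): the addition is done on unsigned copies, where wrap-around is well defined, and the sign bits are compared afterwards.

#include <stdint.h>
#include <stdbool.h>

/* Signed 32-bit addition overflows iff the operands share a sign
   and the result's sign differs from theirs. */
static bool add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                     /* unsigned wrap-around is well defined in C */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}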
I know I have covered this many times, as have others.
The carry flag can be considered the unsigned overflow flag for addition; it is also the borrow flag (or not-borrow flag, depending on your architecture) for subtraction. The V flag is the signed overflow flag for addition (and subtraction). YOU are the only one who knows or cares whether the operation is signed or unsigned, since for addition/subtraction the hardware doesn't care.
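To make the two roles of C concrete, here is a tiny C sketch (the names are mine): the carry-out of an unsigned add, and the ARM-style "not borrow" result of an unsigned subtract.

#include <stdint.h>

/* Carry-out of ADD: set when the unsigned sum wrapped around. */
static unsigned add_carries(uint32_t a, uint32_t b) { return (uint32_t)(a + b) < a; }

/* ARM's C flag after SUBS is "not borrow": set when no unsigned underflow occurred. */
static unsigned sub_not_borrow(uint32_t a, uint32_t b) { return a >= b; }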
It doesn't matter what flag it is, or what architecture: if you care about the result (be it the result itself or a flag), YOU have to preserve that information for as long as necessary, until you need to use it. It is not the job of the processor, the instruction set, or the architecture in general. This goes for the answers in the registers as much as for the flags; it is all on you, the programmer. Just preserve the state if you care. This question is like asking how you solve this:
if(a==b)
{
}
stuff;
stuff;
I want to do the if a == b thing now.
It is all on you, the programmer, to make that work: either do the compare at the time you need to use it instead of at some other time, or save the result of the compare when it happens and then check that saved condition at the time you need it.

Overflow using Fixed-Point Chebyshev Method

I have an algorithm in an embedded system that needs to calculate sin(theta), sin(2*theta), sin(3*theta), etc. with Q15 fixed-point arithmetic. sin(theta) and cos(theta) are generated using a LUT/interpolation combo, but I'm using the Chebyshev method to calculate the higher-order sines, which looks like this (pseudo-code):
sinN1 = Comes from Q15 LUT/interp algorithm
cosN1 = Comes from Q15 LUT/interp algorithm
sinN2 = (cosN1*sinN1)>>14;
sinN3 = ((cosN1*sinN2)>>14) - sinN1;
sinN4 = ((cosN1*sinN3)>>14) - sinN2;
....
The problem is that under certain conditions, this method yields a result which can overflow a Q15 variable. For example, let's consider theta = 2.61697:
sinN1 (Q15) = int(2**15*sin(2.61697)) = 16413
cosN1 (Q15) = int(2**15*cos(2.61697)) = -28361
sinN2 = (-28361*16413)>>14 = -28412 # OK
sinN3 = ((-28361*-28412)>>14) - 16413 = 32768 # OVERFLOW BY 1
..
I never seem to overflow by more than an LSB or two; it seems to be an artifact of compounding quantization. I'm using an ARM Cortex-M4 processor, so I can add saturation logic with relatively few instructions, but I'm doing a lot of real-time streaming DSP with very low latency requirements, so I need to save as much CPU as I can, and I'm wondering if there is a more elegant way to handle this issue.
The mathematician in me wants to suggest keeping an estimate of the accumulated error and correcting for it in a sigma-delta fashion as elegant, but the pragmatic programmer in me says that's insanely impractical.
The SSAT instruction will saturate your result to any position you want, costs a single cycle, and should be easily available via the __ssat intrinsic on any non-rubbish compiler. Since you will inevitably accumulate errors with any non-integer arithmetic, I'm not sure there's really anything better than just doing the calculation and spending one extra cycle making sure it's in range.
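For instance, one Chebyshev step could look roughly like this in C (my sketch, assuming an ACLE-style compiler where arm_acle.h provides __ssat; CMSIS exposes the same instruction as __SSAT):

#include <stdint.h>
#include <arm_acle.h>                                /* __ssat() on ACLE-conforming compilers */

/* One step of sin(n*t) = 2*cos(t)*sin((n-1)*t) - sin((n-2)*t) in Q15,
   with the result clamped back into range by SSAT. */
static inline int16_t cheb_next(int16_t cosN1, int16_t sinPrev1, int16_t sinPrev2)
{
    int32_t t = ((int32_t)cosN1 * sinPrev1) >> 14;   /* >>14 keeps the implicit factor of 2 */
    return (int16_t)__ssat(t - sinPrev2, 16);        /* saturate to [-32768, 32767] */
}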
What I can't quite work out is if you could get it entirely for free by dabbling in a bit of (inline) assembly to get at the optional shift in ssat that the intrinsic doesn't expose, on the basis that the multiplication/FP correction step is the biggest source of error; something like (pseudo-assembly):
mul tmp, cosN1, sinN2
ssat tmp, #16, tmp, asr #14
sub sinN3, tmp, sinN1
Which you can only safely get away with if you can guarantee to never end up with something like sinN3 = 0 - (-32768). QSUB16 is tempting to replace the SUB, but being a parallel operation would do weird stuff with the top halfword of sign bits, and as soon as you add a mask or halfword-packing instruction to correct for that you've lost the "for free" game.

Detect integer overflow

I am working with a large C library where some array indices are computed using int.
I need to find a way to trap integer overflows at runtime, in such a way as to narrow it down to the problematic line of code. The libc manual states:
FPE_INTOVF_TRAP
Integer overflow (impossible in a C program unless you enable overflow trapping in a hardware-specific fashion).
however, the gcc option -ffpe-trap suggests that these traps only apply to floating-point numbers?
So how do I enable an integer overflow trap? My system is Xeon/Core2, gcc-4.x, Linux 2.6.
I have looked through similar questions, but they all boil down to modifying the code. However, I need to know which code is problematic in the first place.
If Xeons can't trap overflows, which processors can? I have access to non-emt64 machines as well.
In the meantime I have found a tool designed for LLVM: http://embed.cs.utah.edu/ioc/
There doesn't seem to be an equivalent for gcc/icc, however?
OK, I may have to answer my own question.
I found that gcc has a -ftrapv option; a quick test confirms that, at least on my system, overflow is trapped. I will post more detailed info as I learn more, since it seems a very useful tool.
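For reference, this is roughly the kind of quick test meant here (my own minimal example; compile without optimization, e.g. gcc -ftrapv -g trapv_test.c, and the signed overflow should abort the program so a debugger or core dump points at the offending line):

/* trapv_test.c - signed overflow should abort when built with -ftrapv. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    volatile int x = INT_MAX;            /* volatile keeps the compiler from folding the add */
    x = x + 1;                           /* signed overflow: trapped (abort) under -ftrapv */
    printf("%d\n", x);                   /* reached only if the overflow was not trapped */
    return 0;
}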
Unsigned integer arithmetic does not overflow, of course.
With signed integer arithmetic, overflow leads to undefined behaviour; anything could happen. And optimizers are getting aggressive about optimizing stuff that overflows. So, your best bet is to avoid the overflow, rather than trapping it when it happens. Consider using the CERT 'Secure Integer Library' (the URL referenced there seems to have gone AWOL/404; I'm not sure what's happened yet) or Google's 'Safe Integer Operation' library.
If you must trap overflow, you are going to need to specify which platform you are interested in (O/S including version, compiler including version), because the answer will be very platform specific.
Do you know exactly which line the overflow is occurring on? If so, you might be able to look at the CPU's carry/overflow flag to see if the operation in question caused an overflow. This is the flag that the CPU uses to do large-number calculations and, while not available at the C level, it might help you to debug the problem - or at least give you a chance to do something.
By the way, I found this link for gcc (-ftrapv) that talks about an integer trap. It might be what you are looking for.
You can use inline assembler in gcc to use an instruction that might generate an overflow and then test the overflow flag to see if it actually does:
int addo(int a, int b)
{
    /* Perform the add once just to set the flags and branch on overflow;
       the value itself is recomputed in C below. */
    asm goto("add %0,%1; jo %l[overflow]" : : "r"(a), "r"(b) : "cc" : overflow);
    return a + b;
overflow:
    return 0;
}
In this case, it tries to add a and b, and if that overflows, it jumps to the overflow label. If there's no overflow, it continues, doing the add again (in C this time) and returning the result.
This runs into the GCC limitation that an inline asm block cannot both output a value and maybe branch -- if it weren't for that, you wouldn't need a second add to actually get the result.
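If a sufficiently recent compiler is available (an assumption on my part; GCC 5 and later, and Clang, provide it), the same check can be written without inline assembly using __builtin_add_overflow:

/* Same behaviour as addo() above: return the sum, or 0 if the addition overflowed. */
static int addo_builtin(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))   /* returns true on overflow */
        return 0;
    return sum;
}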
