How does NEON handle overflow?

I wonder how NEON handles overflow. For example:
uint8x8_t vadd_u8 (uint8x8_t, uint8x8_t)
As I understand it, this adds two vectors, each holding 8 unsigned bytes. Suppose all values in both vectors are 255.
What result should we expect in this case? An 8-element vector (510, ..., 510), or something else?

An 8-bit element can only hold values from 0 to 255, so it cannot contain 510.
vadd_u8 will wrap around => 255 + 255 = 510 % 256 = 254.
vqadd_u8 will saturate => 255 + 255 = min(510, 255) = 255.
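The difference between the two instructions can be modelled one lane at a time in plain C. This is only a sketch of the per-element arithmetic: it does not use the NEON intrinsics themselves, and the helper names add_wrap_u8 and add_sat_u8 are made up for illustration.

```c
#include <stdint.h>

/* Per-lane model of a wrapping add (what vadd_u8 does to each element). */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    /* The operands are promoted to int; the cast keeps the low 8 bits,
       i.e. the sum modulo 256. */
    return (uint8_t)(a + b);
}

/* Per-lane model of a saturating add (what vqadd_u8 does to each element). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);   /* clamp at the type maximum */
}
```

With both inputs at 255, add_wrap_u8 yields 254 and add_sat_u8 yields 255, matching the two results above.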

Related

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
However, it doesn't explain why they work. What's the logic behind these macros?

Let's try for intuition on countmore.
First, ~0UL/255*(127-n) is a clever way of copying the value 127-n to all bytes in the word in parallel. Why does it work? ~0 is 255 in all bytes. Consequently, ~0/255 is 1 in all bytes. Multiplying by (127-n) does the "copying" mentioned at the outset.
The term ~0UL/255*127 is just a special case of the above where n is zero. It copies 127 into all bytes. That's 0x7f7f7f7f if words are 4 bytes. "Anding" with x zeros out the high order bit in each byte.
That's the first term, ((x)&~0UL/255*127). The result is the same as x except the high bit in each byte is zeroed.
The second term ~0UL/255*(127-(n)) is as above: 127-n copied to each byte.
For any given byte x[i], adding the two terms gives us 127-n+x[i] if x[i]<=127. This quantity will have the high order bit set whenever x[i]>n. It's easiest to see this as adding two 7-bit unsigned numbers. The result "overflows" into the 8th bit because the result is 128 or more.
So it looks like the algorithm is going to use the 8th bit of each byte as a boolean indicating x[i]>n.
So what about the other case, x[i]>127? Here we know the byte is more than n because the algorithm stipulates n<=127. The 8th bit ought to be always 1. Happily, the sum's 8th bit doesn't matter because the next step "or"s the result with x. Since x[i] has the 8th bit set to 1 if and only if it's 128 or larger, this operation "forces" the 8th bit to 1 just when the sum might provide a bad value.
To summarize so far, the "or" result has the 8th bit set to 1 in its i'th byte if and only if x[i]>n. Nice.
The next operation &~0UL/255*128 sets everything to zero except all those 8th bits of interest. It's "anding" with 0x80808080...
Now the task is to find the number of these bits set to 1. For this, countmore uses some basic number theory. First it shifts right 7 bits so the bits of interest are b0, b8, b16... The value of this word is
b0 + b8*2^8 + b16*2^16 + ...
A beautiful fact is that 1 == 2^8 == 2^16 == ... mod 255. In other words, each 1 bit is 1 mod 255. It follows that finding mod 255 of the shifted result is the same as summing b0+b8+b16+...
Yikes. We're done.
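The same steps can be spelled out with named intermediates. This is a sketch rather than a replacement for the macro: the function name count_more and the local variable names are invented here, and it assumes 0 <= n <= 127 as the explanation above requires.

```c
/* Function form of countmore with each step named; requires 0 <= n <= 127. */
static unsigned count_more(unsigned long x, unsigned n)
{
    const unsigned long ones = ~0UL / 255;   /* 0x01 in every byte */
    const unsigned long low7 = ones * 127;   /* 0x7F in every byte */
    const unsigned long high = ones * 128;   /* 0x80 in every byte */

    /* Per byte: (x[i] & 0x7F) + (127 - n); bit 7 is set when the low
       seven bits of x[i] exceed n. No carry crosses a byte boundary. */
    unsigned long sum = (x & low7) + ones * (127 - n);

    /* OR with x forces bit 7 for bytes >= 128, then keep only bit 7. */
    unsigned long flags = (sum | x) & high;

    /* Move each flag to bit 0 of its byte and add them up via mod 255. */
    return (unsigned)((flags / 128) % 255);
}
```

For example, count_more(0x01020304UL, 2) counts the bytes 0x03 and 0x04 and returns 2; any higher bytes of the word are zero and do not count.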

Let's analyse the countless macro. Assuming a 64-bit unsigned long, it can be simplified to the following code:
#define A(n) (0x0101010101010101UL * (0x7F+(n)))
#define B(x) ((x) & 0x7F7F7F7F7F7F7F7FUL)
#define C(x,n) (A(n) - B(x))
#define countless(x,n) (( C(x,n) & ~(x) & 0x8080808080808080UL) / 0x80 % 0xFF )
A(n) will be:
A(0) = 0x7F7F7F7F7F7F7F7F
A(1) = 0x8080808080808080
A(2) = 0x8181818181818181
A(3) = 0x8282828282828282
....
B(x) masks each byte of x with 0x7F.
If we write x = 0xb0b1b2b3b4b5b6b7 (bytes b0 to b7) and n = 0, then each byte of C(x,n) equals 0x7F - (bi & 0x7F).
For example, suppose x = 0x1234567811335577 and n = 0x50. Then:
A(0x50) = 0xCFCFCFCFCFCFCFCF
B(0x1234567811335577) = 0x1234567811335577
C(0x1234567811335577, 0x50) = 0xBD9B7957BE9C7A58
~(0x1234567811335577) = 0xEDCBA987EECCAA88
0xEDCBA987EECCAA88 & 0x8080808080808080UL = 0x8080808080808080
C(0x1234567811335577, 0x50) & 0x8080808080808080 = 0x8080000080800000
(0x8080000080800000 / 0x80) % 0xFF = 4 //Count bytes that equal to 0x80 value.
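The worked example can be checked with a small function that mirrors A, B and C. It is a sketch under a few assumptions: uint64_t is used instead of unsigned long so the width is fixed, the name count_less is made up, and n is assumed to be at most 128 so that 0x7F + n still fits in one byte.

```c
#include <stdint.h>

/* Function form of the simplified countless macro for 64-bit words. */
static unsigned count_less(uint64_t x, unsigned n)
{
    uint64_t a = 0x0101010101010101ULL * (0x7Fu + n);   /* A(n) */
    uint64_t b = x & 0x7F7F7F7F7F7F7F7FULL;             /* B(x) */
    uint64_t c = a - b;                                  /* C(x,n) */

    /* Bit 7 of a byte survives only if it is set in C and clear in x. */
    uint64_t flags = c & ~x & 0x8080808080808080ULL;

    return (unsigned)((flags / 0x80) % 0xFF);
}
```

Calling count_less(0x1234567811335577ULL, 0x50) returns 4, for the bytes 0x12, 0x34, 0x11 and 0x33.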

efficient method to scale down 32 bit number to a 16 bit number

I'm trying to scale down a 32-bit value to a signed 16-bit number (uint32_t -> int16_t), or in other words, trying to scale my uint32_t result to lie between 0 and 32767 (INT16_MAX). My code looks like this. In this snippet, my input range happens to be 0 to 90000, so an input of 90000 should correspond to 32767, and so on:
uint16_t scaled_estimate = 0;
uint32_t input = 85000;
uint32_t max_base = 90000;
uint16_t new_base = INT16_MAX; // 32767
scaled_estimate = (input * new_base) / max_base; // assign; the variable is already declared above
if(scaled_estimate > new_base) scaled_estimate = new_base; // clamp
Is there a better way to achieve this scaling on embedded platforms or should I trust the compiler to do the right thing?

Converting 8.24 fixed point , 0.000000000000000 to 1.000000000000000 range to uint32_t in C

I'd like to convert 8.24 fixed point numbers in the range 0.000000000000000 -> 1.000000000000000 to uint32_t.
Do I multiply, add, or bit-shift?
I am receiving the 8.24 format fixed point numbers as 4 bytes
uint8_t meterDataRX[4];
// read 4 bytes from DSP channel
HAL_I2C_Master_Receive(&I2cHandle,bbboxDsp_address,meterDataRX,4,1);
uint32_t a;
a = (meterDataRX[0] << 24) | (meterDataRX[1] << 16) | (meterDataRX[2] << 8) | meterDataRX[3];
But not sure this is correct to start with!
The goal is to produce uint8_t values from 0x00 to 0xFF, but should I first build a uint32_t from the 4 bytes and then cast?
uint8_t b;
b = (uint8_t)a;

You need to read the 4-byte 8.24 fixed-point number as a 32-bit number. For a real number in the range 0 to 1 inclusive, the 8.24 fixed-point number will be represented as a 32-bit number in the range 0 to 0x01000000 (integer part is 1, fractional part is 0). You wish to scale this to a number in the range 0 to 0xFF.
1. Optional step: clamp an out-of-range input to a maximum value of 0x01000000:
if (a > 0x01000000) a = 0x01000000;
2. Multiply by 0xFF to give a number in the range 0x00000000 to 0xFF000000:
a *= 0xFF;
3. Optional step: for rounding rather than truncating, add the 8.24 fixed-point representation of the real value 0.5, which is 0x00800000:
a += 0x00800000;
4. Shift right by 24 bits to strip the fractional part:
a >>= 24;
You will be left with a number in the range 0 to 0xFF.
Note that if you skip step 1 (clamping out-of-range numbers), inputs greater than 0x01008080 (representing the real value 1.00196075439453125) will result in arithmetic overflow. If you skip both steps 1 (clamping) and 3 (rounding), inputs greater than 0x01010101 (representing the real value 1.003921568393707275390625) will result in arithmetic overflow.
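Put together, the clamp, multiply, round and shift steps fit in one small function. This is a sketch with both optional steps included; the name q8_24_to_u8 is invented for illustration.

```c
#include <stdint.h>

/* Convert an 8.24 fixed-point value in [0.0, 1.0] to a byte in [0x00, 0xFF]. */
static uint8_t q8_24_to_u8(uint32_t a)
{
    if (a > 0x01000000u)      /* clamp anything above 1.0 */
        a = 0x01000000u;
    a *= 0xFFu;               /* now at most 0xFF000000, no overflow */
    a += 0x00800000u;         /* add 0.5 so the shift rounds to nearest */
    a >>= 24;                 /* drop the 24 fractional bits */
    return (uint8_t)a;
}
```

An input of 0x00800000 (the real value 0.5) gives 128, and 0x01000000 (the real value 1.0) gives 0xFF.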

If the value is between 0 and 1, you need to scale it.
But if you just want to store it in a uint32_t variable, you need to change the byte order: as written, your code treats the data as big-endian, but STM32 microcontrollers are little-endian.
a = ((uint32_t )meterDataRX[0] ) | ((uint32_t )meterDataRX[1] << 8) | ((uint32_t )meterDataRX[2] << 16) | ((uint32_t )meterDataRX[3] <<24);
You can also use union punning for it.

For 8.24 fixed point, each uint32_t (or equivalent) would hold a number in the range from 00000000.000000000000000000000000b to 11111111.111111111111111111111111b.
For a floating point number in the range 0.000000000000000 to 1.000000000000000, you'd have to multiply it by (1 << 24) before converting it to an unsigned integer, so that you end up with 8.24 fixed point.
For a uint8_t where 0x00 represents 0.0 and 0xFF represents 0.996, you'd have to multiply it by (1 << (24-8)) (or shift it left 16 places) to convert it to 8.24 fixed point.
For a uint8_t where 0x00 represents 0.0 and 0xFF represents 1.0, you'd have to multiply it by (1 << 24) (or shift it left 24 places) and then divide by 0xFF to convert it to 8.24 fixed point.
To convert 8.24 fixed point back to either of the cases above, you'd do the reverse (e.g. multiply by 0xFF then shift right by 24 places to get back to uint8_t where 0xFF represents 1.0).
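The two uint8_t interpretations described above differ only in how the byte is widened. A minimal sketch, with function names made up for illustration:

```c
#include <stdint.h>

/* 0xFF represents 255/256 (about 0.996): shift left by 24 - 8 = 16 places. */
static uint32_t u8_frac_to_q8_24(uint8_t v)
{
    return (uint32_t)v << 16;
}

/* 0xFF represents exactly 1.0: shift left 24 places, then divide by 0xFF. */
static uint32_t u8_unit_to_q8_24(uint8_t v)
{
    return ((uint32_t)v << 24) / 0xFFu;
}
```

Under the first interpretation 0xFF becomes 0x00FF0000; under the second it becomes 0x01000000, which is exactly 1.0 in 8.24.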

How to subtract two unsigned ints with wrap around or overflow

There are two unsigned ints (x and y) that need to be subtracted. x is always larger than y. However, both x and y can wrap around; for example, if they were both bytes, after 0xff comes 0x00. The problem case is if x wraps around, while y does not. Now x appears to be smaller than y. Luckily, x will not wrap around twice (only once is guaranteed). Assuming bytes, x has wrapped and is now 0x2, whereas y has not and is 0xFE. The right answer of x - y is supposed to be 0x4.
Maybe,
( x > y) ? (x-y) : (x+0xff-y);
But I think there is another way, something involving two's complement? Also, in this embedded system, x and y are the largest unsigned int types, so adding 0xff... is not possible.
What is the best way to write the statement (target language is C)?

Assuming two unsigned integers:
If you know that one is supposed to be "larger" than the other, just subtract. It will work provided you haven't wrapped around more than once (obviously, if you have, you won't be able to tell).
If you don't know that one is larger than the other, subtract and cast the result to a signed int of the same width. It will work provided the difference between the two is in the range of the signed int (if not, you won't be able to tell).
To clarify: the scenario described by the original poster seems to be confusing people, but is typical of monotonically increasing fixed-width counters, such as hardware tick counters, or sequence numbers in protocols. The counter goes (e.g. for 8 bits) 0xfc, 0xfd, 0xfe, 0xff, 0x00, 0x01, 0x02, 0x03 etc., and you know that of the two values x and y that you have, x comes later. If x==0x02 and y==0xfe, the calculation x-y (as an 8-bit result) will give the correct answer of 4, assuming that subtraction of two n-bit values wraps modulo 2^n, which C99 guarantees for subtraction of unsigned values. (Note: the C standard does not guarantee this behaviour for subtraction of signed values.)
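Both cases can be sketched for 8-bit counters as follows. The function names are hypothetical, and the conversion to int8_t of a value above 127 is implementation-defined in C, although it wraps on common two's complement targets.

```c
#include <stdint.h>

/* "later" is known to come after "earlier", with at most one wrap between. */
static uint8_t ticks_between(uint8_t later, uint8_t earlier)
{
    /* Operands are promoted to int; the cast reduces the result modulo 256. */
    return (uint8_t)(later - earlier);
}

/* Order unknown: interpret the wrapped difference as a signed value.
   Valid while the true difference fits in -128..127. */
static int8_t signed_delta(uint8_t a, uint8_t b)
{
    return (int8_t)(uint8_t)(a - b);
}
```

With the values from the example, ticks_between(0x02, 0xfe) is 4, and signed_delta(0xfe, 0x02) is -4.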

Here's a little more detail of why it 'just works' when you subtract the 'smaller' from the 'larger'.
A couple of things going into this…
1. In hardware, subtraction uses addition: The appropriate operand is simply negated before being added.
2. In two’s complement (which pretty much everything uses), an integer is negated by inverting all the bits then adding 1.
Hardware does this more efficiently than it sounds from the above description, but that’s the basic algorithm for subtraction (even when values are unsigned).
So, let's work out 2 – 250 using 8-bit unsigned integers. In binary we have
0 0 0 0 0 0 1 0
- 1 1 1 1 1 0 1 0
We negate the operand being subtracted and then add. Recall that to negate we invert all the bits then add 1. After inverting the bits of the second operand we have
0 0 0 0 0 1 0 1
Then after adding 1 we have
0 0 0 0 0 1 1 0
Now we perform addition...
0 0 0 0 0 0 1 0
+ 0 0 0 0 0 1 1 0
= 0 0 0 0 1 0 0 0 = 8, which is the result we wanted from 2 - 250
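The invert, add one, then add sequence can be written out directly in C to confirm the hand calculation. This is a sketch of the arithmetic, not of how any particular ALU is built, and sub_by_negate is a made-up name.

```c
#include <stdint.h>

/* Compute x - y on 8 bits as x + (~y + 1), mirroring the steps above. */
static uint8_t sub_by_negate(uint8_t x, uint8_t y)
{
    uint8_t inverted = (uint8_t)~y;               /* invert all the bits */
    uint8_t negated  = (uint8_t)(inverted + 1u);  /* add 1: two's complement of y */
    return (uint8_t)(x + negated);                /* add, keeping the low 8 bits */
}
```

sub_by_negate(2, 250) returns 8, the same value as (uint8_t)(2 - 250).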

Maybe I don't understand, but what's wrong with:
unsigned r = x - y;

The question, as stated, is confusing. You said that you are subtracting unsigned values. If x is always larger than y, as you said, then x - y cannot possibly wrap around or overflow. So you just do x - y (if that's what you need) and that's it.

This is an efficient way to determine the amount of free space in a circular buffer or do sliding window flow control.
Use unsigned ints for head and tail - increment them and let them wrap!
Buffer length has to be a power of 2.
free = ((head - tail) & size_mask), where size_mask is 2^n - 1 and 2^n is the buffer or window size.
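A minimal sketch of the masked difference, assuming 32-bit free-running indices and a power-of-two size. The name ring_distance is invented here; whether the result should be read as free or used space depends on which index belongs to the producer, which the answer does not specify.

```c
#include <stdint.h>

/* Distance from tail to head, modulo the buffer size.
   size_mask must be (buffer size - 1), with the size a power of two. */
static uint32_t ring_distance(uint32_t head, uint32_t tail, uint32_t size_mask)
{
    /* Unsigned subtraction wraps modulo 2^32, so this stays correct
       even after head has wrapped past zero and tail has not. */
    return (head - tail) & size_mask;
}
```

For a 16-entry buffer, ring_distance(3, 0xFFFFFFFEu, 15) is 5, even though head is numerically smaller than tail.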

Just to put the already correct answer into code:
If you know that x is the smaller value, the following calculation just works:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t x = 0xff;
    uint8_t y = x + 20;       // wraps around to 19
    uint8_t res = y - x;
    printf("Expect 20: %d\n", res); // res is 20
    return 0;
}
If you do not know which one is smaller:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t x = 0xff;
    uint8_t y = x + 20;               // wraps around to 19
    int8_t res1 = (int8_t)(x - y);    // cast the whole difference, not just x
    int8_t res2 = (int8_t)(y - x);
    printf("Expect -20 and 20: %d and %d\n", res1, res2);
    return 0;
}
In this case the difference must be within the range of int8_t.
The code experiment helped me to understand the solution better.

The problem should be stated as follows:
Let's assume the position (angle) of each of the two pointers a and b of a clock is given by a uint8_t. The whole circumference is divided into the 256 values of a uint8_t. How can the smaller distance between the two pointers be calculated efficiently?
A solution is:
uint8_t smaller_distance = abs( (int8_t)( a - b ) );
I suspect there is nothing more efficient, as otherwise there would be something more efficient than abs().
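Wrapped in a function, the expression can be tried on a few positions. This is a sketch: the function name is taken from the variable above, and the cast to int8_t relies on the usual two's complement wrap, which is implementation-defined in C.

```c
#include <stdint.h>
#include <stdlib.h>

/* Shorter way round a 256-step circle between positions a and b. */
static uint8_t smaller_distance(uint8_t a, uint8_t b)
{
    /* Wrap the difference to 8 bits, then read it as signed: -128..127. */
    int8_t diff = (int8_t)(uint8_t)(a - b);
    /* abs() works on int, so -128 becomes 128, which still fits a uint8_t. */
    return (uint8_t)abs(diff);
}
```

Positions 250 and 2 are 8 steps apart in either argument order, and opposite positions such as 0 and 128 give 128.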

To echo everyone else replying, if you just subtract the two and interpret the result as unsigned you'll be fine.
Unless you have an explicit counterexample.
Your example of x = 0x2, y= 0x14 would not result in 0x4, it would result in 0xEE, unless you have more constraints on the math that are unstated.

Yet another answer, and hopefully easy to understand:
SUMMARY:
It's assumed the OP's x and y are assigned values from a counter, e.g., from a timer.
(x - y) will always give the value desired, even if the counter wraps.
This assumes the counter is incremented less than 2^N times between y and x,
for N-bit unsigned int's.
DESCRIPTION:
A counter variable is unsigned and it can wrap around.
A uint8 counter would have values:
0, 1, 2, ..., 255, 0, 1, 2, ..., 255, ...
The number of counter tics between two points can be calculated as shown below.
This assumes the counter is incremented less than 256 times, between y and x.
uint8 x, y, counter, counterTics;
<initialize the counter>
<do stuff while the counter increments>
y = counter;
<do stuff while the counter increments>
x = counter;
counterTics = x - y;
EXPLANATION:
For uint8, and the counter-tics from y to x is less than 256 (i.e., less than 2^8):
If (x >= y) then: the counter did not wrap, counterTics == x - y
If (x < y) then: the counter wrapped, counterTics == (256-y) + x
(256-y) is the number of tics before wrapping.
x is the number of tics after wrapping.
Note: if those calculations are made in the order shown, no negative numbers are involved.
This equation holds for both cases: counterTics == (256+x-y) mod 256
For uintN, where N is the number of bits:
counterTics == ((2^N)+x-y) mod (2^N)
The last equation also describes the result in C when subtracting unsigned int's, in general.
This is not to say the compiler or processor uses that equation when subtracting unsigned int's.
RATIONALE:
The explanation is consistent with what is described in this ACM paper:
"Understanding Integer Overflow in C/C++", by Dietz, et al.
HARDWARE INTEGER ARITHMETIC
When an n-bit addition or subtraction operation on unsigned or two’s complement integers overflows, the result “wraps around,” effectively subtracting 2^n from, or adding 2^n to, the true mathematical result. Equivalently, the result can be considered to occupy n+1 bits; the lower n bits are placed into the result register and the highest-order bit is placed into the processor’s carry flag.
INTEGER ARITHMETIC IN C AND C++
3.3. Unsigned Overflow
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Thus, the semantics for unsigned overflow in C/C++ are precisely the same as the semantics of processor-level unsigned overflow as described in Section 2. As shown in Table I, UINT_MAX+1 must evaluate to zero in a conforming C and C++ implementation.
Also, it's easy to write a C program to test that the cases shown work as described.
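One possible form of such a test is an exhaustive comparison over all 8-bit pairs. It is a sketch: count_mismatches is a hypothetical helper, and uint8_t from stdint.h stands in for the uint8 type used above.

```c
#include <stdint.h>

/* Count the (x, y) pairs for which 8-bit unsigned subtraction disagrees
   with the formula (256 + x - y) mod 256. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            uint8_t wrapped = (uint8_t)((uint8_t)x - (uint8_t)y);
            unsigned formula = (256u + x - y) % 256u;
            if (wrapped != formula)
                ++mismatches;
        }
    }
    return mismatches;
}
```

If the explanation holds, the function returns 0 for every conforming implementation.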

What is the best way to evenly scale one byte?

In C I need to scale a uint8_t from 0 - 255 to 0 - 31
What is the best way to do this evenly?

If you're trying to scale from 8 bits to 5 bits, you can do a 3-bit shift:
uint8_t scaled = (uint8_t)(original >> 3);
This drops the lower 3 bits.

You can use some simple multiplication and division:
uint8_t scaled = (uint8_t)(((uint32_t)original * 32U) / 256U);
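The two answers can be placed side by side for comparison. In this sketch (function names invented), both map each run of 8 consecutive input values to one output value, since multiplying by 32 and dividing by 256 is the same as dividing by 8.

```c
#include <stdint.h>

/* Drop the lower 3 bits: 0..255 -> 0..31. */
static uint8_t scale_shift(uint8_t original)
{
    return (uint8_t)(original >> 3);
}

/* Multiply then divide: original * 32 / 256, also 0..255 -> 0..31. */
static uint8_t scale_muldiv(uint8_t original)
{
    return (uint8_t)(((uint32_t)original * 32U) / 256U);
}
```

For instance, both functions map 200 to 25 and 255 to 31.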
