How to "blend" two values without overflow? - c

Consider the following function:
    // Return a blended value of x and y:
    // blend(100, 200, 1, 1) -> 150
    // blend(100, 200, 2, 1) -> 133
    uint8_t blend(uint8_t x, uint8_t y, uint8_t parts_x, uint8_t parts_y) {
        uint32_t big_parts_x = parts_x;
        uint32_t big_parts_y = parts_y;
        return (uint8_t) ((big_parts_x * x + big_parts_y * y) /
                          (big_parts_x + big_parts_y));
    }
Is there a way to get close to appropriate return values without requiring any allocations greater than uint8_t? You could easily break it up (with some extra rounding) into an addition of two uint16_t values by performing two divisions. Can you do it with only uint8_t?

A standards-compliant C implementation is guaranteed to perform arithmetic operations with at least 16 bits.
Section 6.3.1.1p2 of the C standard states:

    The following may be used in an expression wherever an int or unsigned
    int may be used:

    - An object or expression with an integer type (other than int or
      unsigned int) whose integer conversion rank is less than or equal to
      the rank of int and unsigned int.
    - A bit-field of type _Bool, int, signed int, or unsigned int.

    If an int can represent all values of the original type (as restricted
    by the width, for a bit-field), the value is converted to an int;
    otherwise, it is converted to an unsigned int. These are called the
    integer promotions. All other types are unchanged by the integer
    promotions.
Section E.1 also states that an int must be able to support values at least in the range -32767 to 32767, and an unsigned int must support values in at least the range 0 to 65535.
Since a uint8_t has lower rank than an int, the former will always be promoted to the latter when it is the subject of most operators, including +, -, * and /.
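To make the promotion concrete, here is a minimal demonstration (values are my own choice; the point is only that the multiplication happens at int width, not at 8 bits):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t a = 200, b = 200;
        // Both operands are promoted to int before the multiplication, so
        // with a 32-bit int this prints 40000, not 40000 % 256 == 64.
        printf("%d\n", a * b);
        return 0;
    }

Note that on a platform where int is only 16 bits, 200 * 200 = 40000 exceeds INT_MAX (32767), so the promoted signed multiplication would overflow with undefined behaviour; that is exactly why the code below multiplies by 1u to force unsigned arithmetic.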
Given that, you can safely compute the value with the following slight modification:
    uint8_t blend(uint8_t x, uint8_t y, uint8_t parts_x, uint8_t parts_y) {
        return ((1u*parts_x*x) / (parts_x + parts_y))
             + ((1u*parts_y*y) / (parts_x + parts_y));
    }
The expressions parts_x*x and parts_y*y will have a maximum value of 65025. This is too big for a 16 bit int but not a 16 bit unsigned int, so each is multiplied by 1u to force the values to be converted to unsigned int as per the usual arithmetic conversions specified in section 6.3.1.8:
    the integer promotions are performed on both operands. Then the
    following rules are applied to the promoted operands:

    - If both operands have the same type, then no further conversion is
      needed.
    - Otherwise, if both operands have signed integer types or both have
      unsigned integer types, the operand with the type of lesser integer
      conversion rank is converted to the type of the operand with greater
      rank.
    - Otherwise, if the operand that has unsigned integer type has rank
      greater or equal to the rank of the type of the other operand, then
      the operand with signed integer type is converted to the type of the
      operand with unsigned integer type.
Note also that we divide each part by the sum total separately. If we added both products first and then divided, the numerator could exceed 65535. Doing the division first brings each subexpression back down into the range of a uint8_t, so the final addition of the two parts also stays within the range of a uint8_t.
So the above expression is guaranteed never to overflow on a compiler that is compliant with the C standard. Be aware, though, that splitting the division rounds each term down separately, so the result can be 1 less than the single combined division: blend(100, 200, 2, 1) yields 66 + 66 = 132 rather than 133.
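To see how close the split version stays to the wide-math result, here is a quick sampled check (a sketch; blend_ref and blend_split are my names, and the grid steps are arbitrary, chosen only to keep the run fast):

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t blend_ref(uint8_t x, uint8_t y, uint8_t px, uint8_t py) {
        // Reference version using 32-bit intermediates.
        return (uint8_t)(((uint32_t)px * x + (uint32_t)py * y) / ((uint32_t)px + py));
    }

    static uint8_t blend_split(uint8_t x, uint8_t y, uint8_t px, uint8_t py) {
        // Promotion-safe version with the division split in two.
        return (uint8_t)((1u * px * x / (px + py)) + (1u * py * y / (px + py)));
    }

    int main(void) {
        unsigned worst = 0;
        for (unsigned x = 0; x < 256; x += 5)
            for (unsigned y = 0; y < 256; y += 5)
                for (unsigned px = 1; px < 256; px += 7)
                    for (unsigned py = 1; py < 256; py += 7) {
                        unsigned r = blend_ref(x, y, px, py);
                        unsigned s = blend_split(x, y, px, py);
                        unsigned d = r > s ? r - s : s - r;
                        if (d > worst) worst = d;
                    }
        printf("worst difference: %u\n", worst);
        return 0;
    }

Since floor(a/d) + floor(b/d) is at most 1 less than floor((a+b)/d), the printed worst difference is 1.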

The below will combine without any additional allocations. Multiplying by 1ul promotes the arithmetic to unsigned long, which is guaranteed to be at least 32 bits, so this works even if int/unsigned int is only 16 bits (with 1u alone, the sum of the two products could reach 130050 and wrap around in a 16-bit unsigned int):

    return (uint8_t) ((1ul*parts_x*x + 1ul*parts_y*y) / (0u + parts_x + parts_y));

Is there a way to get close to appropriate return values without requiring any allocations greater than uint8_t?
In theory, yes:
    uint8_t blend(uint8_t x, uint8_t y, uint8_t parts_x, uint8_t parts_y) {
        return lookup_table[x][y][parts_x][parts_y];
    }
In practice that's going to cost 4 GiB of RAM for the lookup table, so it's probably not a great idea.
Apart from that, it depends on what you mean by "close" (how large "acceptable worst case error" can be) and what range of values are valid (especially for parts_x and parts_y).
For example (if parts_x and parts_y have a range from 1 to 15 only):
    uint8_t blend(uint8_t x, uint8_t y, uint8_t parts_x, uint8_t parts_y) {
        uint8_t scaleX = (parts_x << 4) / (parts_x + parts_y);
        uint8_t scaleY = (parts_y << 4) / (parts_x + parts_y);
        return (x >> 4) * scaleX + (y >> 4) * scaleY;
    }
Of course in this case "close" means:
    blend(100, 200, 1, 1) = 6*8 + 12*8 = 144 (not 150)
    blend(100, 200, 2, 1) = 6*10 + 12*5 = 120 (not 133)
Note that (in general) multiplication is "expanding": if a has M bits of range and b has N bits of range, then a*b needs M+N bits of range. In other words, to cover the full range without overflow, uint8_t * uint8_t needs a uint16_t result. Division is significantly worse for precision (to represent 1/3 exactly would take infinitely many bits), so some precision loss is impossible to avoid; the number of bits in the result determines how much is lost, and 8 bits of precision is "not much".
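A small sketch of that expansion (the 255s are arbitrary full-range values of mine; the unsigned cast avoids the 16-bit-int promotion pitfall discussed above):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t a = 255, b = 255;
        uint16_t wide = (uint16_t)((unsigned)a * b);  // 8 + 8 bits of range need 16: 65025
        uint8_t narrow = (uint8_t)((unsigned)a * b);  // keeping only 8 bits truncates: 65025 % 256 == 1
        printf("%u %u\n", (unsigned)wide, (unsigned)narrow);
        return 0;
    }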
Also note that the simple example I've shown above can be improved for some cases by adding extra code for those cases. For example:
    uint8_t blend(uint8_t x, uint8_t y, uint8_t parts_x, uint8_t parts_y) {
        if(parts_x < parts_y) {
            return blend(y, x, parts_y, parts_x);
        }
        // parts_x >= parts_y now
        if(parts_x == parts_y*2) {
            return 2*(x/3) + y/3;
        } else if(parts_x == parts_y*3) {
            return 3*(x/4) + y/4;
        } else if(parts_x == parts_y*4) {
            return 4*(x/5) + y/5;
        } else if(parts_x == parts_y*5) {
            return 5*(x/6) + y/6;
        } else if( (x > 16) && (y > 16) ) {
            uint8_t scaleX = (parts_x << 4) / (parts_x + parts_y);
            uint8_t scaleY = (parts_y << 4) / (parts_x + parts_y);
            return (x * scaleX + y * scaleY) >> 4;
        } else {
            uint8_t scaleX = (parts_x << 4) / (parts_x + parts_y);
            uint8_t scaleY = (parts_y << 4) / (parts_x + parts_y);
            return (x >> 4) * scaleX + (y >> 4) * scaleY;
        }
    }
Of course it's significantly easier and faster to use something larger than uint8_t, so...

Related

How to test the most significant bit of signed or unsigned integer?

Given a clock_t data type that is guaranteed to be an integer of some sort, how do you test the value of the most significant bit using portable C code? In other words, I need the definition of this function:

    bool is_msb_set(clock_t clock);

But here's the twist: you don't know the size of clock_t nor whether it is signed or unsigned (but assume "signed" means two's complement).
My first thought was to do this:

    const clock_t MSB = 1 << ((sizeof(clock_t) * 8) - 1);

    bool is_msb_set(clock_t value) {
        return value & MSB;
    }

but the definition of MSB overflows if clock_t is a signed type. Maybe I'm overthinking this, but I'm stumped.
I think we can first check whether clock_t is signed or unsigned, and proceed accordingly:

    bool is_msb_set(clock_t value) {
        if ((clock_t)-1 < 0) {
            /* clock_t is signed */
            return value < 0;
        } else {
            /* clock_t is unsigned; the cast matters if clock_t is narrower
               than int, so the left shift wraps in clock_t, not in int */
            return ((clock_t)(value << 1) >> 1) ^ value;
        }
    }
Just test the value directly, without a mask:

    bool is_msb_set(clock_t value) {
        if (value < 0) return 1;
        return value >> (sizeof(clock_t) * CHAR_BIT - 1);
    }

If the left operand of the >> operator has "a signed type and a negative value, the resulting value is implementation-defined"; see C11 6.5.7p5.
Because we assumed "signed" means two's complement, I can just test whether value is less than 0: if it is, the most significant bit is necessarily set. Otherwise the value is non-negative, and for a non-negative value of a signed clock_t the >> is well defined.
If clock_t is unsigned, then value < 0 will always return 0 and will most probably be optimized out by the compiler.
The code shouldn't compile if clock_t is not an integer type (e.g. if it is a float or double), because the operands of >> need to have integer type. So it will only work for integer types.
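For completeness, a small self-contained test of this approach (assuming, as the question does, that clock_t is an integer type, and additionally that it has no padding bits):

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    static bool is_msb_set(clock_t value) {
        if (value < 0) return true;  /* signed and negative: MSB is set */
        return value >> (sizeof(clock_t) * CHAR_BIT - 1);
    }

    int main(void) {
        /* (clock_t)-1 has the MSB set whether clock_t is signed (it is
           negative) or unsigned (it is the maximum value of the type). */
        printf("%d\n", is_msb_set((clock_t)-1)); /* 1 */
        printf("%d\n", is_msb_set((clock_t)1));  /* 0 */
        return 0;
    }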
What you can do is cast it to an integer which is guaranteed to be large enough (so, long?) and then do your test against 1 << ((sizeof(clock_t) * 8) - 1);.
I would also start by asserting that sizeof(clock_t) <= sizeof(long).

Iterate bits from left to right for any number

I am trying to implement the modular exponentiation (square-and-multiply, left to right) algorithm in C.
In order to iterate over the bits from left to right, I can use masking, which is explained in this link.
In that example the mask used is 0x80, which can only work for a number with at most 8 bits.
In order to make it work for any number of bits, I need to assign the mask dynamically, but this makes it a bit complicated.
Is there any other solution by which this can be done?
Thanks in advance!
EDIT:
    long long base = 23;
    long long exponent = 297;
    long long mod = 327;
    long long result = 1;
    unsigned int mask;

    for (mask = 0x80; mask != 0; mask >>= 1) {
        result = (result * result) % mod; // Square
        if (exponent & mask) {
            result = (base * result) % mod; // Mul
        }
    }
As in this example, it will not work if I use the mask 0x80, but with 0x100 it works fine (297 needs 9 bits).
Selecting the mask value at run time seems to be an overhead.
If you want to iterate over all bits, you first have to know how many bits there are in your type.
This is a surprisingly complicated matter:
- sizeof gives you the number of bytes, but a byte can have more than 8 bits.
- limits.h gives you CHAR_BIT to know the number of bits in a byte, but even if you multiply this by the sizeof of your type, the result could still be wrong: unsigned types are allowed to contain padding bits that are not part of the number representation, while sizeof returns the storage size in bytes, which includes these padding bits.
Fortunately, this answer has an ingenious macro that can calculate the number of actual value bits based on the maximum value of the respective type:
    #define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
            + (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))
The maximum value of an unsigned type is surprisingly easy to get: just cast -1 to your unsigned type.
So, all in all, your code could look like this, including the macro above:
    #define UNSIGNED_BITS IMAX_BITS((unsigned)-1)

    // [...]
    unsigned int mask;

    for (mask = 1u << (UNSIGNED_BITS-1); mask != 0; mask >>= 1) {
        // [...]
    }

(Note the 1u: shifting a plain int 1 into the sign bit position would be undefined behaviour.)
Note that applying this complicated macro has no runtime drawback at all, it's a compile-time constant.
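For instance, a quick check of the macro (the output shown is for a typical platform; each call collapses to an integer constant expression at compile time):

    #include <stdio.h>

    #define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
            + (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))

    int main(void) {
        printf("unsigned int:       %d value bits\n", (int)IMAX_BITS((unsigned)-1));
        printf("unsigned long long: %d value bits\n", (int)IMAX_BITS((unsigned long long)-1));
        return 0;  /* typically prints 32 and 64 */
    }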
Your algorithm seems unnecessarily complicated: bits from the exponent can be tested from the least significant to the most significant in a way that does not depend on the integer type nor its maximum value. Here is a simple implementation that does not need any special case for any size of integer:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        unsigned long long base = (argc > 1) ? strtoull(argv[1], NULL, 0) : 23;
        unsigned long long exponent = (argc > 2) ? strtoull(argv[2], NULL, 0) : 297;
        unsigned long long mod = (argc > 3) ? strtoull(argv[3], NULL, 0) : 327;
        unsigned long long y = exponent;
        unsigned long long x = base;
        unsigned long long result = 1;

        for (;;) {
            if (y & 1) {
                result = result * x % mod;
            }
            if ((y >>= 1) == 0)
                break;
            x = x * x % mod;
        }
        printf("expmod(%llu, %llu, %llu) = %llu\n", base, exponent, mod, result);
        return 0;
    }
Without any command line arguments, it produces: expmod(23, 297, 327) = 185. You can try other numbers by passing the base, exponent and modulo as command line arguments.
EDIT:
If you must scan the bits in exponent from most significant to least significant, mask should have the same type as exponent and be initialized this way if the type is unsigned:

    unsigned long long exponent = 297;
    unsigned long long mask = 0;
    mask = ~mask - (~mask >> 1);

If the type is signed, for complete portability you must use the definition of its maximum value from <limits.h>. Note however that it would be more efficient to use the unsigned type.

    long long exponent = 297;
    long long mask = LLONG_MAX - (LLONG_MAX >> 1);
The loop will waste time running through all the most significant 0 bits, so a simpler loop could be used first to skip these bits:

    while (mask > exponent) {
        mask >>= 1;
    }
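Putting the pieces together, here is a sketch of the full left-to-right (most significant bit first) square-and-multiply loop using that mask initialization; for brevity it assumes base and mod are small enough that the products do not overflow unsigned long long:

    #include <stdio.h>

    static unsigned long long expmod_msb_first(unsigned long long base,
                                               unsigned long long exponent,
                                               unsigned long long mod) {
        unsigned long long mask = 0;
        mask = ~mask - (~mask >> 1);          /* highest bit of the type */
        while (mask > exponent && mask != 0)  /* skip the leading 0 bits */
            mask >>= 1;
        unsigned long long result = 1 % mod;
        for (; mask != 0; mask >>= 1) {
            result = result * result % mod;   /* square */
            if (exponent & mask)
                result = result * base % mod; /* multiply */
        }
        return result;
    }

    int main(void) {
        printf("%llu\n", expmod_msb_first(23, 297, 327)); /* 185, as above */
        return 0;
    }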

C programming type casting and fixed point

How should you implement this function in C code?

    U16 function(U16 value, S16 x, U16 y) {
        newValue = min((((value - x) * y) >> 10) >> 4, 4095)
        return newValue
    }

y is a fixed-point value with 10 fractional bits.
If x is greater than value, the final result should be 0.
My concern is the mix between the different types, and especially that overflow does not occur. Also, how do you write it in a clean way if there have to be a lot of type casts?
You need to code the function for all possible values of the input parameters. Take the expression (value - x). If value is equal to 2^16 - 1 (65535) and x is equal to -2^15 (-32768), then the result of (value - x) is 98303, bigger than a U16 can hold. Therefore, I would cast value to S32 before this operation.
Now take (value - x) at its maximum value, 98303. The maximum value of the expression ((value - x) * y) is then 98303 * 65535, roughly 6.4 * 10^9, which is a bigger value than a 32-bit integer can hold. Therefore, you need to compute this expression with 64 bits. You can simply replace the initial S32 cast with an S64 cast, since you'll need it anyway.
Right bit shift operations only reduce the number of significant bits. Therefore, this does not require a bigger number of bits to be computed.
The min call ensures that the result cannot be bigger than 4095, which can be held in a U16; no more cast should be necessary.
Final function:
    #include <stdint.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    uint16_t newValue(uint16_t value, int16_t x, uint16_t y) {
        int64_t acc = (int64_t)value;
        acc -= x;              // may be negative when x > value
        if (acc < 0)
            return 0;          // per the requirements: result is 0 when x > value
        acc *= y;              // at most 98303 * 65535, fits easily in 64 bits
        acc >>= 10;            // drop the 10 fractional bits of y
        acc >>= 4;
        return (uint16_t)MIN(acc, 4095);
        // Or as a one-liner (without the clamp to 0):
        // return (uint16_t)MIN(((((int64_t)value - x) * y) >> 10) >> 4, 4095);
    }
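A few sanity checks (input values of my choosing; recall y is in Q10, so 1024 represents 1.0):

    #include <stdio.h>

    int main(void) {
        printf("%u\n", (unsigned)newValue(1000, 0, 1024));        /* (1000 * 1.0) >> 4 = 62 */
        printf("%u\n", (unsigned)newValue(100, 200, 1024));       /* x > value: clamped to 0 */
        printf("%u\n", (unsigned)newValue(65535, -32768, 65535)); /* saturates at 4095 */
        return 0;
    }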
Here it is:

    unsigned int function(unsigned int value, signed int x, unsigned int y) {
        if (((((value - x) * y) >> 10) >> 4) < 4095)
            return (((value - x) * y) >> 10) >> 4;
        else
            return 4095;
    }

Finding the output of 2**32 % x in arc4random.c

I saw some code (in arc4random.c of libbsd) calculating 2**32 % x. A cleaned-up version is below:

    uint32_t x;
    ...
    if (x >= 2) {
        /* Calculate (2**32 % x) avoiding 64-bit math */
        if (x > 0x80000000)
            mod_res = 1 + ~x; /* 2**32 - x */
        else {
            /* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
            mod_res = ((0xffffffff - (x * 2)) + 1) % x;
        }
    }
While the reasoning makes sense, my question is whether there are some obscure reasons not to use the simpler:

    uint32_t x;
    ...
    if (x >= 2) {
        /* Calculate (2**32 % x) avoiding 64-bit math */
        mod_res = -x % x;
    }
Your code won't work on a machine where int is larger than 32 bits. In this case, in the expression -x, the operand would be promoted to int type, and thus become signed. This would cause the result of the expression -x % x to always be zero.
This behavior is due to C's integer promotion rules, which state that if an int can represent all values of an operand, then that operand will be promoted to an int. While this always preserves value, it may change the signedness of the type.
On a compiler with 32-bit ints it would work correctly, because unsigned int would not be promoted to int, and so -x would be equal to 2**32 - x.
Your version can be fixed by casting the promoted value back to unsigned:

    mod_res = ((uint32_t) -x) % x;
Here is an example demonstrating this with a 16-bit type on a machine with 32-bit ints.
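The linked example is not reproduced here, but a sketch in the same spirit, using a 16-bit type on a machine with a 32-bit int:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t x = 3;
        /* x is promoted to (signed) int, so -x is the int value -3,
           and -3 % 3 is 0, not (2**16 - 3) % 3. */
        printf("%d\n", -x % x);            /* 0 */
        /* Casting back to the narrow unsigned type restores the
           intended modular negation: (65536 - 3) % 3 == 1. */
        printf("%d\n", (uint16_t)-x % x);  /* 1 */
        return 0;
    }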

C standard on negative zero (1's complement and signed magnitude)

All of these functions give the expected result on my machine. Do they all work on other platforms?
More specifically, if x has the bit representation 0xffffffff on a 1's complement machine, or 0x80000000 on a signed-magnitude machine, what does the standard say about the representation of (unsigned)x?
Also, I think the (unsigned) cast in v2, v2a, v3 and v4 is redundant. Is this correct?
Assume sizeof(int) == 4 and CHAR_BIT == 8.
    int logicalrightshift_v1 (int x, int n) {
        return (unsigned)x >> n;
    }

    int logicalrightshift_v2 (int x, int n) {
        int msb = 0x40000000 << 1;
        return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
    }

    int logicalrightshift_v2a (int x, int n) {
        return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
    }

    int logicalrightshift_v3 (int x, int n) {
        return ((x & 0x7fffffff) >> n) | (x < 0 ? (unsigned)0x80000000 >> n : 0);
    }

    int logicalrightshift_v4 (int x, int n) {
        return ((x & 0x7fffffff) >> n) | (((unsigned)x & 0x80000000) >> n);
    }

    int logicalrightshift_v5 (int x, int n) {
        unsigned y;
        *(int *)&y = x;
        y >>= n;
        *(unsigned *)&x = y;
        return x;
    }

    int logicalrightshift_v6 (int x, int n) {
        unsigned y;
        memcpy (&y, &x, sizeof (x));
        y >>= n;
        memcpy (&x, &y, sizeof (x));
        return x;
    }
    If x has the bit representation 0xffffffff on a 1's complement machine,
    or 0x80000000 on a signed-magnitude machine, what does the standard say
    about the representation of (unsigned)x?
The conversion to unsigned is specified in terms of values, not representations. If you convert -1 to unsigned, you always get UINT_MAX (so if your unsigned is 32 bits, you always get 4294967295). This happens regardless of the representation of signed numbers that your implementation uses.
Likewise, if you convert -0 to unsigned then you always get 0. -0 is numerically equal to 0.
Note that a ones complement or sign-magnitude implementation is not required to support negative zeroes; if it does not, then accessing such a representation causes the program to have undefined behaviour.
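For example, a minimal illustration of the value-based rule (this prints the same thing on two's complement, ones' complement, and sign-magnitude implementations alike):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* Conversion to unsigned is defined on values, not bit patterns:
           -1 converts to UINT_MAX by adding UINT_MAX + 1. */
        printf("%d\n", (unsigned)-1 == UINT_MAX); /* 1 */
        /* The literal -0 is simply the value 0, so it converts to 0;
           a stored negative zero could only arise from bitwise operators. */
        printf("%u\n", (unsigned)-0);             /* 0 */
        return 0;
    }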
Going through your functions one-by-one:
    int logicalrightshift_v1(int x, int n)
    {
        return (unsigned)x >> n;
    }
The result of this function for negative values of x will depend on UINT_MAX, and will further be implementation-defined if (unsigned)x >> n is not within the range of int. For example, logicalrightshift_v1(-1, 1) will return the value UINT_MAX / 2 regardless of what representation the machine uses for signed numbers.
    int logicalrightshift_v2(int x, int n)
    {
        int msb = 0x40000000 << 1;
        return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
    }
Almost everything about this could be implementation-defined. Assuming that you are attempting to create a value in msb with 1 in the sign bit and zeroes in the value bits, you cannot do this portably by use of shifts - you can use ~INT_MAX, but this is allowed to have undefined behaviour on a sign-magnitude machine that does not allow negative zeroes, and is allowed to give an implementation-defined result on two's complement machines.
The types of 0x7fffffff and 0x80000000 will depend on the ranges of the various types, which will affect how other values in this expression are promoted.
    int logicalrightshift_v2a(int x, int n)
    {
        return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
    }
If you create an unsigned value that is not in the range of int (for example, given a 32bit int, values > 0x7fffffff) then the implicit conversion in the return statement produces an implementation-defined value. The same applies to v3 and v4.
    int logicalrightshift_v5(int x, int n)
    {
        unsigned y;
        *(int *)&y = x;
        y >>= n;
        *(unsigned *)&x = y;
        return x;
    }
This is still implementation defined, because it is unspecified whether the sign bit in the representation of int corresponds to a value bit or a padding bit in the representation of unsigned. If it corresponds to a padding bit it could be a trap representation, in which case the behaviour is undefined.
    int logicalrightshift_v6(int x, int n)
    {
        unsigned y;
        memcpy (&y, &x, sizeof (x));
        y >>= n;
        memcpy (&x, &y, sizeof (x));
        return x;
    }
The same comments that apply to v5 apply to this.
    Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is
    this correct?
It depends. As a hex constant, 0x80000000 will have type int if that value is within the range of int; otherwise unsigned if that value is within the range of unsigned; otherwise long if that value is within the range of long; otherwise unsigned long (because that value is within the minimum allowed range of unsigned long).
If you wish to ensure that it has unsigned type, then suffix the constant with a U, to 0x80000000U.
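Assuming a C11 compiler, _Generic can show the type a constant actually gets (the results in the comments are for a platform with a 32-bit int):

    #include <stdio.h>

    #define TYPE_NAME(x) _Generic((x),                 \
        int: "int",                                    \
        unsigned int: "unsigned int",                  \
        long: "long",                                  \
        unsigned long: "unsigned long",                \
        long long: "long long",                        \
        unsigned long long: "unsigned long long",      \
        default: "other")

    int main(void) {
        printf("0x7fffffff:  %s\n", TYPE_NAME(0x7fffffff));  /* int */
        printf("0x80000000:  %s\n", TYPE_NAME(0x80000000));  /* unsigned int */
        printf("0x80000000U: %s\n", TYPE_NAME(0x80000000U)); /* unsigned int, by construction */
        return 0;
    }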
Summary:

- Converting a number greater than INT_MAX to int gives an implementation-defined result (or indeed, allows an implementation-defined signal to be raised).
- Converting an out-of-range number to unsigned is done by repeated addition or subtraction of UINT_MAX + 1, which means it depends on the mathematical value, not the representation.
- Inspecting a negative int representation as unsigned is not portable (positive int representations are OK, though).
- Generating a negative zero through use of bitwise operators and trying to use the resulting value is not portable.

If you want "logical shifts", then you should be using unsigned types everywhere. The signed types are designed for dealing with algorithms where the value is what matters, not the representation.
If you follow the standard to the word, none of these are guaranteed to be the same on all platforms.
In v5, you violate strict-aliasing, which is undefined behavior.
In v2 - v4, you have signed right-shift, which is implementation defined. (see comments for more details)
In v1, you have signed to unsigned cast, which is implementation defined when the number is out of range.
EDIT:
v6 might actually work, given the following assumptions:

- int is either 2's or 1's complement.
- unsigned and int are exactly the same size (in both bytes and bits, and are densely packed).
- The endianness of unsigned matches that of int.
- The padding and bit layout are the same. (See caf's comment for more details.)
