Constant time string equality test return value - C

Looking for a constant time string equality test I found that most of them use bit trickery on the return value. For example this piece of code:
int ctiszero(const void* x, size_t n)
{
    volatile unsigned char r = 0;
    for (size_t i = 0; i < n; i += 1) {
        r |= ((unsigned char*)x)[i];
    }
    return 1 & ((r - 1) >> 8);
}
What is the purpose of return 1 & ((r - 1) >> 8);? Why not a simple return !r;?

As mentioned in one of my comments, this function checks if an array of arbitrary bytes is zero or not. If all bytes are zero then 1 will be returned, otherwise 0 will be returned.
If there is at least one non-zero byte, then r will be non-zero as well. Subtract 1 and you get a value that is non-negative and at most 254 (since r is unsigned and at most 255). Shifting right by 8 shifts off every one of its bits, and the result is zero, which is then masked with 1 resulting in zero, which is returned.
If all the bytes are zero, then the value of r will be zero as well. But here comes the "magic": In the expression r - 1 the value of r undergoes what is called the usual arithmetic conversions, which leads to the value of r being promoted to an int. The value is still zero, but now it's a signed integer. Subtract 1 and you will have -1, which with the usual two's complement notation is equal to 0xffffffff. Right-shifting a negative value is implementation-defined: an arithmetic shift keeps it 0xffffffff, a logical shift gives 0x00ffffff, but either way the low bit is 1, so masking with 1 results in 1. Which is returned.
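To tie this back to the title's string equality test: XOR the two buffers byte by byte, OR the differences into r, and reuse the same return expression. A minimal sketch of that idea (my own illustration, not code from the question):
#include <stddef.h>

/* Constant-time equality of two n-byte buffers: returns 1 if equal, 0
   otherwise. The loop always touches all n bytes; r becomes non-zero as
   soon as any byte pair differs. */
int ctisequal(const void* a, const void* b, size_t n)
{
    const unsigned char* pa = a;
    const unsigned char* pb = b;
    volatile unsigned char r = 0;
    for (size_t i = 0; i < n; i += 1) {
        r |= pa[i] ^ pb[i];
    }
    return 1 & ((r - 1) >> 8);
}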

With constant time code, constructs that may branch (and thus incur run-time timing differences), like return !r;, are typically avoided.
Note that a well optimized compiler may emit the exact same code for return 1 & ((r - 1) >> 8); as for return !r;. This exercise is therefore, at best, code to coax the compiler into emitting constant time code.
What about uncommon platforms?
return 1 & ((r - 1) >> 8); is well explained by @Some programmer dude's good answer for the case where unsigned char is 8-bit and int is 2's complement - something that is very common.
With 8-bit unsigned char and r > 0, r-1 is non-negative and 1 & ((r - 1) >> 8) returns 0 whether int is 2's complement, 1's complement or sign-magnitude, 16-bit, 32-bit, etc.
When r == 0, r-1 is -1, and right-shifting a negative value is implementation-defined, so what 1 & ((r - 1) >> 8) returns is implementation-defined as well. It returns 1 with int as 2's complement or 1's complement, but 0 with sign-magnitude.
// fails with sign-magnitude (rare)
// fails when byte width > 8 (uncommon)
return 1 & ((r - 1) >> 8);
Small changes can fix the code to work as desired in more cases¹. Also see @Eric Postpischil's comments.
By ensuring r - 1 is done using unsigned math, the int encoding is irrelevant.
// v--- add u v--- shift by byte width
return 1 & ((r - 1u) >> CHAR_BIT);
¹ Somewhat rare: when unsigned char is as wide as unsigned, OP's code and this fix both fail. If a wider integer type is available, the code could use that, e.g.: return 1 & ((r - 1LLU) >> CHAR_BIT);
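Assembling those fragments, the whole function might look like this (my combination of the pieces above; the footnote's caveat about unsigned char being as wide as unsigned still applies):
#include <limits.h>
#include <stddef.h>

int ctiszero(const void* x, size_t n)
{
    volatile unsigned char r = 0;
    for (size_t i = 0; i < n; i += 1) {
        r |= ((const unsigned char*)x)[i];
    }
    /* unsigned math: r == 0 wraps to UINT_MAX, whose low bit is still 1
       after shifting right by CHAR_BIT; any non-zero r gives a value
       below (1u << CHAR_BIT), which shifts down to 0 */
    return 1 & ((r - 1u) >> CHAR_BIT);
}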

A caution against a tempting misreading: this is not a test for "r > 128 or zero". That analysis would apply to a shift by 7, not by 8. Since the shift is by 8, the full width of the byte, every bit of r - 1 is discarded when r is non-zero (r - 1 is at most 0xFE), so the expression yields 0 for any non-zero r regardless of its high bit. Only r == 0, which wraps to -1, leaves a 1 in the low bit after the shift. In particular, a buffer of 0x80 and 0x00 bytes behaves like any other buffer containing a non-zero byte: 0 is returned.

Related

arithmetic right shift shifts in 0s when MSB is 1

As an exercise I have to write the following function:
multiply x by 2, saturating to Tmin / Tmax if overflow, using only bit-wise and bit-shift operations.
Now this is my code:
// XOR the MSB and 2nd MSB. If they are different, we have an overflow and SHOULD get 0xFFFFFFFF; otherwise we get 0.
int overflowmask = ((x & 0x80000000) ^ ((x & 0x40000000)<<1)) >>31;
// ^ this arithmetic bit shift seems to be wrong
// this gets you Tmin if x < 0 or Tmax if x >= 0
int overflowreplace = ((x>>31)^0x7FFFFFFF);
// if no overflow, return x*2; otherwise return overflowreplace
return ((x<<1) & ~overflowmask)|(overflowreplace & overflowmask);
now when overflowmask should be 0xFFFFFFFF, it is 1 instead, which means that the arithmetic bit shift >>31 shifted in 0s instead of 1s (MSB got XORed to 1, then shifted to the bottom).
x is signed and the MSB is 1, so according to C99 an arithmetic right shift should fill in 1s. What am I missing?
EDIT: I initially suspected the detection itself was wrong, but it is not: an overflow occurs exactly when the two most significant bits differ, which is what the XOR tests.
However, I still wonder why the bit shift filled in 0s.
EDIT:
Example: x = 0xA0000000
x & 0x80000000 = 0x80000000
x & 0x40000000 = 0
XOR => 0x80000000
>>31 => 0x00000001
EDIT:
Solution:
int msb = x & 0x80000000;
int msb2 = (x & 0x40000000) <<1;
// overflow iff the top two bits differ; msb is a signed int here, so >>31 is an arithmetic shift
int overflowmask = (msb ^ msb2) >>31;
int overflowreplace = (x >>31) ^ 0x7FFFFFFF;
return ((x<<1) & ~overflowmask) | (overflowreplace & overflowmask);
Even on twos-complement machines, the behaviour of right-shift (>>) on negative operands is implementation-defined.
A safer approach is to work with unsigned types and explicitly OR-in the MSB.
While you're at it, you probably also want to use fixed-width types (e.g. uint32_t) rather than failing on platforms that don't meet your expectations.
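A sketch of that suggestion (my own illustration, not code from the answer): do all the shifting on uint32_t, where every shift is well defined, and only convert back at the end.
#include <stdint.h>

int32_t double_saturated(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t overflow = ((ux >> 31) ^ (ux >> 30)) & 1u;   /* top two bits differ? */
    uint32_t mask = 0u - overflow;                        /* all-ones on overflow */
    uint32_t saturated = (0u - (ux >> 31)) ^ 0x7FFFFFFFu; /* Tmin if negative, else Tmax */
    uint32_t result = ((ux << 1) & ~mask) | (saturated & mask);
    return (int32_t)result; /* implementation-defined for values above INT32_MAX,
                               but the expected bit pattern on two's complement */
}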
0x80000000 is treated as an unsigned number, which causes everything to be converted to unsigned. You can do this:
// XOR the MSB and 2nd MSB. If they are different, we have an overflow and SHOULD get 0xFFFFFFFF; otherwise we get 0.
int overflowmask = ((x & (0x40000000 << 1)) ^ ((x & 0x40000000)<<1)) >>31;
// this gets you Tmin if x < 0 or Tmax if x >= 0
int overflowreplace = ((x>>31)^0x7FFFFFFF);
// if no overflow, return x*2; otherwise return overflowreplace
return ((x<<1) & ~overflowmask)|(overflowreplace & overflowmask);
OR write the constants as negative decimals,
OR store all the constants in const int variables to have them guaranteed signed.
Never use bitwise operators on signed types. In case of right shift on signed integers, it is up to the compiler whether you get an arithmetic or a logical shift.
That's only one of your problems though. When you use a hex integer constant 0x80000000, it is actually of type unsigned int as explained here. This accidentally turns your whole expression (x & 0x80000000) ^ ... into an unsigned type because of the usual arithmetic conversions. The 0x40000000 expression, on the other hand, is a signed int and works as (the specific compiler) expected.
Solution:
All variables involved must be of type uint32_t.
All hex constants involved must be u suffixed.
To get an arithmetic shift portably, you would have to do
(x >> n) | (0xFFFFFFFFu << (32-n)) or some similar hack.
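Wrapped into a function for illustration (a sketch under this answer's 32-bit assumptions; n must satisfy 0 < n < 32, since 0xFFFFFFFFu << 32 would itself be undefined):
#include <stdint.h>

/* Portable arithmetic right shift of a 32-bit value by n, 0 < n < 32. */
uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;                  /* always shifts in zeros */
    if (x & 0x80000000u)                        /* negative when read as signed */
        shifted |= 0xFFFFFFFFu << (32 - n);     /* replicate the sign bit */
    return shifted;
}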

bitwise operations to write an individual byte of an integer

How do I set the nth byte of a 64-bit unsigned integer, regardless of endianness, in C? One of the possible methods I tried is setting each bit in a loop.
Assuming n = 0 is the least significant byte, why can't you just do the following:
x |= (0xffull << (n * 8));
If x = 0 and n = 2 this sets x to 0xff0000. Unless I am missing something? I don't see what endianness has to do with the problem.
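One caveat: x |= ... can only set bits, so if the target byte may already hold a non-zero value, clear it first. A sketch of writing an arbitrary byte value (function and parameter names are mine):
#include <stdint.h>

/* Write value v into byte n of x; n == 0 is the least significant byte,
   independent of the machine's endianness. */
uint64_t set_byte(uint64_t x, unsigned n, uint8_t v)
{
    x &= ~(0xffull << (n * 8));    /* clear the old byte */
    x |= (uint64_t)v << (n * 8);   /* OR in the new one */
    return x;
}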

Bit Operations - Indicating the Sign of a Signed Integer

Why does the following C code not work for returning -1 for negative numbers, 0 for 0s, and 1 for positive numbers?
(((x >> 31) ^ -1) + 1) | (!x ^ 1);
Specifically, when I pass in negative numbers, it returns 1. It seems like if I have a negative number, though (i.e., the least significant bit is a 1 after the 31-bit shift), XORing it with -1 will give me -2 (i.e., all 1s and a 0 in the least significant bit location), and adding 1 would make it -1.
According to the C99 standard, the result of x >> n when x is negative is implementation-defined. So the reason you are having a problem depends on your compiler and architecture.
However, it's most likely that x is sign-extended when you shift it, i.e. the top bit is repeated to keep the sign the same as the operand's. This is what happens with my compiler. So for any negative number, x >> 31 is -1. Also, for any non-zero number !x is 0 (i.e. false). This applies assuming x is a 32-bit integer. If you make x an unsigned int, it should work, but consider the following alternative:
(x < 0) ? -1 : ((x > 0) ? 1 : 0)
which I think is a bit less cryptic.
And here is a program that you can use to see what your expression is doing
#include <stdio.h>
#define EVALUATE(x) printf("%s = %d\n", #x, x)
int main(int argc, char** argv)
{
    unsigned int x = 51;
    EVALUATE(x >> 31);
    EVALUATE(((x >> 31) ^ -1));
    EVALUATE(((x >> 31) ^ -1) + 1);
    EVALUATE(!x);
    EVALUATE(!x ^ 1);
    EVALUATE((((x >> 31) ^ -1) + 1) | (!x ^ 1));
    return 0;
}
>> will generally do arithmetic shift on signed data, so ((-1) >> 31) == (-1), contrary to your assumption. As pointed out by others, this is not guaranteed by the standard, but it is most likely true on modern systems. In any case, be careful with this type of bit twiddling. If portability is a concern or speed is not, you should do it a different way. See Is there a standard sign function (signum, sgn) in C/C++? for some ideas.
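For comparison, the usual portable sign function from that linked question avoids shifting negative values altogether (a well-known idiom, not the questioner's expression):
/* -1 for negative, 0 for zero, 1 for positive; no implementation-defined shifts */
int sign(int x)
{
    return (x > 0) - (x < 0);
}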

How to sign extend a 9-bit value when converting from an 8-bit value?

I'm implementing a relative branching function in my simple VM.
Basically, I'm given an 8-bit relative value. I then shift this left by 1 bit to make it a 9-bit value. So, for instance, if you were to say "branch +127" this would really mean 127 instructions, and thus would add 254 to the IP.
My current code looks like this:
uint8_t argument = 0xFF; //-1 or whatever
int16_t difference = argument << 1;
*ip += difference; //ip is a uint16_t
I don't believe difference will ever be detected as less than 0 with this, however. I'm rusty on how signed-to-unsigned conversion works. Beyond that, I'm not sure the difference would be correctly subtracted from the IP in the case argument is, say, -1 or -2.
Basically, I'm wanting something that would satisfy these "tests"
//case 1
argument = -5
difference -> -10
ip = 20 -> 10 //ip starts at 20, but becomes 10 after applying difference
//case 2
argument = 127 (must fit in a byte)
difference -> 254
ip = 20 -> 274
Hopefully that makes it a bit more clear.
Anyway, how would I do this cheaply? I saw one "solution" to a similar problem, but it involved division. I'm working with slow embedded processors (assumed to be without efficient ways to multiply and divide), so that's a pretty big thing I'd like to avoid.
To clarify: you worry that left shifting a negative 8-bit number will make it appear like a positive 9-bit number? Just pad the top 9 bits with the sign bit of the initial number before the left shift:
uint8_t diff = 0xFF;
int16_t diff16 = (diff + (diff & 0x80)*0x01FE) << 1;
Now your diff16 is signed 2*diff
As was pointed out by Richard J Ross III, you can avoid the multiplication (if that's expensive on your platform) with a conditional branch:
int16_t diff16 = (diff + ((diff & 0x80)?0xFF00:0))<<1;
If you are worried about things staying in range and such ("undefined behavior"), you can do
int16_t diff16 = diff;
diff16 = (diff16 | ((diff16 & 0x80)?0x7F00:0))<<1;
At no point does this produce numbers that are going out of range.
The cleanest solution, though, seems to be "cast and shift":
diff16 = (signed char)diff; // recognizes and preserves the sign of diff
diff16 = (short int)(((unsigned short)diff16)<<1); // left shift on the unsigned value, preserving sign
This produces the expected result, because the compiler automatically takes care of the sign bit (so no need for the mask) in the first line; and in the second line, it does a left shift on an unsigned int (for which overflow is well defined per the standard); the final cast back to short int ensures that the number is correctly interpreted as negative. I believe that in this form the construct is never "undefined".
All of my quotes come from the C standard, section 6.3.1.3. Unsigned to signed is well defined when the value is within range of the signed type:
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
Signed to unsigned is well defined:
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Unsigned to signed, when the value lies out of range isn't too well defined:
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
Unfortunately, your question lies in the realm of point 3. C doesn't guarantee any implicit mechanism to convert out-of-range values, so you'll need to explicitly provide one. The first step is to decide which representation you intend to use: ones' complement, two's complement or sign and magnitude.
The representation you use will affect the translation algorithm you use. In the example below, I'll use two's complement: If the sign bit is 1 and the value bits are all 0, this corresponds to your lowest value. Your lowest value is another choice you must make: In the case of two's complement, it'd make sense to use either of INT16_MIN (-32768) or INT8_MIN (-128). In the case of the other two, it'd make sense to use INT16_MIN - 1 or INT8_MIN - 1 due to the presence of negative zeros, which should probably be translated to be indistinguishable from regular zeros. In this example, I'll use INT8_MIN, since it makes sense that (uint8_t) -1 should translate to -1 as an int16_t.
Separate the sign bit from the value bits. The value should be the absolute value, except in the case of a two's complement minimum value, when sign will be 1 and the value will be 0. Of course, the sign bit can be wherever you like it to be, though it's conventional for it to rest at the far left hand side. Hence, shifting right 7 places obtains the conventional "sign" bit:
uint8_t sign = input >> 7;
uint8_t value = input & (UINT8_MAX >> 1);
int16_t result;
If the sign bit is 1, we'll call this a negative number and add to INT8_MIN to construct the sign so we don't end up in the same conundrum we started with, or worse: undefined behaviour (which is the fate of one of the other answers).
if (sign == 1) {
    result = INT8_MIN + value;
}
else {
    result = value;
}
This can be shortened to:
int16_t result = (input >> 7) ? INT8_MIN + (input & (UINT8_MAX >> 1)) : input;
... or, better yet:
int16_t result = input <= INT8_MAX ? input
: INT8_MIN + (int8_t)(input % (uint8_t) INT8_MIN);
The sign test now involves checking if it's in the positive range. If it is, the value remains unchanged. Otherwise, we use addition and modulo to produce the correct negative value. This is fairly consistent with the C standard's language above. It works well for two's complement, because int16_t and int8_t are guaranteed to use a two's complement representation internally. However, types like int aren't required to use a two's complement representation internally. When converting unsigned int to int for example, there needs to be another check, so that we're treating values less than or equal to INT_MAX as positive, and values greater than or equal to (unsigned int) INT_MIN as negative. Any other values need to be handled as errors; In this case I treat them as zeros.
/* Generate some random input */
srand(time(NULL));
unsigned int input = rand();
for (unsigned int x = UINT_MAX / ((unsigned int) RAND_MAX + 1); x > 1; x--) {
    input *= (unsigned int) RAND_MAX + 1;
    input += rand();
}
int result = /* Handle positives: */ input <= INT_MAX ? input
           : /* Handle negatives: */ input >= (unsigned int) INT_MIN ? INT_MIN + (int)(input % (unsigned int) INT_MIN)
           : /* Handle errors:    */ 0;
If the offset is in the 2's complement representation, then
convert this
uint8_t argument = 0xFF; //-1
int16_t difference = argument << 1;
*ip += difference;
into this:
uint8_t argument = 0xFF; //-1
int8_t signed_argument;
signed_argument = argument; // this relies on implementation-defined
                            // conversion of unsigned to signed; usually it's
                            // just a bit-wise copy on 2's complement systems
// OR
// memcpy(&signed_argument, &argument, sizeof argument);
*ip += signed_argument + signed_argument;
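As a quick sanity check of this approach against the question's two test cases (my own snippet; it uses * 2 for clarity, which any compiler will reduce to a shift):
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint16_t ip = 20;
    int8_t signed_argument = -5;   /* case 1: ip 20 -> 10 */
    ip += signed_argument * 2;
    assert(ip == 10);

    ip = 20;
    signed_argument = 127;         /* case 2: ip 20 -> 274 */
    ip += signed_argument * 2;
    assert(ip == 274);
    return 0;
}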

How can I check if a signed integer is positive?

Using bitwise operators and I suppose addition and subtraction, how can I check if a signed integer is positive (specifically, not negative and not zero)? I'm sure the answer to this is very simple, but it's just not coming to me.
If you really want an "is strictly positive" predicate for int n without using conditionals (assuming 2's complement):
-n will have the sign (top) bit set if n was strictly positive, and clear in all other cases except n == INT_MIN;
~n will have the sign bit set if n was strictly positive, or 0, and clear in all other cases including n == INT_MIN;
...so -n & ~n will have the sign bit set if n was strictly positive, and clear in all other cases.
Apply an unsigned shift to turn this into a 0 / 1 answer:
int strictly_positive = (unsigned)(-n & ~n) >> ((sizeof(int) * CHAR_BIT) - 1);
EDIT: as caf points out in the comments, -n causes an overflow when n == INT_MIN (still assuming 2's complement). The C standard allows the program to fail in this case (for example, you can enable traps for signed overflow using GCC with the -ftrapv option). Casting n to unsigned fixes the problem (unsigned arithmetic does not overflow). So an improvement would be:
unsigned u = (unsigned)n;
int strictly_positive = (-u & ~u) >> ((sizeof(int) * CHAR_BIT) - 1);
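Exercising the unsigned variant on the edge cases discussed above (a throwaway check of my own):
#include <limits.h>
#include <stdio.h>

int strictly_positive(int n)
{
    unsigned u = (unsigned)n;
    return (-u & ~u) >> ((sizeof(int) * CHAR_BIT) - 1);
}

int main(void)
{
    printf("%d %d %d %d\n",
           strictly_positive(5),         /* 1 */
           strictly_positive(0),         /* 0 */
           strictly_positive(-5),        /* 0 */
           strictly_positive(INT_MIN));  /* 0 */
    return 0;
}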
Check the most significant bit: 0 means non-negative, 1 means negative. To exclude zero you still need a separate test, as in the answers below.
If you can't use the obvious comparison operators, then you have to work harder:
int i = anyValue;
if (i && !(i & (1U << (sizeof(int) * CHAR_BIT - 1))))
/* I'm almost positive it is positive */
The first term checks that the value is not zero; the second checks that the value does not have the leading bit set. That should work for 2's-complement, 1's-complement or sign-magnitude integers.
Consider how the sign is represented. Often it's done with two's complement or with a simple sign bit - I think both of these could be checked with a simple bitwise AND against the top bit.
Check that it is not 0 and that the most significant bit is 0, something like:
int positive(int x) {
    return x && !(x & 0x80000000);
}
