get unsigned long long addition carry - c

I want to get the carry bit of adding two unsigned 64-bit integers in C.
I can use x86-64 asm if needed.
code:
#include <stdio.h>
typedef unsigned long long llu;
int main(void){
llu a = -1, b = -1;
int carry = /*carry of a+b*/;
llu res = a+b;
printf("a+b = %llu (because addition overflowed), carry bit = %d\n", res, carry);
return 0;
}

As @EugeneSh. observes, the carry is either 0 or 1. Moreover, given that a and b both have the same unsigned type, their sum is well defined even if the arithmetic result exceeds the range of their type. When overflow occurs, the (C) result of the sum is less than both a and b; otherwise it is at least as large as each of them. Since C relational operations evaluate to either 0 or 1, the carry bit can be expressed as
carry = (a + b) < a;
That does not require any headers, nor does it depend on a specific upper bound, or even on a and b having the same type. As long as both have unsigned types, it reports correctly on whether the sum overflows the wider of their types or unsigned int (whichever is wider), which is the same as their sum setting the carry bit. As a bonus, it is expressed in terms of the sum itself, which I think makes it clear what's being tested.
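As a minimal sketch of the comparison idiom above, the helper below returns the wrapped sum and reports the carry through a pointer. The name add_with_carry and the pointer parameter are illustrative assumptions, not part of the original answer.

```c
/* Hypothetical helper: returns a + b reduced modulo 2^N (N = width of
 * unsigned long long) and stores the carry-out, 0 or 1, in *carry.
 * Unsigned arithmetic wraps, so there is no undefined behaviour here. */
static unsigned long long add_with_carry(unsigned long long a,
                                         unsigned long long b,
                                         int *carry)
{
    unsigned long long sum = a + b;  /* well defined: wraps on overflow */
    *carry = sum < a;                /* 1 exactly when the addition wrapped */
    return sum;
}
```

With a = b = -1 (the all-ones value from the question), the returned sum is the all-ones value minus 1 and the carry is 1.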

The carry can only be 0 or 1: 1 if the addition wrapped around and 0 otherwise.
Wrap-around happens exactly when a + b > ULLONG_MAX is true. Note that this is meant in mathematical terms, not in terms of C: if a + b actually overflows, the comparison cannot work as written. Instead, rearrange it to a > ULLONG_MAX - b. So the value of carry will be:
carry = a > ULLONG_MAX - b ? 1 : 0;
or any preferred style equivalent.
Don't forget to include limits.h.
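A small sketch of this "check before adding" form, next to the "check after adding" form for comparison. It uses the standard macro ULLONG_MAX from limits.h; the function names are made up for illustration.

```c
#include <limits.h>  /* ULLONG_MAX */

/* Carry decided before adding:
 * a + b > ULLONG_MAX (mathematically)  <=>  a > ULLONG_MAX - b. */
static int carry_precheck(unsigned long long a, unsigned long long b)
{
    return a > ULLONG_MAX - b;  /* the subtraction itself can never wrap */
}

/* Carry decided after adding, from the wrapped sum. */
static int carry_postcheck(unsigned long long a, unsigned long long b)
{
    return (a + b) < a;
}
```

Both functions should agree for every pair of inputs; which one reads better is a matter of taste.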

Related

How to convert uint to int in C with minimal loss of result range

I want the difference between two unbounded integers, each represented by a uint32_t value which is the unbounded integer taken modulo 2^32. As in, for example, TCP sequence numbers. Note that the modulo 2^32 representation can wrap around 0, unlike more restricted questions that do not allow wrapping around 0.
Assume that the difference between the underlying unbounded integers is in the range of a normal int. I want this signed difference value. In other words, return a value within the normal int range that is equivalent to the difference of the two uint32_t inputs modulo 2^32.
For example, 0 - 0xffffffff = 1 because we assume that the underlying unbounded integers are in int range. Proof: if A mod 2^32 = 0 and B mod 2^32 = 0xffffffff, then (A=0, B=-1) (mod 2^32) and therefore (A-B=1) (mod 2^32) and in the int range this modulo class has the single representative 1.
I have used the following code:
static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
uint32_t delta = a - b;
// this would work on most systems
return delta;
// what is the language-safe way to do this?
}
This works on most systems because they use modulo-2^32 representations for both uint and int, and a normal modulo-2^32 subtraction is the only reasonable assembly code to generate here.
However, I believe that the C standard only defines the result of the above code if delta fits in an int (delta <= INT_MAX). For example on this question one answer says:
If we assign an out-of-range value to an object of signed type, the
result is undefined. The program might appear to work, it might crash,
or it might produce garbage values.
How should a modulo-2^32 conversion from uint to int be done according to the C standard?
Note: I would prefer the answer code not to involve conditional expressions, unless you can prove it's required. (case analysis in the explanation of the code is OK).
There must be a standard function that does this... but in the meantime:
#include <stdint.h> // uint32_t
#include <limits.h> // INT_MAX
#include <assert.h> // assert
static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
uint32_t delta = a - b;
return delta <= INT_MAX ? delta : -(int)~delta - 1;
}
Note that it is UB in the case that the result is not representable, but the question said that was OK.
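A usage sketch for the function above, assuming a 32-bit int. The function is repeated so that the block compiles on its own; seq_is_before is a hypothetical helper added only to show how the signed difference might be used for sequence-number comparison.

```c
#include <stdint.h>  /* uint32_t */
#include <limits.h>  /* INT_MAX */

/* Same function as in the answer (assumption: int is 32 bits wide). */
static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
    uint32_t delta = a - b;  /* wraps modulo 2^32 */
    return delta <= INT_MAX ? (int)delta : -(int)~delta - 1;
}

/* Hypothetical helper: nonzero if sequence number a comes before b. */
static inline int seq_is_before(uint32_t a, uint32_t b)
{
    return sub_tcp_sn(a, b) < 0;
}
```

For the example from the question, sub_tcp_sn(0, 0xffffffff) takes the second branch: ~delta is 0xfffffffe, so the result is -(0x7fffffff... no larger than INT_MAX) - 1 arithmetic on a representable value, and no out-of-range conversion is needed.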
If the system has a 64-bit long long type, then the range can easily be customized and checked as well:
typedef long long sint64_t;
static inline sint64_t sub_tcp_sn_custom_range(uint32_t a, uint32_t b,
sint64_t out_min, sint64_t out_max)
{
assert(sizeof(sint64_t) == 8);
uint32_t delta = a - b;
sint64_t result = delta <= out_max ? delta : -(sint64_t)-delta;
assert(result >= out_min && result <= out_max);
return result;
}
For example, sub_tcp_sn_custom_range(0x10000000, 0, -0xf0000000LL, 0x0fffffffLL) == -0xf0000000.
With the range customization, this solution minimizes range loss in all situations, assuming timestamps behave linearly (for example, no special meaning to wrapping around 0) and a signed 64-bit type is available.

Optimize integer multiplication with +/-1

Is it possible to optimize multiplication of an integer with -1/1 without using any multiplication and conditionals/branches?
Can it be done only with bitwise operations and integer addition?
Edit: The final goal is to optimize a scalar product of two integer vectors, where one of the vectors has only -1/1 values.
Most modern processors have an ALU with a fast multiplier (meaning it takes about the same time to add two numbers as to multiply them, give or take one CPU clock), so doing anything but for (i=0;i<VectorLength;++i) { p += (x[i] * y[i]) ; } isn't likely to help. However, try a simple if and see if that gives any benefits gained from the CPU's branch prediction:
for (i=0;i<VectorLength;++i) { p += (y[i]<0) ? -x[i] : x[i] ; }
In any case, if the CPU has fast multiply, doing any trick that involves more than one ALU operation (e.g., negation followed by addition, as in some of the examples given here) will more likely cause loss of performance compared to just one multiplication.
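The two loops discussed above can be written as functions so they are easy to benchmark against each other. This is only a sketch: the function names are invented here, the accumulator is widened to long long to avoid overflow in the sum, and y[i] is assumed to be +1 or -1.

```c
#include <stddef.h>  /* size_t */

/* Plain multiply-accumulate; y[i] is assumed to be +1 or -1. */
static long long dot_multiply(const int *x, const int *y, size_t n)
{
    long long p = 0;
    for (size_t i = 0; i < n; ++i)
        p += (long long)x[i] * y[i];
    return p;
}

/* Same result using a sign test instead of a multiplication. */
static long long dot_select(const int *x, const int *y, size_t n)
{
    long long p = 0;
    for (size_t i = 0; i < n; ++i)
        p += (y[i] < 0) ? -(long long)x[i] : (long long)x[i];
    return p;
}
```

Whether either form is faster depends on the compiler and CPU, so measuring on the target machine is the only reliable guide.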
Yes, for instance, the following function returns a*b, where b is either +1 or -1:
int mul(int a, int b)
{
int c[3] = { -a, 0, +a };
return c[1+b];
}
or if both a and b are restricted to +-1:
int mul(int a, int b)
{
int c[5] = { +1, 0, -1, 0, +1 };
return c[a+b+2];
}
Yet another variant without memory access (faster than the ones above):
int mul(int a, int b)
{
return 1 - (signed)( (unsigned)(a+1) ^ (unsigned)(b+1) );
}
This answer works with any signed integer representation (sign-magnitude, ones' complement, two's complement) and does not cause any undefined behaviour.
However, I cannot guarantee that this will be faster than normal multiplication.
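Since the inputs are restricted to +1 and -1, the variants above can be checked exhaustively. The sketch below renames the functions (mul_table, mul_xor) so they can coexist in one file, and adds a hypothetical checker, pm1_variants_agree.

```c
/* Lookup-table variant: b must be +1 or -1 (a must not be INT_MIN). */
static int mul_table(int a, int b)
{
    int c[3] = { -a, 0, +a };
    return c[1 + b];
}

/* XOR variant: both a and b must be +1 or -1. */
static int mul_xor(int a, int b)
{
    return 1 - (int)((unsigned)(a + 1) ^ (unsigned)(b + 1));
}

/* Returns 1 if both variants agree with a * b for every +/-1 pair. */
static int pm1_variants_agree(void)
{
    static const int pm1[2] = { -1, +1 };
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            int a = pm1[i], b = pm1[j];
            if (mul_table(a, b) != a * b || mul_xor(a, b) != a * b)
                return 0;
        }
    }
    return 1;
}
```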
int Multiplication(int x, int PlusOrMinusOne)
{
PlusOrMinusOne >>= 1; //becomes 0 or -1
//optionally build 2's complement (invert all bits plus 1)
return (x ^ PlusOrMinusOne) + (PlusOrMinusOne & 1);
}
Here is a nice resource for such tricks: Bit Twiddling Hacks.
Strictly speaking, no, because C allows for three different integer representations:
Signed magnitude
Ones' complement
Two’s complement
There is a proposal to strictly make it two's complement but until that makes it into the standard, you don't know what your negative numbers look like, so you can't really do too many bit hacks with them.
Less strictly speaking, most implementations use two's complement, so it is reasonably safe to use the hacks shown in other answers.
Assuming two's complement for integers:
int i1 = ...; // +1 or -1
int i2 = ...; // +1 or -1
unsigned u1 = i1 + 1; // 0 or 2
unsigned u2 = i2 + 1; // 0 or 2
unsigned u = u1 + u2; // 0 or 2 or 4: 0 and 4 need to become 1, 2 needs to become -1
u = (u & ~4); // 0 is 1 and 2 is -1
int i = u - 1; // -1 is 1 and 1 is -1
i = ~i + 1; // -1 is -1 and 1 is 1 :-)
The same as one-liner:
int i = ~(((i1 + i2 + 2) & ~4) - 1) + 1;
The following is true for i1 and i2 being either +1 or -1:
i == i1 * i2
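The one-liner can be wrapped in a function and tested on all four input pairs. This is a sketch under the same two's complement assumption; the name mul_pm1_bits and the intermediate variables are introduced here for readability.

```c
/* Product of two values that are each +1 or -1, using only +, -, & and ~.
 * Assumption: two's complement int, so that ~v + 1 == -v. */
static int mul_pm1_bits(int i1, int i2)
{
    int s = i1 + i2 + 2;   /* 0, 2 or 4 */
    int t = (s & ~4) - 1;  /* -1 when the signs match, +1 when they differ */
    return ~t + 1;         /* negate: +1 when the signs match, -1 otherwise */
}
```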

Finding maximum value of a short int variable in C

I was working on Exercise 2-1 of K&R. The goal is to calculate the range of different variable types. Below is my function to calculate the maximum value a short int can contain:
short int max_short(void) {
short int i = 1, j = 0, k = 0;
while (i > k) {
k = i;
if (((short int)2 * i) > (short int)0)
i *= 2;
else {
j = i;
while (i + j <= (short int)0)
j /= 2;
i += j;
}
}
return i;
}
My problem is that the returned value by this function is: -32768 which is obviously wrong since I'm expecting a positive value. I can't figure out where the problem is, I used the same function (with changes in the variables types) to calculate the maximum value an int can contain and it worked...
I thought the problem could be caused by the comparisons inside the if and while statements, hence the typecasting, but that didn't help...
Any ideas what is causing this ? Thanks in advance!
EDIT: Thanks to Antti Haapala for his explanations, the overflow to the sign bit results in undefined behavior NOT in negative values.
You can't use calculations like this to deduce the range of signed integers, because signed integer overflow has undefined behaviour, and narrowing conversion at best results in an implementation-defined value, or a signal being raised. The proper solution is to just use SHRT_MAX, INT_MAX ... of <limits.h>. Deducing the maximum value of signed integers via arithmetic is a trick question in standardized C language, and has been so ever since the first standard was published in 1989.
Note that the original edition of K&R predates the standardization of C by 11 years, and even the 2nd one - the "ANSI-C" version predates the finalized standard and differs from it somewhat - they were written for a language that wasn't almost, but not quite, entirely unlike the C language of this day.
You can do it easily for unsigned integers though:
unsigned int i = -1;
// i now holds the maximum value of `unsigned int`.
By definition, you cannot calculate the maximum value of a type in C by using variables of that very same type. It simply doesn't make any sense. The type will overflow when it goes "over the top". In case of signed integer overflow, the behavior is undefined, meaning you will get a major bug if you attempt it.
The correct way to do this is to simply check SHRT_MAX from limits.h.
An alternative, somewhat more questionable way would be to create the maximum of an unsigned short and then divide that by 2. We can create the maximum by taking the bitwise inversion of the value 0.
#include <stdio.h>
#include <limits.h>
int main()
{
printf("%hd\n", SHRT_MAX); // best way
unsigned short ushort_max = ~0u;
short short_max = ushort_max / 2;
printf("%hd\n", short_max);
return 0;
}
One note about your code:
Casts such as ((short int)2*i)>(short int)0 are completely superfluous. Most binary operators in C such as * and > implement something called "the usual arithmetic conversions", which is a way to implicitly convert and balance types of an expression. These implicit conversion rules will silently make both of the operands type int despite your casts.
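The effect of the integer promotions can be observed with sizeof, which reports the type of an expression without evaluating it. A minimal sketch (the function name is made up for illustration):

```c
#include <stddef.h>  /* size_t */

/* Size of the result of multiplying two short operands.
 * The cast does not keep the arithmetic in short: both operands are
 * promoted to int before the multiplication takes place. */
static size_t size_of_short_product(void)
{
    short i = 1;
    return sizeof((short)2 * i);
}
```

The returned size equals sizeof(int), not sizeof(short), which is why the casts in the question change nothing.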
You forgot to cast to short int during comparison
OK, here I assume that the computer would handle integer overflow behavior by changing into negative integers, as I believe that you have assumed in writing this program.
code that outputs 32767:
#include <stdio.h>
short int max_short(void)
{
short int i = 1, j = 0, k = 0;
while (i>k)
{
k = i;
if (((short int)(2 * i))>(short int)0)
i *= 2;
else
{
j = i;
while ((short int)(i + j) <= (short int)0)
j /= 2;
i += j;
}
}
return i;
}
int main(void) {
    printf("%d\n", max_short());
    return 0;
}
added 2 casts

In C bits, multiply by 3 and divide by 16

A buddy of mine had these puzzles and this is one that is eluding me. Here is the problem: you are given a number and you want to return that number times 3 and divided by 16, rounding towards 0. Should be easy. The catch? You can only use the ! ~ & ^ | + << >> operators, and at most 12 of them in total.
int mult(int x){
//some code here...
return y;
}
My attempt at it has been:
int hold = x + x + x;
int hold1 = 8;
hold1 = hold1 & hold;
hold1 = hold1 >> 3;
hold = hold >> 4;
hold = hold + hold1;
return hold;
But that doesn't seem to be working. I think I have a problem of losing bits but I can't seem to come up with a way of saving them. Another perspective would be nice. Just to add, you also can only use variables of type int and no loops, if statements or function calls may be used.
Right now I have the number 0xfffffff. It is supposed to return 0x2ffffff but it is returning 0x3000000.
For this question you need to worry about the lost bits before your division (obviously).
Essentially, if it is negative then you want to add 15 after you multiply by 3. A simple if statement (using your operators) should suffice.
I am not going to give you the code but a step by step would look like,
x = x*3
get the sign and store it in variable foo.
have another variable hold x + 15;
Set up an if statement so that if x is negative it uses that added 15 and if not then it uses the regular number (times 3 which we did above).
Then divide by 16 which you already showed you know how to do. Good luck!
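For readers who do want to see it spelled out, the steps above could look roughly like the sketch below. It replaces the "if statement" with a mask, and it relies on assumptions the C standard does not guarantee: a 32-bit int, an arithmetic right shift for negative values, and no overflow in 3 * x. The function name is made up.

```c
/* Sketch of the steps above, using 6 of the allowed operators.
 * Assumptions: 32-bit int, arithmetic >> on negative values,
 * and x small enough that x + x + x does not overflow. */
static int mul3_div16(int x)
{
    int x3   = x + x + x;     /* x * 3 */
    int sign = x3 >> 31;      /* 0 if x3 >= 0, all ones (-1) if x3 < 0 */
    int bias = sign & 15;     /* 15 for negative values, 0 otherwise */
    return (x3 + bias) >> 4;  /* divide by 16, rounding toward zero */
}
```

For the value from the question, 0xfffffff, this returns 0x2ffffff.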
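This seems to work (as long as no overflow occurs), with the caveat that >> rounds negative results toward negative infinity, like Math.floor in the test below, rather than toward zero as the question asks: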
((num<<2)+~num+1)>>4
Try this JavaScript code, run in console:
for (var num = -128; num <= 128; ++num) {
var a = Math.floor(num * 3 / 16);
var b = ((num<<2)+~num+1)>>4;
console.log(
"Input:", num,
"Regular math:", a,
"Bit math:", b,
"Equal: ", a===b
);
}
The Maths
When you divide a positive integer n by 16, you get a positive integer quotient k and a remainder c < 16:
(n/16) = k + (c/16).
(Or simply apply Euclidean division.) The question asks for multiplication by 3/16, so multiply by 3
(n/16) * 3 = 3k + (c/16) * 3.
The number k is an integer, so the part 3k is still a whole number. However, int arithmetic rounds down, so the second term may lose precision if you divide first. And since c < 16, you can safely multiply first without overflowing (assuming int has at least 7 bits, because 3c <= 45). So the algorithm design can be
(3n/16) = 3k + (3c/16).
The design
The integer k is simply n/16 rounded down towards 0. So k can be found by applying a single right-shift operation (n >> 4). Two further operations will give 3k. Operation count: 3.
The remainder c can also be found using an AND operation (with the missing bits). Multiplication by 3 uses two more operations, and a shift finishes the division. Operation count: 4.
Add them together gives you the final answer.
Total operation count: 8.
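The design for non-negative inputs can be sketched as follows; the function name is an assumption, and the operator count (two shifts, one AND, five additions) matches the total of 8 given above.

```c
/* 3n/16 for non-negative n, without ever forming 3 * n:
 * n = 16k + c with 0 <= c < 16, so 3n/16 = 3k + 3c/16 (integer division). */
static int mul3_div16_split(int n)
{
    int k = n >> 4;  /* quotient  n / 16 */
    int c = n & 15;  /* remainder n % 16, in 0..15 */
    return (k + k + k) + ((c + c + c) >> 4);
}
```

Because 3k is at most 3 * (INT_MAX / 16), the intermediate values stay in range even for n = INT_MAX.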
Negatives
The above algorithm uses shift operations. It may not work well on negatives. However, assuming two's complement, the sign of n is stored in a sign bit. It can be removed before applying the algorithm and reapplied on the answer.
To find and store the sign of n, a single AND is sufficient.
To remove this sign, OR can be used.
Apply the above algorithm.
To restore the sign bit, use a final OR operation on the algorithm output with the stored sign bit.
This brings the final operation count up to 11.
What you can do is first divide by 4, then add the result three times, then divide by 4 again.
3*x/16=(x/4+x/4+x/4)/4
with this logic the program can be
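#include <stdio.h>
int main(void)
{
    int x=0xefffffff;       /* negative test input (assumes 32-bit two's complement int) */
    int y;
    printf("%x",x);
    y=x&(0x80000000);       /* isolate the sign bit */
    y=y>>31;                /* all ones if x is negative, 0 otherwise (arithmetic shift assumed) */
    x=(y&(~x+1))+(~y&(x));  /* x = |x| */
    x=x>>2;                 /* divide by 4 */
    x=x&(0x3fffffff);       /* clear the two most significant bits */
    x=x+x+x;                /* multiply by 3 */
    x=x>>2;                 /* divide by 4 again */
    x=x&(0x3fffffff);
    x=(y&(~x+1))+(~y&(x));  /* restore the original sign */
    printf("\n%x %d",x,x);
    return 0;
}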
AND with 0x3fffffff to make the two most significant bits zero; it will even convert numbers to positive.
This relies on two's complement representation of negative numbers. With direct division methods there is a loss of bit accuracy for negative numbers, so use this workaround of converting a negative number to a positive one and then performing the division operations.
Note that the C99 standard states in section 6.5.7 that a right shift of a negative signed integer yields an implementation-defined value. Under the provisions that int comprises 32 bits and that right shifting of signed integers maps to an arithmetic shift instruction, the following code works for all int inputs. A fully portable solution that also fulfills the requirements set out in the question may be possible, but I cannot think of one right now.
My basic idea is to split the number into high and low bits to prevent intermediate overflow. The high bits are divided by 16 first (this is an exact operation), then multiplied by three. The low bits are first multiplied by three, then divided by 16. Since arithmetic right shift rounds towards negative infinity instead of towards zero like integer division, a correction needs to be applied to the right shift for negative numbers. For a right shift by N, one needs to add 2^N - 1 prior to the shift if the number to be shifted is negative.
#include <stdio.h>
#include <stdlib.h>
int ref (int a)
{
long long int t = ((long long int)a * 3) / 16;
return (int)t;
}
int main (void)
{
    unsigned int u = 0; /* unsigned counter: wraps to 0 without signed overflow */
    int a, t, r, c, res;
    do {
        a = (int)u; /* implementation-defined for u > INT_MAX; two's complement assumed */
        t = a >> 4; /* high order bits */
        r = a & 0xf; /* low order bits */
        c = (a >> 31) & 15; /* shift correction. Portable alternative: (a < 0) ? 15 : 0 */
        res = t + t + t + ((r + r + r + c) >> 4);
        if (res != ref(a)) {
            printf ("!!!! error a=%08x res=%08x ref=%08x\n", a, res, ref(a));
            return EXIT_FAILURE;
        }
        u++;
    } while (u);
    return EXIT_SUCCESS;
}

handling large numbers and overflows

I am given an array of N elements and I need to find the index P within this array where
the sum of values in the range 0 to P is equal to the sum of values in the range P+1 to N-1.
The value of each element in the array can range from -2147483648 to 2147483647 and
N can be max 10000000.
Given this how do I ensure there is no overflow when adding each values to find the index P ?
To ensure no overflow, use int32_t and int64_t.
The range of values [-2147483648 ... 2147483647] matches the int32_t range. You could also use int64_t for this, but an array of 10000000 deserves space considerations.
As the sum of any 10,000,000 values does not exceed the range of int64_t, perform all your additions using int64_t.
#include <stdint.h>
size_t foo(const int32_t *value, size_t N) {
int64_t sum = 0;
...
sum += value[i];
...
}
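The claim that int64_t is wide enough can be checked with a rough bound, assuming N is at most 10^7 and every element has magnitude at most 2^31, as stated in the question. Here S denotes any partial sum of the array:

```latex
\[
  |S| \;\le\; N \cdot 2^{31} \;\le\; 10^{7} \cdot 2^{31}
      \;<\; 2^{24} \cdot 2^{31} \;=\; 2^{55} \;<\; 2^{63}
\]
```

So every partial sum, and even twice a partial sum (at most 2^56 in magnitude), fits comfortably in the int64_t range.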
BTW: I am confident that a solution can be had that does not require 64-bit addition.
[Edit] Failed to derive simple int32_t only solution, but came up with:
size_t HalfSum(const int32_t *value, size_t N) {
// find sum of entire array
int64_t ArraySum = 0;
size_t P;
for (P = 0; P < N; P++) {
ArraySum += value[P];
}
// compute sum again, stopping when it is half of total
int64_t PartialSum = 0;
for (P = 0; P < N; P++) {
PartialSum += value[P];
if ((PartialSum * 2) == ArraySum) {
return P;
}
}
return N; // No solution (normally P should be 0 ... N-1)
}
Use 64-bit integers for your calculations. The best type to use is int64_t, since long is not guaranteed to be 64 bits (you have to #include <stdint.h> to make it available).
Edit: Pascal Cuoq is right: long long does provide the 64-bit guarantee as well and doesn't need an include (it can be longer than 64 bits, though), so it's just the long type that you have to avoid if you want to be portable.
In the worst case scenario, P+1 = N-1. Since any single number can only be as large as 2147483647 or as small as -2147483647, it means that in the worst possible cases, P would be the max or min of long. In the other cases, P will still be a long integer. Because of this, you should only need to use a long even in the worst case scenario (since the worst-case expected outcome is that P is the largest possible value that any single number can take, which is a long).
To make sure you don't have to use anything larger, pair up negative values with positive ones so that you stay below the overflow of a long.
Imagine we have 3 numbers, a b and c. If a + b overflows the long datatype, we know that c will not be P.
Now imagine that we have 4 numbers, a, b, c, d such that a + b + c = d (meaning d is P), if a + b would overflow long, it means
1) c cannot be P
2) there is a combination of a + b + c such that the data type of long would not need to be overflowed.
For instance, a is max long, b is max long, c is min long, and d is 0, then a + c + b = d would be the correct combination of operations in order to not use a data type larger than long, and we can try a + c because we know c cannot be P since a + b would overflow long > maximum possible value of P.

Resources