Unsigned integer wraparound handling in C

To handle unsigned integer wraparound issues, I am planning to use the following checks.
Addition:
Performing a postcondition check to ensure that the result of the unsigned addition is not less than the first operand:
void fn(unsigned int a, unsigned int b)
{
    unsigned int sum = a + b;
    if (sum < a) {
        // Wraparound occurred; return an error.
    }
    /* ... */
}
Subtraction: Performing a postcondition check that the result of the unsigned subtraction is not greater than the first operand:
void fn(unsigned int a, unsigned int b)
{
    unsigned int difference = a - b;
    if (difference > a) {
        // Wraparound occurred; return an error.
    }
    /* ... */
}
I assume these checks will work regardless of the compiler. Is there a better way to handle unsigned integer wraparound issues? I am looking for cross-platform support.

An alternative approach for checking whether z=x+y will yield an arithmetically-correct value is to compare x with (targetType)~(promotedType)y, where (targetType) is the type of the unsigned target container, and (promotedType) is the type with which the addition would be performed. If x exceeds (targetType)~y, then the result will not fit in the target type.
To understand the casts, consider that if y is 1, then the maximum value for x should be one less than the smaller of the maximum value for the type that will receive x+y, or the type with which x+y would be evaluated. The inner promotion is needed to accommodate the case where x is larger than y, and the outer one is needed to accommodate the case where z is smaller than the type used for the addition.
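As a sketch of that check, assuming a uint16_t target container and arithmetic performed after the usual promotion to unsigned (the helper name add_fits_u16 is mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* (uint16_t)~(unsigned)y is the largest x for which x + y
   still fits in the 16-bit target container. */
static int add_fits_u16(uint16_t x, uint16_t y)
{
    return x <= (uint16_t)~(unsigned)y;
}
```

For example, add_fits_u16(65535, 0) is 1 but add_fits_u16(65535, 1) is 0, because 65535 + 1 no longer fits in the uint16_t target.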

Related

How to convert uint to int in C with minimal loss of result range

I want the difference between two unbounded integers, each represented by a uint32_t value which is the unbounded integer taken modulo 2^32. As in, for example, TCP sequence numbers. Note that the modulo 2^32 representation can wrap around 0, unlike more restricted questions that do not allow wrapping around 0.
Assume that the difference between the underlying unbounded integers is in the range of a normal int. I want this signed difference value. In other words, return a value within the normal int range that is equivalent to the difference of the two uint32_t inputs modulo 2^32.
For example, 0 - 0xffffffff = 1 because we assume that the underlying unbounded integers are in int range. Proof: if A mod 2^32 = 0 and B mod 2^32 = 0xffffffff, then (A=0, B=-1) (mod 2^32) and therefore (A-B=1) (mod 2^32) and in the int range this modulo class has the single representative 1.
I have used the following code:
static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
    uint32_t delta = a - b;
    // this would work on most systems
    return delta;
    // what is the language-safe way to do this?
}
This works on most systems because they use modulo-2^32 representations for both uint and int, and a normal modulo-2^32 subtraction is the only reasonable assembly code to generate here.
However, I believe that the C standard only defines the result of the above code if delta fits in an int (i.e. delta <= INT_MAX). For example, on this question one answer says:
If we assign an out-of-range value to an object of signed type, the
result is undefined. The program might appear to work, it might crash,
or it might produce garbage values.
How should a modulo-2^32 conversion from uint to int be done according to the C standard?
Note: I would prefer the answer code not to involve conditional expressions, unless you can prove it's required. (case analysis in the explanation of the code is OK).
There must be a standard function that does this... but in the meantime:
#include <stdint.h> // uint32_t
#include <limits.h> // INT_MAX
#include <assert.h> // assert
static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
    uint32_t delta = a - b;
    return delta <= INT_MAX ? (int)delta : -(int)~delta - 1;
}
Note that it is UB in the case that the result is not representable, but the question said that was OK.
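As a quick sanity check of the expression above (restating the same function so the example is self-contained), the two cases from the question behave as claimed:

```c
#include <limits.h>
#include <stdint.h>

static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
    uint32_t delta = a - b;      /* modulo-2^32 difference */
    /* If delta fits in int, the conversion is value-preserving.
       Otherwise delta stands for a negative number: ~delta is its
       magnitude minus one, so -(int)~delta - 1 rebuilds it without
       forming an out-of-range intermediate. */
    return delta <= INT_MAX ? (int)delta : -(int)~delta - 1;
}
```

For instance, sub_tcp_sn(0, 0xffffffff) yields 1 (the example from the question), and sub_tcp_sn(0xffffffff, 0) yields -1.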
If the system has a 64-bit long long type, then the range can easily be customized and checked as well:
typedef long long sint64_t;

static inline sint64_t sub_tcp_sn_custom_range(uint32_t a, uint32_t b,
                                               sint64_t out_min, sint64_t out_max)
{
    assert(sizeof(sint64_t) == 8);
    uint32_t delta = a - b;
    sint64_t result = delta <= out_max ? (sint64_t)delta : -(sint64_t)-delta;
    assert(result >= out_min && result <= out_max);
    return result;
}
For example, sub_tcp_sn_custom_range(0x10000000, 0, -0xf0000000LL, 0x0fffffffLL) == -0xf0000000.
With the range customization, this solution minimizes range loss in all situations, assuming the sequence numbers behave linearly (for example, no special meaning to wrapping around 0) and a signed 64-bit type is available.

How to properly add a negative number to a size_t

I want to support negative indexing on my list implementation, the way I want to handle this (I know there may be better way to handle negative indexing) is by converting negative indexes into their positive equivalent by adding the negative value to the total number of elements in the list.
So if I had 12 elements in my list, and I asked for the index -5, I would do 12 + (-5) = 7 so my real index that I would use to retrieve the element would be 7.
I assume some casts are all that's necessary, and I can probably try a bunch of types like ptrdiff_t and such -- but I want to learn how to determine which type is the correct choice to cast to.
// the size of the list (normally something like list->num_nodes)
size_t list_size = 12;
// the int32_t is the index argument given to an indexing function
int32_t index = -5;
// the size_t is the real index that can be passed to my internal
// indexing function that will walk from either list head or tail
// depending on whether the index is closer to start or end.
size_t real_index = 0;
// if the index is less than 0 I want to add it to the list size
// to effectively subtract, otherwise just assign it
if (index < 0) {
    real_index = (list_size + index); // << warning here
} else {
    real_index = (size_t)index;
}
However, adding the int32_t index to the size_t list_size causes gcc warnings:
warning: conversion to ‘long unsigned int’ from ‘int32_t {aka int}’ may change the sign of the result [-Wsign-conversion]
What is the proper way to solve the problem of adding a negative int32_t to an unsigned value like size_t? I assume it is a simple answer like casting to a larger type that handles both size_t and int32_t (int64_t? ptrdiff_t?)... But how do you determine which is the correct type to cast to (if that is the right solution)?
You can cast the int32_t to size_t and add it. The arithmetic will work properly; adding a negative value that has been converted to unsigned yields the original unsigned value decreased by the magnitude of that negative value.
Arithmetic with unsigned numbers operates modulo M, where M is one more than the maximum representable value (such as 256 for an 8-bit unsigned char whose maximum value is 255). This includes conversions. So, if we have an unsigned a and a signed and negative b, converting b to the unsigned type yields M + b (observe that, since b is negative, M + b is less than M). Then adding a is mathematically a + M + b, which, modulo M, is a + b.
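The explanation above can be sketched as a small helper (resolve_index is an illustrative name, not from the question; it assumes the caller has already verified that index lies in [-size, size)):

```c
#include <stddef.h>
#include <stdint.h>

/* Map a possibly-negative index onto [0, size), letting unsigned
   modular arithmetic perform the subtraction: converting a negative
   index to size_t yields M + index, and adding list_size gives
   list_size + index (mod M). */
static size_t resolve_index(size_t list_size, int32_t index)
{
    return index < 0 ? list_size + (size_t)index : (size_t)index;
}
```

With a 12-element list, resolve_index(12, -5) is 7, exactly the result worked out by hand in the question.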

get unsigned long long addition carry

I want to get the carry bit of adding two unsigned 64-bit integers in c.
I can use x86-64 asm if needed.
code:
#include <stdio.h>

typedef unsigned long long llu;

int main(void) {
    llu a = -1, b = -1;
    int carry = /* carry of a+b */;
    llu res = a + b;
    printf("a+b = %llu (because addition overflowed), carry bit = %d\n", res, carry);
    return 0;
}
As #EugeneSh. observes, the carry is either 0 or 1. Moreover, given that a and b both have the same unsigned type, their sum is well defined even if the arithmetic result exceeds the range of their type. Moreover, the (C) result of the sum will be less than both a and b when overflow occurs, and no less than either otherwise, so we can use the fact that C relational operations evaluate to either 0 or 1 to express the carry bit as
carry = (a + b) < a;
That does not require any headers, nor does it depend on a specific upper bound, or even on a and b having the same type. As long as both have unsigned types, it reports correctly on whether the sum overflows the wider of their types or unsigned int (whichever is wider), which is the same as their sum setting the carry bit. As a bonus, it is expressed in terms of the sum itself, which I think makes it clear what's being tested.
Carry can be only 0 or 1. 1 if there was a wrapping-around and 0 otherwise.
The wraparound happens when a + b > ULLONG_MAX is true. Note, this is in mathematical terms, not in terms of C: if a + b actually overflows, this test will not work. Instead you want to rearrange it as a > ULLONG_MAX - b. So the value of carry will be:
carry = a > ULLONG_MAX - b ? 1 : 0;
or any preferred style equivalent.
Don't forget to include limits.h.
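The two answers above agree; as a side-by-side sketch (the function names carry_post and carry_pre are mine, for illustration):

```c
#include <limits.h>

/* Postcondition form: perform the (well-defined) wrapping sum,
   then test it against one operand. */
static int carry_post(unsigned long long a, unsigned long long b)
{
    return (a + b) < a;
}

/* Precondition form: rearrange a + b > ULLONG_MAX so that no
   overflowing sum is ever formed. */
static int carry_pre(unsigned long long a, unsigned long long b)
{
    return a > ULLONG_MAX - b;
}
```

With a = b = ULLONG_MAX (the -1 initializers from the question) both return 1; with small operands both return 0.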

Integer overflow vs implicit conversion from long long to int

Take for example int a=INT_MAX-1; and int b=INT_MAX-1; and assume that int is 32-bit and a function
int product(int a, int b)
{
    return a * b;
}
Now here the product a*b overflows resulting in undefined behavior from the standard:
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
However if we have instead
int product(int a, int b)
{
    long long x = (long long)a * b;
    return x;
}
Then, assuming this answer is correct and applies to long long as well, the result of the conversion back to int is implementation-defined.
I'm thinking that undefined behavior can cause anything, including a crash, so it's better to avoid it at all costs; hence the second version is preferable. But I'm not quite sure if my reasoning is okay.
Question: Is second version preferable or is the first one or are they equally preferable?
Both of the options are bad because they do not produce the desired result. IMHO it is a moot point trying to rank them in badness order.
My advice would be to fix the function to be well-defined for all use cases.
If you (the programmer) will never (ever!) pass values to the product() function that will cause undefined behavior, then the first version, why not.
The second version returns the sizeof(int)*CHAR_BIT least-significant bits of the result (this is implementation-defined behavior), and the multiplication can still overflow on architectures where LLONG_MAX == INT_MAX. The second version may also take ages to execute on an 8-bit processor with poor support for long long multiplication. Maybe you should handle the overflow when converting long long to int with something like if (x > INT_MAX) return INT_MAX;, unless you are really interested only in the least-significant bits of the product.
The preferable version is that, where no undefined behavior exists. If you aren't sure if multiplication a and b will result in undefined behavior or not, you should check if it will and prepare for such a case.
#include <assert.h>
#include <limits.h>

/* Note: this division-based check assumes non-negative operands;
   a fully general check must also guard b == 0 (division by zero)
   and handle negative values. */
int product(int a, int b)
{
    assert(b == 0 || a <= INT_MAX / b);
    if (b != 0 && a > INT_MAX / b)
        return INT_MAX;
    return a * b;
}
or in GNUC:
int product(int a, int b) {
    int c;
    if (__builtin_smul_overflow(a, b, &c)) {
        assert(0);
        return INT_MAX;
    }
    return c;
}
I believe that slightly tweaked second version might be interesting for you:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int product(int a, int b)
{
    long long x = (long long)a * b;
    if (x < INT_MIN || x > INT_MAX)
    {
        fprintf(stderr, "Error in product(): result out of range of int\n");
        abort();
    }
    return (int)x;
}
This function takes two int arguments, computes their product as a long long, and checks whether the result is in the range of int. If it is, we can return it from the function without any bad consequences. If it is not, we can print an error message and abort, or do some other kind of error handling.
EDIT 1: But this code still expects that (long long)a * b does not overflow, which is not guaranteed when, e.g., sizeof(long long) == sizeof(int). In that case, an overflow check should be added to make sure this does not happen. The (6.54) Integer Overflow Builtins could be interesting for you if you don't mind using GCC-dependent code. If you want to stay in C without any extensions, there are methods to detect multiplication overflow as well; see this StackOverflow answer: https://stackoverflow.com/a/1815371/1003701
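For completeness, one such extension-free pre-check might look like the following sketch. It covers all sign combinations using only division, so it never performs the overflowing multiplication itself (the name mul_would_overflow is mine, for illustration):

```c
#include <limits.h>

/* Returns nonzero if a * b would overflow int. */
static int mul_would_overflow(int a, int b)
{
    if (a == 0 || b == 0)
        return 0;
    if (a > 0 && b > 0)
        return a > INT_MAX / b;   /* positive result too large */
    if (a > 0 /* && b < 0 */)
        return b < INT_MIN / a;   /* negative result too small */
    if (b > 0 /* && a < 0 */)
        return a < INT_MIN / b;   /* negative result too small */
    return b < INT_MAX / a;       /* both negative: positive result */
}
```

For example, it flags 46341 * 46341 (which exceeds INT_MAX on 32-bit int) and INT_MIN * -1, while passing 46340 * 46340.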

Why am I unable to store my data in long data type?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int power(int first, int second) {
    int counter1 = 0;
    long ret = 1;
    while (counter1 != second) {
        ret *= first;
        counter1 += 1;
    }
    return ret;
}

int main(int argc, char **argv) {
    long one = atol(argv[1]);
    long two = atol(argv[2]);
    char word[30];
    long finally;
    printf("What is the operation? 'power','factorial' or 'recfactorial'\n");
    scanf("%20s", word);
    if (strcmp("power", word) == 0) {
        finally = power(one, two);
        printf("%ld\n", finally);
        return 0;
    }
}
This function is intended to do the "power of" operation like on the calculator, so if I write: ./a.out 5 3 it will give me 5 to the power of 3 and print out 125
The problem is, in cases where the numbers are like: ./a.out 20 10, 20 to the power of 10, I expect to see the result of: 1.024 x 10^13, but it instead outputs 797966336.
What is the cause of the current output I am getting?
Note: I assume that this has something to do with the atol() and long data types. Are these not big enough to store the information? If not, any idea how to make it run for bigger numbers?
Sure, your inputs are long, but your power function takes and returns int! Apparently, that's 32-bit on your system, so 1.024×10^13 is more than int can handle.
Make sure that you pick a type that's big enough for your data, and use it consistently. Even long may not be enough — check your system!
First and foremost, you need to change the return type and parameter types of power() from int to long. Otherwise, on a system where long and int have different sizes:
The input arguments may get truncated to int when you pass long values.
The returned value will be converted to int before returning, which can truncate the actual value.
After that, 1.024×10^13 (10240000000000) cannot be held by an int or a 32-bit long. You need to use a type with more width, like long long.
one and two are long.
long one = atol(argv[1]);
long two = atol(argv[2]);
You call this function with them
int power(int first, int second);
But your function takes int, so there is an implicit conversion here, and it returns int. So your long values become int, which can lose information and lead to undefined behaviour (see comments).
Quick answer:
The values of your power function get implicitly converted.
Change the function parameters to a type other than int that can hold larger values; one possible type would be long.
The input value gets converted and truncated to match the parameters of your function.
The result of the computation in the body of the function will again be converted to match the return type, in your case int, which is not able to handle the size of the values.
Note 1: as noted by the more experienced members, there is a machine-specific issue, namely that your int type cannot hold values this large.
To make the answer complete:
The code is mixing int and long and hoping for an answer that exceeds the long range.
The answer is simply the result of trying to put 10 pounds of potatoes in a 5-pound sack.
... idea how to make it run for bigger numbers.
Use the widest integer types available, for example uintmax_t or unsigned long long.
From C99 onward, the greatest representable unsigned integer will normally be UINTMAX_MAX.
#include <stdint.h>

uintmax_t power_a(long first, long second) {
    long counter1 = 0;
    uintmax_t ret = 1;
    while (counter1 != second) { // number of iterations could be in the billions
        ret *= first;
        counter1 += 1;
    }
    return ret;
}
But let us avoid problematic behavior with negative numbers and improve the efficiency of the calculation from linear to logarithmic.
// return x raised to the y power
uintmax_t pow_jululu(unsigned long x, unsigned long y) {
    uintmax_t z = 1;
    uintmax_t base = x;
    while (y) { // max number of iterations is the bit width, e.g. 64
        if (y & 1) {
            z *= base;
        }
        y >>= 1;
        base *= base;
    }
    return z;
}
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    assert(argc >= 3);
    unsigned long one = strtoul(argv[1], 0, 10);
    unsigned long two = strtoul(argv[2], 0, 10);
    uintmax_t finally = pow_jululu(one, two);
    printf("%ju\n", finally);
    return 0;
}
This approach has limits too: 1) z *= base can mathematically overflow for calls like pow_jululu(2, 1000); 2) base *= base may mathematically overflow in the uncommon situation where unsigned long is more than half the width of uintmax_t; 3) there are some other nuances too.
Resort to other types e.g.: long double, Arbitrary-precision arithmetic. This is likely beyond the scope of this simple task.
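Restating the squaring loop above so it can be checked against the question's own example (pow_sq is an illustrative name):

```c
#include <stdint.h>

/* Exponentiation by squaring: O(log y) multiplications. */
static uintmax_t pow_sq(unsigned long x, unsigned long y)
{
    uintmax_t z = 1, base = x;
    while (y) {
        if (y & 1)
            z *= base;   /* fold in this bit of the exponent */
        y >>= 1;
        base *= base;    /* square for the next bit */
    }
    return z;
}
```

For the question's input, pow_sq(20, 10) yields 10240000000000, the 1.024×10^13 the asker expected.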
You could use long long, which is at least 64 bits (8 bytes), instead of long and int, which may be only 32 bits (4 bytes) on your platform.
long long will provide you values between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. That should just about cover every value you may encounter here.
