Long to int truncation problem, truncating exceptions or handling error - c

LeetCode requires that the input "-91283472332" be converted to int, with an expected output of -2147483648. I use a long to store the result and then return it as an int. Why is the returned result -1089159116?
Here's my code:
int myAtoi(char * s){
char *str = s;
long n = 0;
char *flag ;
while(*str){
if( * str =='-' || * str == '+')
{
flag = str++;
continue;
}
if(*str<='9' && *str>='0')
{
n*=10;
n+=(*str++)-48;
continue;
}
if(*str>='A'&&*str<='z')
break;
++str;
}
if(*flag == '-')
{
n-=(2*n);
}
return n;
}
So here's the description
Input: s = "-91283472332"
Output: -2147483648
Explanation:
Step 1:
"-91283472332" (no characters read because there is no leading whitespace)
^
Step 2:
"-91283472332" ('-' is read, so the result should be negative)
^
Step 3:
"-91283472332" ("91283472332" is read in)
^
The parsed integer is -91283472332.
Since -91283472332 is less than the lower bound of the range [-2^31, 2^31 - 1], the final result is clamped to -2^31 = -2147483648.

The value -91283472332 is 0xFFFFFFEABF14C034 in hexadecimal, two's complement.
When it is truncated to 32 bits (the width of the int return type here), the value is 0xBF14C034, which is -1089159116 when interpreted as two's complement.
You should add some conditional branch to return -2147483648 when the value exceeds the limit.
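A minimal sketch of such a branch, clamping while the digits are accumulated. The name my_atoi_clamped is made up for illustration and is not the asker's code. It assumes long long is wider than int, which holds on common platforms with a 32-bit int.

```c
#include <limits.h>

/* Hypothetical helper (not from the question): skips leading spaces, reads an
   optional sign and then digits, and clamps the result to [INT_MIN, INT_MAX].
   Assumes long long is wider than int. */
int my_atoi_clamped(const char *s)
{
    int negative = 0;
    long long n = 0;

    while (*s == ' ')
        ++s;
    if (*s == '-' || *s == '+')
        negative = (*s++ == '-');

    while (*s >= '0' && *s <= '9') {
        n = n * 10 + (*s++ - '0');
        /* Stop once the magnitude is beyond anything an int can hold, so n
           itself never overflows on very long digit strings. */
        if (n > (long long)INT_MAX + 1)
            break;
    }
    if (negative)
        n = -n;

    if (n > INT_MAX)
        return INT_MAX;
    if (n < INT_MIN)
        return INT_MIN;
    return (int)n;
}
```

With a 32-bit int, my_atoi_clamped("-91283472332") returns INT_MIN instead of the truncated value.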

I guess you're doing this problem. Since it requires values outside the range to be clamped to the nearest limit, a.k.a. saturation arithmetic, you'll need to check the value's range like this:
if (n > INT_MAX)
return INT_MAX;
else if (n < INT_MIN)
return INT_MIN;
else
return n;
It's similar to std::clamp(n, INT_MIN, INT_MAX) in C++
You can see that clearly in the requirements (emphasis mine):
If the integer is out of the 32-bit signed integer range [-2^31, 2^31 - 1], then clamp the integer so that it remains in the range. Specifically, integers less than -2^31 should be clamped to -2^31, and integers greater than 2^31 - 1 should be clamped to 2^31 - 1.
Now compare that with the above if blocks
If you cast the value from 64 to 32 bits, it is instead reduced modulo 2^n (n = 32 here):
-91283472332 % 4294967296 = -1089159116,
or in hex: 0xFFFFFFEABF14C034 & 0xFFFFFFFF = 0xBF14C034
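The wrap-around can be reproduced with arithmetic that is fully defined in C. This is only an illustration: wrap_to_32_bits is a hypothetical name, and it models what a typical two's complement implementation does on the narrowing conversion.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits of v and reinterpret them as a
   32-bit two's complement value, without any implementation-defined cast. */
int64_t wrap_to_32_bits(int64_t v)
{
    uint64_t low = (uint64_t)v & 0xFFFFFFFFu;   /* 0xBF14C034 for the example */

    if (low >= 0x80000000u)                     /* bit 31 set: negative value */
        return (int64_t)low - 4294967296LL;     /* subtract 2^32 */
    return (int64_t)low;
}
```

Calling wrap_to_32_bits(-91283472332LL) gives -1089159116, the value the asker observed.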
Saturation math is common in many areas like digital signal processing or computer graphics
is there a function in C or C++ to do "saturation" on an integer
How to do unsigned saturating addition in C?

Related

How to check that a value fits in a type without invoking undefined behaviour?

I am looking to check if a double value can be represented as an int (or the same for any pair of floating point and integer types). This is a simple way to do it:
double x = ...;
int i = x; // potentially undefined behaviour
if ((double) i != x) {
// not representable
}
However, it invokes undefined behaviour on the marked line, and triggers UBSan (which some will complain about).
Questions:
Is this method considered acceptable in general?
Is there a reasonably simple way to do it without invoking undefined behaviour?
Clarifications, as requested:
The situation I am facing right now involves conversion from double to various integer types (int, long, long long) in C. However, I have encountered similar situations before, thus I am interested in answers both for float -> integer and integer -> float conversions.
Examples of how the conversion may fail:
Float -> integer conversion may fail if the value is not a whole number, e.g. 3.5.
The source value may be out of the range of the target type (larger or smaller than the max and min representable values). For example 1.23e100.
The source value may be +-Inf or NaN, NaN being tricky as any comparison with it (other than !=) returns false.
Integer -> float conversion may fail when the float type does not have enough precision. For example, a typical double has 52 fraction bits (53 bits of precision) compared to 63 value bits in a 64-bit signed integer type. For example, on a typical 64-bit system, (long) (double) ((1L << 53) + 1L) does not give back the original value.
I do understand that 1L << 53 (as opposed to (1L << 53) + 1) is technically exactly representable as a double, and that the code I proposed would accept this conversion, even though it probably shouldn't be allowed.
Anything I didn't think of?
Create range limits exactly as FP types
The "trick" is to form the limits without losing precision.
Let us consider float to int.
Conversion of float to int is valid (for example with 32-bit 2's complement int) for -2,147,483,648.9999... to 2,147,483,647.9999... or nearly INT_MIN -1 to INT_MAX + 1.
We can take advantage that integer_MAX is always a power-of-2 - 1 and integer_MIN is -(power-of-2) (for common 2's complement).
Avoid the limit of FP_INT_MIN_minus_1 as it may/may not be exactly encodable as a FP.
// Form FP limits of "INT_MAX plus 1" and "INT_MIN"
#define FLOAT_INT_MAX_P1 ((INT_MAX/2 + 1)*2.0f)
#define FLOAT_INT_MIN ((float) INT_MIN)
if (f < FLOAT_INT_MAX_P1 && f - FLOAT_INT_MIN > -1.0f) {
// Within range.
// Use modff() to detect a fraction if desired.
}
More pedantic code would use !isnan(f) and consider non-2's complement encoding.
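As a sketch, the same test can be wrapped in a function with the NaN check made explicit. The name float_fits_int is an assumption for illustration, and the code assumes a two's complement int, so both limits are powers of two and exact as float.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical wrapper: returns 1 if converting f to int is well defined
   (any fraction is simply discarded by the conversion), 0 otherwise.
   Assumes two's complement int. */
int float_fits_int(float f)
{
    const float max_plus_1 = (INT_MAX / 2 + 1) * 2.0f;  /* INT_MAX + 1, exact */
    const float min = (float)INT_MIN;                   /* -(power of 2), exact */

    if (isnan(f))
        return 0;
    return f < max_plus_1 && f - min > -1.0f;
}
```

Only when float_fits_int(f) is nonzero is (int)f safe to evaluate.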
Using known limits and floating-point number validity. Check what's inside limits.h header.
You can write something like this:
#include <limits.h>
#include <math.h>
// Of course, the constants used are specific to the "int" type. There are others for other types.
// isfinite() rejects NaN and +-Inf; isnormal() would also wrongly reject 0.
if ((isfinite(x)) && (x>=INT_MIN) && (x<=INT_MAX) && (round(x)==x))
// Safe assignment from double to int.
i = (int)x ;
else
// Handle error/overflow here.
ERROR(.....) ;
Code relies on lazy boolean evaluation, obviously.
Please refer to IEEE 754 representation of floating point numbers in Memory
https://en.wikipedia.org/wiki/IEEE_754
Take double as an example:
Sign bit: 1 bit
Exponent: 11 bits
Fraction: 52 bits
There are three special values to point out here:
If the exponent is 0 and the fractional part of the mantissa is 0, the number is ±0
If the exponent is 2047 and the fractional part of the mantissa is 0, the number is ±∞
If the exponent is 2047 and the fractional part of the mantissa is non-zero, the number is NaN.
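The three special cases can be checked directly on the bit pattern. A small sketch, assuming double is IEEE 754 binary64; the enum and the function name classify_double are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

enum dbl_kind { DBL_ZERO, DBL_INF, DBL_NAN, DBL_OTHER };

/* Hypothetical classifier. Assumes double is IEEE 754 binary64:
   1 sign bit, 11 exponent bits, 52 fraction bits. */
enum dbl_kind classify_double(double d)
{
    uint64_t bits;
    uint64_t exponent, fraction;

    memcpy(&bits, &d, sizeof bits);             /* copy out the raw bits */
    exponent = (bits >> 52) & 0x7FF;            /* 11 exponent bits */
    fraction = bits & ((UINT64_C(1) << 52) - 1);/* 52 fraction bits */

    if (exponent == 0 && fraction == 0)
        return DBL_ZERO;                        /* +0 or -0 */
    if (exponent == 2047)
        return fraction == 0 ? DBL_INF : DBL_NAN;
    return DBL_OTHER;                           /* normal or subnormal */
}
```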
This is an example of converting from double to int on a 64-bit system, just for reference:
#include <stdint.h>
#define EXPBITS 11
#define FRACTIONBITS 52
#define GENMASK(n) (((uint64_t)1 << (n)) - 1)
#define EXPBIAS GENMASK(EXPBITS - 1)
#define SIGNMASK (~GENMASK(FRACTIONBITS + EXPBITS))
#define EXPMASK (GENMASK(EXPBITS) << FRACTIONBITS)
#define FRACTIONMASK GENMASK(FRACTIONBITS)
int double_to_int(double src, int *dst)
{
union {
double d;
uint64_t i;
} y;
int exp;
int sign;
int maxbits;
uint64_t fraction;
y.d = src;
sign = (y.i & SIGNMASK) ? 1 : 0;
exp = (y.i & EXPMASK) >> FRACTIONBITS;
fraction = (y.i & FRACTIONMASK);
// 0
if (fraction == 0 && exp == 0) {
*dst = 0;
return 0;
}
exp -= EXPBIAS;
// not a whole number
if (exp < 0)
return -1;
// out of the range of int
maxbits = sizeof(*dst) * 8 - 1;
if (exp >= maxbits && !(exp == maxbits && sign && fraction == 0))
return -2;
// not a whole number
if (fraction & GENMASK(FRACTIONBITS - exp))
return -3;
// convert to int
*dst = src;
return 0;
}

Why is `x - y <= x` true, when x=0x80000000, y = 1(32-bit complement)? [duplicate]

I want to know if x - y overflows.
Below is my code.
#include <stdio.h>
/* Determine whether arguments can be subtracted without overflow */
int tsub_ok(int x, int y)
{
return (y <= 0 && x - y >= x) || (y >= 0 && x - y <= x);
}
int main()
{
printf("0x80000000 - 1 : %d\n", tsub_ok(0x80000000, 1));
}
Why can't I get the result I expect?
You can't check for overflow of signed integers by performing the offending operation and seeing if the result wraps around.
First, the value 0x80000000 passed to the function is outside the range of a 32-bit int, so it undergoes an implementation-defined conversion. On most systems that use 2's complement, this results in the value with that same bit representation, which is -2147483648 and also happens to be the value of INT_MIN.
Then you attempt to execute x - y which results in signed integer overflow which triggers undefined behavior, giving you an unexpected result.
The proper way to handle this is to perform some algebra to ensure the overflow does not happen.
If x and y have the same sign then subtracting won't overflow.
If the signs differ and x is positive, one might naively try this:
INT_MAX >= x - y
But this could overflow. Instead change it to the mathematically equivalent:
INT_MAX + y >= x
Because y is negative, INT_MAX + y doesn't overflow.
A similar check can be done when x is negative with INT_MIN. The full check:
if (x>=0 && y>=0) {
return 1;
} else if (x<0 && y<0) {
// both strictly negative; x == 0 with y == INT_MIN must not take this branch
return 1;
} else if (x>=0 && INT_MAX + y >= x) {
return 1;
} else if (x<0 && INT_MIN + y <= x) {
return 1;
} else {
return 0;
}
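For reference, the same algebra can be condensed into a two-branch function. This is a sketch of an equivalent check, not the answerer's exact code: the sign of y alone decides which limit can be crossed.

```c
#include <limits.h>

/* Returns 1 if x - y is representable as int, 0 if it would overflow.
   Neither INT_MIN + y (for y >= 0) nor INT_MAX + y (for y < 0) can overflow. */
int tsub_ok(int x, int y)
{
    if (y >= 0)
        return x >= INT_MIN + y;   /* x - y can only be too small */
    else
        return x <= INT_MAX + y;   /* x - y can only be too large */
}
```

For the call in the question, tsub_ok(INT_MIN, 1) returns 0.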
Yes, x - y overflows.
We assume int and unsigned int are 32 bits in the C implementation you are using, as indicated in the title, and that two’s complement is used for int. Then the range of values for int is −2^31 to +2^31−1.
In tsub_ok(0x80000000, 1), the constant 0x80000000 has the value 2^31, and its type is unsigned int since it will not fit in an int. Then this value is passed to tsub_ok. Since the first parameter of tsub_ok has type int, the value is converted to int.
By C 2018 6.3.1.3 3, the conversion is implementation-defined. Many C implementations “wrap” the value modulo 2^32. Assuming your C implementation does this, the result of converting 2^31 to int is −2^31.
Then, inside the function, x - y is −2^31 − 1, and the result of that overflows the int type. The C standard does not define the behavior of the program when signed integer overflow occurs, and so any test that relies on comparing x - y when it may overflow is not supported by the C standard.
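One way to stay within defined behavior is to do the subtraction in a wider type and compare against the int limits afterwards. A sketch under the assumption that long long is wider than int (true with a 32-bit int); tsub_ok_wide is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical variant: the subtraction is done in long long, where it cannot
   overflow as long as long long is wider than int (an assumption). */
int tsub_ok_wide(int x, int y)
{
    long long diff = (long long)x - (long long)y;
    return diff >= INT_MIN && diff <= INT_MAX;
}
```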
Here an int is 32 bits, so it can hold 2^32 distinct values. As an unsigned int, the maximum is 0xFFFFFFFF; as a signed int, the maximum is 0x7FFFFFFF. Thus, you cannot store 0x80000000 in an int here and have everything work.
In computer programming, signed and unsigned numbers are represented only as sequences of bits. Bit 31 is the sign bit for a 32-bit signed int, hence the highest 32-bit int you can store is 0x7FFFFFFF, hence the overflow with 0x80000000 as signed int.
Remember, a signed int is an integer that can be both positive and negative. This is as opposed to an unsigned int, which can only hold a non-negative integer.
What you are trying to do is make a signed int variable hold an unsigned value that is outside its range, which causes the overflow.
For more info check Signed number representations or refer any beginner level digital number systems and programming book.

Converting a negative decimal into binary in C

I'm currently working on a program that (among others) has to convert a decimal number into binary, octal & hexadecimal.
This already works with this code:
int e = 0;
}
while(i != 0){
str[e] = (i%b) + '0';
i = i / b;
if(str[e] > '9'){
str[e] = str[e] + 7;
}
e++;
}
if(vorzeichen == -1){
str[e] = '1';
e++;
}
if(b == 16){
str[e] = 'x';
str[e+1] = '0';
}
else if(b == 8){
str[e] = '0';
}
}
b is the base (2 for binary, 8 for octal & 16 for hexa) and i is the number that i want to convert.
This gives out a string of characters which I then reverse to get the correct number. Now if I try this with negative numbers, it gives out strings not only containing 0 and 1 but also /, which is '0' - 1 in the ASCII table. For octal and decimal it also gives out characters below the '/' in the ASCII table. I've attempted different possible solutions but none seemed to give the desired result. From what I read on the internet, I have to use two's complement, but I'm stuck trying to use it. It just doesn't seem to work for me.
If you want to display a negative decimal, you can just convert your int to an unsigned int like this:
unsigned int value = (unsigned int)i;
Now you only have to use value instead of i in your program and it will be fine.
Here's a good explanation of why : Converting negative decimal to binary
When converting between different bases/radixes, always work on unsigned integer types.
Let's say you have long num you wish to convert. Use an unsigned long u. To represent negative values in two's complement format, you can use
if (num < 0)
u = 1 + (~(unsigned long)(-num));
else
u = num;
or even shorter,
unsigned long u = (num < 0) ? 1 + (~(unsigned long)(-num)) : num;
This works on all architectures (except for num == LONG_MIN, in which case the above is technically undefined behaviour), even those that do not use two's complement internally, because we essentially convert the absolute value of num. If num was originally negative, we then do the two's complement to the unsigned value.
In a comment, chux suggested an alternative form which does not rely on UB for num == LONG_MIN (unless LONG_MAX == ULONG_MAX, which would be horribly odd thing to see):
unsigned long u = (num < 0) ? 1 + (~((unsigned long)(-1 - num) + 1)) : num;
This may look "uglier", but a sane C compiler should be able to optimize either one completely away on architectures with two's complement integers. chux's version avoids undefined behaviour by subtracting the negative num from -1, thus mapping -1 to 0, -2 to 1, and so on, ensuring that all negative values are representable as a nonnegative long. That value is then converted to unsigned long. This gets incremented by one, to account for the earlier -1. This procedure yields the correct negation of num.
In other words, to obtain the absolute value of a long, you can use
unsigned long abs_long(const long num)
{
return (num < 0) ? (unsigned long)(-1 - num) + 1u : (unsigned long)num;
}
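A quick way to convince yourself that the LONG_MIN case works is to compare against LONG_MAX + 1 computed in unsigned long. The checker abs_long_handles_min below is a made-up name, and the check assumes two's complement with ULONG_MAX > LONG_MAX.

```c
#include <limits.h>

unsigned long abs_long(const long num)
{
    return (num < 0) ? (unsigned long)(-1 - num) + 1u : (unsigned long)num;
}

/* Hypothetical check: on two's complement, |LONG_MIN| equals LONG_MAX + 1,
   which only fits in unsigned long. Returns 1 if abs_long agrees. */
int abs_long_handles_min(void)
{
    return abs_long(LONG_MIN) == (unsigned long)LONG_MAX + 1u;
}
```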
% is the remainder function, not mod.
With b==2, i%b returns [-1, 0, 1]. This is not the needed functionality for str[e] = (i%b) + '0'; See ... difference between “mod” and “remainder”
This is the cause of '/' and "also gives out characters below the '/' ".
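The effect is easy to reproduce in isolation. In this sketch, digit_signed mirrors the expression from the question and digit_unsigned does the same after converting to unsigned; both names are hypothetical and both only make sense for bases up to 10.

```c
/* Digit character as computed in the question: i % b keeps the sign of i
   (C99 and later), so a negative i yields a character below '0'. */
char digit_signed(int i, int b)
{
    return (char)((i % b) + '0');
}

/* Same digit taken from the unsigned conversion of i, which is never negative. */
char digit_unsigned(int i, unsigned b)
{
    return (char)(((unsigned)i % b) + '0');
}
```

With ASCII, digit_signed(-5, 2) is '/' because -5 % 2 is -1, while digit_unsigned(-5, 2) is '1'.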
Build up the string from the "right"
With a 2's complement int, a simple approach is to convert to unsigned and avoid a negative result from %. Since code is using % to extract the least significant digit, walk the buffer from right to left.
#include <limits.h>
#include <string.h>
...
unsigned u = i;
// make a temporary buffer large enough for any string output in binary
//           v------v Size of `u` in "bytes"
//           |      |   v------v Size of a "byte" - commonly 8
char my_buff[sizeof u * CHAR_BIT + 1];
int e = 0;
// Form a pointer to the end so code assigns the least significant digits on the right
char *p = &my_buff[sizeof my_buff - 1];
// Strings are null character terminated
*p = '\0';
// Use a `do` loop to ensure at least one pass. Useful when `i==0` --> "0"
do {
p--;
p[e] = "0123456789ABCDEF"[u%b]; // Select desired digit
u = u / b;
} while (u);
// "prepend" characters as desired
if(b == 16){
*(--p) = 'x';
*(--p) = '0';
}
else if(b == 8 && i != 0){
*(--p) = '0';
}
strcpy(str, p);
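Put together as one function, the fragment could look like the sketch below. The name to_base and its parameter list are assumptions for illustration; the caller must supply a buffer of at least sizeof(unsigned) * CHAR_BIT + 3 characters and a base between 2 and 16.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical wrapper: writes i in base b (2..16) into str, treating i as
   its unsigned bit pattern. str needs sizeof(unsigned) * CHAR_BIT + 3 chars. */
void to_base(int i, unsigned b, char *str)
{
    static const char digits[] = "0123456789ABCDEF";
    char buf[sizeof(unsigned) * CHAR_BIT + 3];   /* digits, "0x" prefix, '\0' */
    char *p = &buf[sizeof buf - 1];
    unsigned u = (unsigned)i;

    *p = '\0';
    do {                                         /* at least one pass, so 0 -> "0" */
        *--p = digits[u % b];
        u /= b;
    } while (u != 0);

    if (b == 16) {
        *--p = 'x';
        *--p = '0';
    } else if (b == 8 && i != 0) {
        *--p = '0';
    }
    strcpy(str, p);
}
```

With a 32-bit unsigned, to_base(-1, 16, out) produces "0xFFFFFFFF" and to_base(-1, 2, out) produces 32 ones.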

K&R Exercise 3-4: Negative Numbers Represented In Binary

I'm having a hard time understanding this exercise:
In a two's complement number representation, our version of itoa does not
handle the largest negative number, that is, the value of n equal to -(2^(wordsize-1)). Explain why not. Modify it to print that value correctly, regardless of the machine on which it runs.
Here is what the itoa originally looks like:
void reverse(char s[], int n)
{
int toSwap;
int end = n-1;
int begin = 0;
while(begin <= end) // Swap the array in place starting from both ends.
{
toSwap = s[begin];
s[begin] = s[end];
s[end] = toSwap;
--end;
++begin;
}
}
// Converts an integer to a character string.
void itoa(int n, char s[])
{
int i, sign;
if ((sign = n) < 0)
n = -n;
i = 0;
do
{
s[i++] = n % 10 + '0';
} while ((n /= 10) > 0);
if (sign < 0)
s[i++] = '-';
s[i] = '\0';
reverse(s, i);
}
I found this answer, but I don't understand the explanation:
http://www.stevenscs.com/programs/KR/$progs/KR-EX3-04.html
Because the absolute value of the largest negative number a word can hold is greater than that of the largest positive number, the statement early in itoa that sets positive a negative number corrupts its value.
Are they saying that negative numbers contain more bits because of the sign than a positive number which has no sign? Why would multiplying by -1 affect how the large negative number is stored?
In two's complement representation, the range of values you can represent is -2^(n-1) to 2^(n-1)-1. Thus, with 8 bits, you can represent values in the range -128 to 127. That's what's meant by the phrase, "the largest negative number a word can hold is greater than that of the largest positive number."
Illustrating with just 3 bits to make it clearer:
Value  Bits
-----  ----
    0  000
    1  001
    2  010
    3  011
   -4  100
   -3  101
   -2  110
   -1  111
With 3 bits, there's no way we can represent a positive 4 in two's complement, so n = -n; won't give us the result we expect (see note 1). That's why the original itoa implementation above can't deal with INT_MIN.
Note 1: Behavior on signed integer overflow is undefined, meaning that there's no fixed result.
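The problem is that, if n is the largest negative number, n = -n cannot produce the matching positive value, because a positive number that big cannot be represented. The negation overflows (undefined behavior), and on typical two's complement machines n simply stays negative.
A solution can be to hold the positive number in a wider integer type such as long, provided that type is actually wider than int on your machine.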
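Another approach that does not depend on a wider type is to never negate n at all and take the absolute value of each digit instead. A minimal sketch, with itoa_min_safe as a hypothetical name; it relies on C99 division truncating toward zero, so n % 10 has the sign of n.

```c
#include <stdlib.h>   /* abs */

/* Hypothetical variant of itoa: n is never negated, so INT_MIN is handled.
   Relies on C99 semantics: n % 10 lies in -9..9 and has the sign of n. */
void itoa_min_safe(int n, char s[])
{
    int i = 0, j;
    int sign = n;

    do {
        s[i++] = (char)(abs(n % 10) + '0');   /* per-digit absolute value */
    } while ((n /= 10) != 0);
    if (sign < 0)
        s[i++] = '-';
    s[i] = '\0';

    /* reverse the string in place */
    for (j = 0, --i; j < i; ++j, --i) {
        char c = s[j];
        s[j] = s[i];
        s[i] = c;
    }
}
```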

Correct way to take absolute value of INT_MIN

I want to perform some arithmetic in unsigned, and need to take absolute value of negative int, something like
do_some_arithmetic_in_unsigned_mode(int some_signed_value)
{
unsigned int magnitude;
int negative;
if(some_signed_value<0) {
magnitude = 0 - some_signed_value;
negative = 1;
} else {
magnitude = some_signed_value;
negative = 0;
}
...snip...
}
But INT_MIN might be problematic, 0 - INT_MIN is UB if performed in signed arithmetic.
What is a standard/robust/safe/efficient way to do this in C?
EDIT:
If we know we are in 2-complement, maybe implicit cast and explicit bit ops would be standard? if possible, I'd like to avoid this assumption.
do_some_arithmetic_in_unsigned_mode(int some_signed_value)
{
unsigned int magnitude=some_signed_value;
int negative=some_signed_value<0;
if (negative) {
magnitude = (~magnitude) + 1;
}
...snip...
}
Conversion from signed to unsigned is well-defined: You get the corresponding representative modulo 2^N. Therefore, the following will give you the correct absolute value of n:
int n = /* ... */;
unsigned int abs_n = n < 0 ? UINT_MAX - ((unsigned int)(n)) + 1U
: (unsigned int)(n);
Update: As @aka.nice suggests, we can actually replace UINT_MAX + 1U by 0U:
unsigned int abs_n = n < 0 ? -((unsigned int)(n))
: +((unsigned int)(n));
In the negative case, take some_signed_value + 1. Negate it (this is safe because it can't be INT_MIN). Convert to unsigned. Then add one.
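Those four steps could be written as follows. This is a sketch with a hypothetical function name; it assumes UINT_MAX > INT_MAX so that the final + 1u cannot wrap.

```c
/* Hypothetical helper following the steps above. For v < 0, v + 1 is at most
   0 and greater than INT_MIN, so negating it cannot overflow.
   Assumes UINT_MAX > INT_MAX. */
unsigned int abs_via_plus_one(int v)
{
    if (v >= 0)
        return (unsigned int)v;
    return (unsigned int)(-(v + 1)) + 1u;
}
```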
You can always test for some_signed_value >= -INT_MAX; this is always well defined. The only case that is interesting for you is when INT_MIN < -INT_MAX and some_signed_value == INT_MIN. You'd have to test that case separately.
I want to perform some arithmetic in unsigned, and need to take absolute value of negative int, ...
To handle pedantic cases:
The |SOME_INT_MIN| (see note 1) has some special cases:
1. Non-two's complement
Ones' complement and sign-magnitude are rarely seen these days.
SOME_INT_MIN == -SOME_INT_MAX and some_abs(some_int) is well defined. This is the easy case.
#if INT_MIN == -INT_MAX
some_abs(x); // use matching abs, labs, llabs, imaxabs
#endif
2. SOME_INT_MAX == SOME_UINT_MAX, 2's complement
C allows the max of the signed and unsigned version of an integer type to be the same. This is rarely seen these days.
2 approaches:
1) Use a wider integer type, if it exists.
#if -INTMAX_MAX <= SOME_INT_MIN
imaxabs((intmax_t)x)
#endif
2) Use wide(st) floating-point (FP) type.
Conversion to a wide FP will work for SOME_INT_MIN (2's complement) as that value is a -(power-of-2). For other large negatives, the cast may lose precision for a wide integer and not so wide long double. E.g. 64-bit long long and 64-bit long double.
fabsl(x); // see restriction above.
3. SOME_INT_MAX < SOME_UINT_MAX
This is the common case, well handled by @Kerrek SB's answer. The below also handles case 1.
x < 0 ? -((unsigned) x) : ((unsigned) x);
Higher Level Alternative
In cases when code is doing .... + abs(x), a well defined alternative is to subtract the negative absolute value: .... - nabs(x). Or, instead of abs(x) < 100, use nabs(x) > -100.
// This is always well defined.
int nabs(int x) {
return (x < 0) ? x : -x;
}
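As a usage sketch, the range test mentioned above can be written without ever forming abs(x). The helper magnitude_below_100 is a made-up name for illustration.

```c
/* Negative absolute value: defined for every int, including INT_MIN,
   because negating a non-negative int can never overflow. */
int nabs(int x)
{
    return (x < 0) ? x : -x;
}

/* Hypothetical use: tests |x| < 100 without computing abs(x). */
int magnitude_below_100(int x)
{
    return nabs(x) > -100;
}
```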
Note 1: SOME_INT implies int, long, long long or intmax_t.
static unsigned absolute(int x)
{
if (INT_MIN == x) {
/* Avoid tricky arithmetic overflow possibilities */
return ((unsigned) -(INT_MIN + 1)) + 1U;
} else if (x < 0) {
return -x;
} else {
return x;
}
}
