Compiler discrepancy in expression evaluation - c

I have implemented the following function to emulate if-then-else:
int foo(int x, int y, int z) {
int negOne = (1<<31)>>31;
int test = !(~x + !x) + negOne;
int ans = (test & y) | (~test & z);
return ans;
}
I have certain restrictions, and the top part was a bit of a hack, but it works. test only evaluates to either 0 or -1.
The assignment uses a specific compiler, and in the case of x = 0, y = -2147483648, and z = 2147483647, the compiler states that my code is returning -2147483648.
That wouldn't make sense since if x = 0, then test = 0. If test = 0, then the ans expression will evaluate to z, i.e., 2147483647.
I have double-checked my outputs on two different compilers and it says I am returning 2147483647, the correct answer, so I'm left to assume this is a compiler error, perhaps related to the bounds of an integer? Unless of course, the fault is in my code.
Also additional compiler info:
The compiler is called the dlc compiler. I previously had "parse error" issues declaring my variables in the middle of my functions, and I was told the compiler was likely C89. Moving these declarations to the top of the function fixed the problem.
UPDATE:
changing the negOne expression to ~1 + 1 did not fix anything.
I evaluated both sides of the ans expressions and, as expected, they evaluated to 0 and 2147483647, respectively, so that does not seem to be the problem either. The express would then finally evaluate 0 | 2147483647, which is 2147483647; and that's what the value of ans is, and I also checked the value of the variable receiving the return value of the function, and again, it is 2147483647.
So I'm still perplexed as to why one particular compiler returns -2147483648.

You're code may be exhibiting undefined behaviour if int is 32-bits wide on the machine you're trying this. Left shifting from Standard C99 §6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1×2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behaviour is undefined.
The literal 1 is a signed int i.e. E1 here is an signed integer and has a non-negative value 1. 1 × 231 = 2,147,483,648 which isn't representable on a machine with 32-bit integer, since signed int range on such a machine is -2,147,483,648 to 2,147,483,647.
All bets are off when you're in the UB land, thus any output is possible. I don't understand why you can't do this:
int negOne = -1;

I think the problem what happens, when You use that specific compiler is simpe overflow. Because different compilers use different order of operations when certain order is not strictly specified, the output of this code is not deterministic. In particular case I think addition is causing an overflow.
there is described some good examples of such traps here: http://www.fefe.de/intof.html.
Sometimes these traps are unintentionally avoided with optimization made by compiler before compiling. So I would suggest to take disassembly of the output of all compilers You used - then the point of overflow comes up straight away and the differences are also made clear.
I would suggest You to use something like this for conditional emulation:
int foo(int x, int y, int z) {
int test = ~(!!x) + 1;
return (test & y) | (~test & z);
}
This should avoid overflows with representing x as "boolean" in older C standards as well.

Related

Output of C program changes when optimisation is enabled

I am solving one of the lab exercises from the CS:APP course as a self-study.
In the CS:APP course the maximum positive number, that can be represented with 4 bytes in two's complement, is marked as Tmax (which is equal to the 0x7fffffff).
Likewise, the most negative number is marked as Tmin (which is equal to the 0x80000000).
The goal of the exercise was to implement a isTmax() function which should return 1, when given an Tmax, otherwise it should return 0. This should be done only with a restricted set of operators which are: ! ~ & ^ | +, the maximum number of operators is 10.
Below you can see my implementation of isTmax() function, with comments explaining how it should work.
#include <stdio.h>
int isTmax(int x)
{
/* Ok, lets assume that x really is tMax.
* This means that if we add 1 to it we get tMin, lets call it
* possible_tmin. We can produce an actual tMin with left shift.
* We can now xor both tmins, lets call the result check.
* If inputs to xor are identical then the check will be equal to
* 0x00000000, if they are not identical then the result will be some
* value different from 0x00000000.
* As a final step we logicaly negate check to get the requested behaviour.
* */
int possible_tmin = x + 1;
int tmin = 1 << 31;
int check = possible_tmin ^ tmin;
int negated_check = !check;
printf("input =\t\t 0x%08x\n", x);
printf("possible_tmin =\t 0x%08x\n", possible_tmin);
printf("tmin =\t\t 0x%08x\n", tmin);
printf("check =\t\t 0x%08x\n", check);
printf("negated_check =\t 0x%08x\n", negated_check);
return negated_check;
}
int main()
{
printf("output: %i", isTmax(0x7fffffff));
return 0;
}
The problem that I am facing is that I get different output whether I set an optimization flag when compiling the program. I am using gcc 11.1.0.
With no optimizations I get this output, which is correct for the given input:
$ gcc main.c -lm -m32 -Wall && ./a.out
input = 0x7fffffff
possible_tmin = 0x80000000
tmin = 0x80000000
check = 0x00000000
negated_check = 0x00000001
output: 1
With optimization enabled I get this output, which is incorrect.
gcc main.c -lm -m32 -Wall -O1 && ./a.out
input = 0x7fffffff
possible_tmin = 0x80000000
tmin = 0x80000000
check = 0x00000000
negated_check = 0x00000000
output: 0
For some reason the logical negation is not applied to the check variable when optimization is enabled.
The problem persists with any other level of optimization (-O2, -O3, -Os).
Even if I write the expressions as a one-liner return !((x + 1) ^ (1 << 31)); nothing changes.
I can "force" a correct behavior If I declare check as a volatile.
I am using the same level of optimization as is used by the automated checker that came with the exercise, If I turn it off my code passes all checks.
Can anyone shed on some light why is this happening? Why doesn't the logical negation happen?
EDIT: I have added a section with the extra guidelines and restrictions connected to the exercise that I forgot to include with the original post. Specifically, I am not allowed to use any other data type instead of int. I am not sure if that also includes literal suffix U.
Replace the "return" statement in each function with one
or more lines of C code that implements the function. Your code
must conform to the following style:
int Funct(arg1, arg2, ...) {
/* brief description of how your implementation works */
int var1 = Expr1;
...
int varM = ExprM;
varJ = ExprJ;
...
varN = ExprN;
return ExprR;
}
Each "Expr" is an expression using ONLY the following:
1. Integer constants 0 through 255 (0xFF), inclusive. You are
not allowed to use big constants such as 0xffffffff.
2. Function arguments and local variables (no global variables).
3. Unary integer operations ! ~
4. Binary integer operations & ^ | + << >>
Some of the problems restrict the set of allowed operators even further.
Each "Expr" may consist of multiple operators. You are not restricted to
one operator per line.
You are expressly forbidden to:
1. Use any control constructs such as if, do, while, for, switch, etc.
2. Define or use any macros.
3. Define any additional functions in this file.
4. Call any functions.
5. Use any other operations, such as &&, ||, -, or ?:
6. Use any form of casting.
7. Use any data type other than int. This implies that you
cannot use arrays, structs, or unions.
You may assume that your machine:
1. Uses 2s complement, 32-bit representations of integers.
2. Performs right shifts arithmetically.
3. Has unpredictable behavior when shifting an integer by more
than the word size.
The specific cause is most likely in 1 << 31. Nominally, this would produce 231, but 231 is not representable in a 32-bit int. In C 2018 6.5.7 4, where the C standard specifies the behavior of <<, it says the behavior in this case is not defined.
When optimization is disabled, the compiler may generate a processor instruction that gives 1 left 31 bits. This produces the bit pattern 0x80000000, and subsequent instructions interpret that as −231.
In contrast, with optimization enabled, the optimization software recognizes that 1 << 31 is not defined and does not generate a shift instruction for it. It may replace it with a compile-time value. Since the behavior is not defined by the C standard, the compiler is allowed to use any value for that. It might use zero, for example. (Since the entire behavior is not defined, not just the result, the compiler is actually allowed toreplace this part of your program with anything. It could use entirely different instructions or just abort.)
You can start to fix that by using 1u << 31. That is defined because 231 fits in the unsigned int type. However, there is a problem when assigning that to tmin, because tmin is an int, and the value still does not fit in an int. However, for this conversion, the behavior is implementation-defined, not undefined. Common C implementations define the conversion to wrap modulo 232, which means that the assignment will store −231 in tmin. However, an alternative is to change tmin from int to unsigned int (which may also be written just as unsigned) and then work with unsigned integers. That will give fully defined behavior, rather than undefined or implementation-defined, except for assuming the int width is 32 bits.
Another problem is x + 1. When x is INT_MAX, that overflows. That is likely not the cause of the behavior you observe, as common compilers simply wrap the result. Nonetheless, it can be corrected similarly, by using x + 1u and changing the type of possible_tmin to unsigned.
That said, the desired result can be computed with return ! (x ^ ~0u >> 1);. This takes zero as an unsigned int, complements it to produce all 1 bits, and shifts it right one bit, which gives a single 0 bit followed by all 1 bits. That is the INT_MAX value, and it works regardless of the width of int. Then this is XORed with x. The result of that has all zero bits if and only if x is also INT_MAX. Then ! either changes that zero into 1 or changes a non-zero value into 0.
Change the type of the variables from int to unsigned int (or just unsigned) because bitwise operations with signed values cause undefined behavior.
#Voo made a correct observation, x+1 created an undefined behavior, which was not apparent at first as the printf calls did not show anything weird happening.

Effect of type casting on printf function

Here is a question from my book,
Actually, I don't know what will be the effect on printf function, so I tried the statements in the original system of C lang. Here is my code:
#include <stdio.h>
void main() {
int x = 4;
printf("%hi\n", x);
printf("%hu\n", x);
printf("%i\n", x);
printf("%u\n", x);
printf("%li\n", x);
printf("%lu\n", x);
}
Try it online!
So, the output is very simple. But, is this really the solution to above problem?
There are numerous problems in this question that make it unsuitable for teaching C.
First, to work on this problem at all, we have to assume a non-standard C implementation is used. In standard C, %x is a complete conversion specification, so %xu and %xd cannot be; the conversion specification has already ended before the u or d. And the uses of z in a conversion specification interferes with its standard use for size_t.
Nonetheless, let’s assume this C variant does not have those standard conversion specifications and instead uses the ones shown in the table but that this C variant otherwise conforms to the C standard with minimal changes.
Our next problem is that, in Y num = 42;, we have a plain Y, not the signed Y or unsigned Y shown in the table. Let’s assume signed Y is intended.
Then num is a signed four-bit integer. The greatest value it can represent is 01112 = 710. So it cannot represent 42. Attempting to initialize it with 42 results in a conversion specified by C 2018 6.3.1.3, which says, in part:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The result is we do not know what value is in num or even whether the program continues to execute; it may trap and terminate.
Well, let’s assume this implementation just takes the low bits of the value. 42 is 1010102, so its low four bits are 1010. So if the bits in num are 1010, it is negative. The C standard permits several methods of representation for negative numbers, but we will assume the overwhelmingly most common one, two’s complement, so the bits 1010 in num represent −6.
Now, we get to the printf statements. Except the problem text shows Printf, which is not defined by the C standard. (Are you sure this problem relates to C code at all?) Let’s assume it means printf.
In printf("%xu",num);, if the conversion specification is supposed to work like the ones in standard C, then the corresponding argument should be an unsigned X value that has been promoted to int for the function call. As a two-bit unsigned integer, an unsigned X can represent 0, 1, 2, or 3. Passing it −6 is not defined. So we do not know what the program will print. It might take just the low two bits, 10, and print “2”. Or it might use all the bits and print “-6”. Both of those would be consistent with the requirement that the printf behave as specified for values that are in the range representable by unsigned X.
In printf("%xd",num); and printf("%yu",num);, the same problem exists.
In printf("%yd",num);, we are correctly passing a signed Y value for a signed Y conversion specification, so “-6” is printed.
Then printf("%zu",num); has the same problem with the value mismatched for the type.
Finally, in printf("%zd",num);, the value is again in the correct range, and “-6” is printed.
From all the assumptions we had to make and all the points where the behavior is undefined, you can see this is a terrible exercise. You should question the quality of the book it is in and of any school using it.

Bitwise operation results in unexpected variable size

Context
We are porting C code that was originally compiled using an 8-bit C compiler for the PIC microcontroller. A common idiom that was used in order to prevent unsigned global variables (for example, error counters) from rolling over back to zero is the following:
if(~counter) counter++;
The bitwise operator here inverts all the bits and the statement is only true if counter is less than the maximum value. Importantly, this works regardless of the variable size.
Problem
We are now targeting a 32-bit ARM processor using GCC. We've noticed that the same code produces different results. So far as we can tell, it looks like the bitwise complement operation returns a value that is a different size than we would expect. To reproduce this, we compile, in GCC:
uint8_t i = 0;
int sz;
sz = sizeof(i);
printf("Size of variable: %d\n", sz); // Size of variable: 1
sz = sizeof(~i);
printf("Size of result: %d\n", sz); // Size of result: 4
In the first line of output, we get what we would expect: i is 1 byte. However, the bitwise complement of i is actually four bytes which causes a problem because comparisons with this now will not give the expected results. For example, if doing (where i is a properly-initialized uint8_t):
if(~i) i++;
we will see i "wrap around" from 0xFF back to 0x00. This behaviour is different in GCC compared with when it used to work as we intended in the previous compiler and 8-bit PIC microcontroller.
We are aware that we can resolve this by casting like so:
if((uint8_t)~i) i++;
or, by
if(i < 0xFF) i++;
however in both of these workarounds, the size of the variable must be known and is error-prone for the software developer. These kinds of upper bounds checks occur throughout the codebase. There are multiple sizes of variables (eg., uint16_t and unsigned char etc.) and changing these in an otherwise working codebase is not something we're looking forward to.
Question
Is our understanding of the problem correct, and are there options available to resolving this that do not require re-visiting each case where we've used this idiom? Is our assumption correct, that an operation like bitwise complement should return a result that is the same size as the operand? It seems like this would break, depending on processor architectures. I feel like I'm taking crazy pills and that C should be a bit more portable than this. Again, our understanding of this could be wrong.
On the surface this might not seem like a huge issue but this previously-working idiom is used in hundreds of locations and we're eager to understand this before proceeding with expensive changes.
Note: There is a seemingly similar but not exact duplicate question here: Bitwise operation on char gives 32 bit result
I didn't see the actual crux of the issue discussed there, namely, the result size of a bitwise complement being different than what's passed into the operator.
What you are seeing is the result of integer promotions. In most cases where an integer value is used in an expression, if the type of the value is smaller than int the value is promoted to int. This is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an intor
unsigned int may be used
An object or expression with an integer type (other than intor unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int ,signed int, orunsigned int`.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
So if a variable has type uint8_t and the value 255, using any operator other than a cast or assignment on it will first convert it to type int with the value 255 before performing the operation. This is why sizeof(~i) gives you 4 instead of 1.
Section 6.5.3.3 describes that integer promotions apply to the ~ operator:
The result of the ~ operator is the bitwise complement of its
(promoted) operand (that is, each bit in the result is set if and only
if the corresponding bit in the converted operand is not set). The
integer promotions are performed on the operand, and the
result has the promoted type. If the promoted type is an unsigned
type, the expression ~E is equivalent to the maximum value
representable in that type minus E.
So assuming a 32 bit int, if counter has the 8 bit value 0xff it is converted to the 32 bit value 0x000000ff, and applying ~ to it gives you 0xffffff00.
Probably the simplest way to handle this is without having to know the type is to check if the value is 0 after incrementing, and if so decrement it.
if (!++counter) counter--;
The wraparound of unsigned integers works in both directions, so decrementing a value of 0 gives you the largest positive value.
in sizeof(i); you request the size of the variable i, so 1
in sizeof(~i); you request the size of the type of the expression, which is an int, in your case 4
To use
if(~i)
to know if i does not value 255 (in your case with an the uint8_t) is not very readable, just do
if (i != 255)
and you will have a portable and readable code
There are multiple sizes of variables (eg., uint16_t and unsigned char etc.)
To manage any size of unsigned :
if (i != (((uintmax_t) 2 << (sizeof(i)*CHAR_BIT-1)) - 1))
The expression is constant, so computed at compile time.
#include <limits.h> for CHAR_BIT and #include <stdint.h> for uintmax_t
Here are several options for implementing “Add 1 to x but clamp at the maximum representable value,” given that x is some unsigned integer type:
Add one if and only if x is less than the maximum value representable in its type:
x += x < Maximum(x);
See the following item for the definition of Maximum. This method
stands a good chance of being optimized by a compiler to efficient
instructions such as a compare, some form of conditional set or move,
and an add.
Compare to the largest value of the type:
if (x < ((uintmax_t) 2u << sizeof x * CHAR_BIT - 1) - 1) ++x
(This calculates 2N, where N is the number of bits in x, by shifting 2 by N−1 bits. We do this instead of shifting 1 N bits because a shift by the number of bits in a type is not defined by the C standard. The CHAR_BIT macro may be unfamiliar to some; it is the number of bits in a byte, so sizeof x * CHAR_BIT is the number of bits in the type of x.)
This can be wrapped in a macro as desired for aesthetics and clarity:
#define Maximum(x) (((uintmax_t) 2u << sizeof (x) * CHAR_BIT - 1) - 1)
if (x < Maximum(x)) ++x;
Increment x and correct if it wraps to zero, using an if:
if (!++x) --x; // !++x is true if ++x wraps to zero.
Increment x and correct if it wraps to zero, using an expression:
++x; x -= !x;
This is is nominally branchless (sometimes beneficial for performance), but a compiler may implement it the same as above, using a branch if needed but possibly with unconditional instructions if the target architecture has suitable instructions.
A branchless option, using the above macro, is:
x += 1 - x/Maximum(x);
If x is the maximum of its type, this evaluates to x += 1-1. Otherwise, it is x += 1-0. However, division is somewhat slow on many architectures. A compiler may optimize this to instructions without division, depending on the compiler and the target architecture.
Before stdint.h the variable sizes can vary from compiler to compiler and the actual variable types in C are still int, long, etc and are still defined by the compiler author as to their size. Not some standard nor target specific assumptions. The author(s) then need to create stdint.h to map the two worlds, that is the purpose of stdint.h to map the uint_this that to int, long, short.
If you are porting code from another compiler and it uses char, short, int, long then you have to go through each type and do the port yourself, there is no way around it. And either you end up with the right size for the variable, the declaration changes but the code as written works....
if(~counter) counter++;
or...supply the mask or typecast directly
if((~counter)&0xFF) counter++;
if((uint_8)(~counter)) counter++;
At the end of the day if you want this code to work you have to port it to the new platform. Your choice as to how. Yes, you have to spend the time hit each case and do it right, otherwise you are going to keep coming back to this code which is even more expensive.
If you isolate the variable types on the code before porting and what size the variable types are, then isolate the variables that do this (should be easy to grep) and change their declarations using stdint.h definitions which hopefully won't change in the future, and you would be surprised but the wrong headers are used sometimes so even put checks in so you can sleep better at night
if(sizeof(uint_8)!=1) return(FAIL);
And while that style of coding works (if(~counter) counter++;), for portability desires now and in the future it is best to use a mask to specifically limit the size (and not rely on the declaration), do this when the code is written in the first place or just finish the port and then you won't have to re-port it again some other day. Or to make the code more readable then do the if x<0xFF then or x!=0xFF or something like that then the compiler can optimize it into the same code it would for any of these solutions, just makes it more readable and less risky...
Depends on how important the product is or how many times you want send out patches/updates or roll a truck or walk to the lab to fix the thing as to whether you try to find a quick solution or just touch the affected lines of code. if it is only a hundred or few that is not that big of a port.
6.5.3.3 Unary arithmetic operators
...
4 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
C 2011 Online Draft
The issue is that the operand of ~ is being promoted to int before the operator is applied.
Unfortunately, I don't think there's an easy way out of this. Writing
if ( counter + 1 ) counter++;
won't help because promotions apply there as well. The only thing I can suggest is creating some symbolic constants for the maximum value you want that object to represent and testing against that:
#define MAX_COUNTER 255
...
if ( counter < MAX_COUNTER-1 ) counter++;

C safely taking absolute value of integer

Consider following program (C99):
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(void)
{
printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
intmax_t i;
if (scanf("%jd", &i) == 1)
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}
Now as I understand it, this contains easily triggerable undefined behaviour, like this:
Enter int in range -9223372036854775808 .. 9223372036854775807:
> -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808
Questions:
Is this really undefined behaviour, as in "code is allowed to trigger any code path, which any code that stroke compiler's fancy", when user enters the bad number? Or is it some other flavor of not-completely-defined?
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)
If the result of imaxabs cannot be represented, can happen if using two's complement, then the behavior is undefined.
7.8.2.1 The imaxabs function
The imaxabs function computes the absolute value of an integer j. If the result cannot
be represented, the behavior is undefined. 221)
221) The absolute value of the most negative number cannot be represented in two’s complement.
The check that makes no assumptions and is always defined is:
intmax_t i = ... ;
if( i < -INTMAX_MAX )
{
//handle error
}
(This if statement cannot be taken if using one's complement or sign-magnitude representation, so the compiler might give a unreachable code warning. The code itself is still defined and valid. )
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
One method is to use unsigned integers. The overflow behaviour of unsigned integers is well-defined as is the behaviour when converting from a signed to an unsigned integer.
So I think the following should be safe (turns out it's horriblly broken on some really obscure systems, see later in the post for an improved version)
uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);
So how does this work?
uintmax_t j = i;
This converts the signed integer into an unsigned one. IF it's positive the value stays the same, if it's negative the value increases by 2n (where n is the number of bits). This converts it to a large number (larger than INTMAX_MAX)
if (j > (uintmax_t)INTMAX_MAX) {
If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.
j = -j;
The number is negated. The result of a negation is clearly negative and so cannot be represented as an unsigned integer. So it is increased by 2n.
So algebraically the result for negative i looks like
j = - (i + 2n) + 2n = -i
Clever, but this solution makes assumptions. This fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by C Standard.
Hmm, lets look at this (i'm reading https://busybox.net/~landley/c99-draft.html which is apprarently the last C99 draft prior to standardisation, if anything changed in the final standard please do tell me.
When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.
In 6.2.5 I see
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
In 6.2.6.2 I see
#1
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N-1, so that >objects of that type shall be capable of representing values from 0 to 2N-1 >using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)
#2
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.
So yes it seems you are right, while the signed and unsigned types have to be the same size it does seem to be valid for the unsigned type to have one more padding bit than the signed type.
Ok, based on the analysis above revealing a flaw in my first attempt i've written a more paranoid variant. This has two changes from my first version.
I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means that the algorithm proceduces correct results for numbers grater than or equal to -INTMAX_MAX even when INTMAX_MAX == UINTMAX_MAX.
I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX -1 and i == INTMAX_MIN. This will result in j=0 inside the if condition which we can easilly test for.
It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.
uintmax_t j = i;
if (i < 0) {
j = -j;
if (j == 0) {
printf("your platform sucks\n");
exit(1);
}
}
printf("Result: |%jd| = %ju\n", i, j);
#plugwash I think 2501 is correct. For example, -UINTMAX_MAX value becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde 58 mins ago
Umm,
assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX
uintmax_t j = i;
after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1
if (i < 0) {
i is less than zero so we run the commands inside the if
j = -j;
after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX
which is the correct answer, so no need to trap it in an error case.
On two-complement systems getting the absolute number of the most negative value is indeed undefined behavior, as the absolute value would be out of range. And it's nothing the compiler can help you with, as the UB happens at run-time.
The only way to protect against that is to compare the input against the most negative value for the type (INTMAX_MIN in the code you show).
So calculating the absolute value of an integer invokes undefined behaviour in one single case. Actually, while the undefined behaviour can be avoided, it is impossible to give the correct result in one case.
Now consider multiplication of an integer by 3: Here we have a much more serious problem. This operation invokes undefined behaviour in 2/3rds of all cases! And for two thirds of all int values x, finding an int with the value 3x is just impossible. That's a much more serious problem than the absolute value problem.
You may want to use some bit hacks:
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
r = (v + mask) ^ mask;
This works well when INT_MIN < v <= INT_MAX. In the case where v == INT_MIN, it remains INT_MIN , without causing undefined behavior.
You can also use bitwise operation to handle this on ones' complement and sign-magnitude systems.
Reference: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
according to this http://linux.die.net/man/3/imaxabs
Notes
Trying to take the absolute value of the most negative integer is not defined.
To handle the full range you could add something like this to your code
if (i != INTMAX_MIN) {
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs( INTMAX_MIN) /*
printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
edit: As abs(INTMAX_MIN) cannot be represented on a 2's complement machine, 2 values within respresentable range are concatenated on output as a string.
Tested with gcc, though printf required %lld as %jd was not a supported format.
Is this really undefined behaviour, as in "code is allowed to trigger any code path, which any code that stroke compiler's fancy", when user enters the bad number? Or is it some other flavor of not-completely-defined?
The behaviour of the program is only undefined, when the bad number is successfully input-ed and passed to imaxabs(), which on a typical 2's complement system returns a -ve result as you observed.
That is the undefined behaviour in this case, the implementation would also be allowed to terminate the program with an over-flow error if the ALU set status flags.
The reason for "undefined behaviour" in C is so compiler writers don't have to guard against overflow, so programs can run more efficiently. Whilst it is within C standard for every C program using abs() to try to kill your first born, just because you call it with a too -ve value, writing such code into the object file would simply be perverse.
The real problem with these undefined behaviours, is that an optimising compiler, can reason away naive checks so code like :
r = (i < 0) ? -i : i;
if (r < 0) { // This code may be pointless
// Do overflow recovery
doRecoveryProcessing();
} else {
printf("%jd", r);
}
As a compiler optomiser can reason that negative values are negated, it could in principal determine that (r <0) is always false, so the attempt to trap the problem fails.
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
By far the best way, is simply to ensure that the program works on a valid range, so in this case validating the input suffices (disallow INTMAX_MIN).
Programs printing tables of abs() ought to avoid INT*_MIN and so on.
if (i != INTMAX_MIN) {
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs( INTMAX_MIN) /*
printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
Appears to write out the abs( INTMAX_MIN) by fakery, allowing the program to live up to it's promise to the user.

Short int values above 32767

When trying to store short integer values above 32,767 in C, just to see what happens, I notice the result that gets printed to the screen is the number I am trying to store, - 65,536. E.g. if i try to store 65,536 in a short variable, the number which is printed to the screen is 0. If I try to store 32,768 I get -32,768 printed to the screen. And if I try to store 65,546 and print it to the screen I get 10. I think you get the picture. Why am I seeing these results?
Integer values are stored using Twos Complement. In twos complement, the range of possible values is -2^n to 2^n-1, where n is the number of bits used for storage. Because of the way the storage is done, when you go above 2^n-1, you end up wrapping around back to 2^n.
Shorts use 16 bits (15 bits for numeric storage, with the final bit being the sign).
EDIT: Keep in mind that this behavior is NOT guaranteed to occur. Programming languages may act completely differently. Technically, going above or below the max/min value is undefined behavior and it should be treated as such. (Thanks to Eric Postpischil for keeping me on my toes)
When a value is converted to a signed integer type but cannot be represented in that type, overflow occurs, and the behavior is undefined. It is common to see results as if a two’s complement encoding is used and as if the low bits are stored (or, equivalently, the value is wrapped modulo an appropriate power of two). However, you cannot rely on this behavior. The C standard says that, when signed integer overflow occurs, the behavior is undefined. So a compiler may act in surprising ways.
Consider this code, compiled for a target where short int is 16 bits:
void foo(int a, int b)
{
if (a)
{
short int x = b;
printf("x is %hd.\n", x);
}
else
{
printf("x is not assigned.\n");
}
}
void bar(int a, int b)
{
if (b == 65536)
foo(a, b);
}
Observe that foo is a perfectly fine function on its own, provided b is within range of a short int. And bar is a perfectly fine function, as long as it is called only with a equal to zero or b not equal to 65536.
While the compiler is in-lining foo in bar, it may deduce from the fact that b must be 65536 at this point, that there would be an overflow in short int x = b;. This implies either that this path is never taken (i.e., a must be zero) or that any behavior is permitted (because the behavior upon overflow is undefined by the C standard). In either case, the compiler is free to omit this code path. Thus, for optimization, the compiler could omit this path and generate code only for printf("x is not assigned.\n");. If you then executed code containing bar(1, 65536), the output would be “x is not assigned.”!
Compilers do make optimizations of this sort: The observation that one code path has undefined behavior implies the compiler may conclude that code path is never used.
To an observer, it looks like the effect of assigning a too-large value to a short int is to cause completely different code to be executed.

Resources