Both expressions are TRUE

In the first block of code, both conditions hold true. In the second, the first condition holds true and the other holds false.
int8_t i8 = -2;
uint16_t ui16 = i8;
if(ui16 == -2) //this is TRUE
if(ui16 == 65534) //this is TRUE as well
And this is the second scenario:
int8_t i8 = -2;
int16_t i16 = i8;
if(i16 == -2) //this is TRUE
if(i16 == 65534) //this is NOT TRUE !!!

Because -2 fits into an int16_t, whereas it must be converted to an unsigned value to be stored in a uint16_t.
This is well-defined behaviour.
From ISO/IEC 9899 (C99 standard working draft):
6.3.1.3 Signed and unsigned integers
...
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.49)
...
49) The rules describe arithmetic on the mathematical value, not the value of a given type of expression
So if I do:
uint16_t i = -2;
the compiler should do:
i = -2 + (USHRT_MAX + 1);
or
i = -2 - (USHRT_MAX + 1);
until we get a value storable within 16 bits with no sign bit.
The result does not depend on the representation of -2, only on its mathematical value.
In your case this should be: 65534
Which it is with gcc.
[C++ follows the same rules for signed-to-unsigned conversions.]
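The "repeatedly adding or subtracting" wording can be transcribed into code almost literally. Here is a minimal sketch of that reading for a 16-bit unsigned target (the helper name to_uint16_value is mine; a real conversion is a single well-defined operation, not a loop):

#include <stdio.h>

/* Literal transcription of C99 6.3.1.3p2 for a 16-bit unsigned target.
   Hypothetical helper, for illustration only. */
static long long to_uint16_value(long long v)
{
    const long long one_more_than_max = 65536LL; /* UINT16_MAX + 1 */
    while (v < 0)
        v += one_more_than_max;
    while (v > 65535)
        v -= one_more_than_max;
    return v; /* now representable in uint16_t */
}

int main(void)
{
    printf("%lld\n", to_uint16_value(-2)); /* prints 65534 */
    return 0;
}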
In your second section of code you are simply assigning a value of a narrower type to a variable of a wider type, i.e. using more bits to store the same number.
When you check i16 == 65534, the int16_t value -2 is promoted to int and compared with 65534, so the comparison is simply false. If you instead tried to store 65534 into the int16_t, you would invoke this part of the standard from the same section:
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
because 65534 is not storable in 15 value bits plus a sign bit (the maximum is 2^15 - 1 = 32767).
Such a conversion has an implementation-defined result, and relying on that result is almost as bad as relying on undefined behaviour unless you are targeting one specific compiler.
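To watch the two assignments directly, here is a minimal sketch that prints the stored values (the uint16_t result is guaranteed by 6.3.1.3p2; the int16_t assignment is value-preserving because -2 fits):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t i8 = -2;
    uint16_t ui16 = i8; /* converted per 6.3.1.3p2: -2 + 65536 = 65534 */
    int16_t i16 = i8;   /* -2 is representable, value unchanged */
    printf("ui16 = %u\n", (unsigned)ui16); /* 65534 */
    printf("i16  = %d\n", i16);            /* -2 */
    return 0;
}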

In C, unsigned integers always obey modular (clock-face) arithmetic, but signed integers only sometimes, and unreliably, do so.
Generally speaking, expecting one number to equal a different number is nonsense. You shouldn't write programs that way. If you want a number like -2 to behave like a positive unsigned value, you should explicitly write a cast like (uint16_t) -2. Otherwise, there are many things that could go wrong.
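For example, a sketch of the explicit-cast idiom (the cast makes the intended modular wraparound visible at the comparison site):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t i8 = -2;
    uint16_t ui16 = i8;
    if (ui16 == (uint16_t)-2) /* both sides are 65534: true everywhere */
        printf("equal: %u\n", (unsigned)ui16);
    return 0;
}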


Why can I assign a negative value to an unsigned int data type?

I was doing some experiments with code in order to test the theory. This is my code:
#include <stdio.h>

int main() {
    unsigned int x, y, z;
    if (1) {
        x = -5;
        y = 5;
        z = x + y;
        printf("%i",z);
    }
    return 0;
}
But as far as I know the output should have been 10; instead it prints 0. Why is this happening? Why can I assign a negative value to an unsigned int data type?
From section 6.5.16.1 of the C standard:
In simple assignment (=), the value of the right operand is converted
to the type of the assignment expression and replaces the value stored
in the object designated by the left operand.
From section 6.3.1.3 Signed and unsigned integers:
... Otherwise, if the new type is unsigned, the value is converted
by repeatedly adding or subtracting one more than the maximum value
that can be represented in the new type until the value is in the
range of the new type.
So x=-5 assigns UINT_MAX + 1 - 5 to x.
Signed-to-unsigned conversion happens per "mathematical modulo by UINT_MAX + 1"; from C17 6.3.1.3:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type
Now, as it happens, on two's complement computers this is the same as taking the binary representation of the signed number and treating it as an unsigned number. In your case -5 is represented as binary 0xFFFFFFFB, so the unsigned number ends up as 4294967291. And 4294967291 + 5 creates an unsigned wrap-around from UINT_MAX = 4294967295 to 0 (which is well-defined, unlike signed overflow).
So -5 does not just discard the sign when converted to unsigned. If that's what you want to happen, use the abs() function from stdlib.h.
It's a basic characteristic of pretty much all integral types that they have a defined range of values they can hold. But then the question is, what happens if you try to set a value outside that range? For C's unsigned types, the answer is that they operate via modular arithmetic.
On a modern machine, type unsigned int probably has a range of 0 to 4294967295. Obviously -5 does not fit into that range. So modular arithmetic says that we add or subtract some multiple of 4294967296 until we get a value that is in the range. Now, -5 + 4294967296 is 4294967291, and that is in range, so that's the value which gets stored in your variable x.
So then x + y will be 4294967296, but that's not in the range of 0 to 4294967295, either. But if we subtract 4294967296, we get 0, and that's in range, so that's our answer.
And along the way we've discovered how two's complement arithmetic works. It turns out that, if we had declared x as a signed int and set it to -5, it would have ended up containing the same bit pattern as 4294967291. And as we've seen, 4294967291 is precisely the bit pattern we want to be able to add to 5 in order to get 0 (after wrapping around, that is). So 4294967291 is a great internal value to use for -5, since you obviously want -5 + 5 to be 0.
Why can I assign a negative value to an unsigned int data type?
Assigning an out-of-range value to an unsigned type is well defined.
But first, in trying to report the value, the code invokes undefined behavior (UB) by using a mismatched printf() specifier with an unsigned argument:
// printf("%i",z); // Bad
printf("%u\n",z); // Good
Try printf("%u %u %u %u\n", x, y, z, UINT_MAX); to properly see all 4 values.
x=-5; assigns an out-of-range value to an unsigned. With unsigned types, the value is converted to an in-range value by adding/subtracting the max value of the type + 1 until in range. In this case x will have the value of UINT_MAX + 1 - 5.
y=5; is OK.
x+y will then wrap around; the sum is likewise reduced to an in-range value in the same manner.
x+y will have the value of (UINT_MAX + 1 - 5) + 5 --> (UINT_MAX + 1 - 5) + 5 - (UINT_MAX + 1) --> 0.
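Putting those fixes together, a corrected sketch of the question's program (the exact values of x and UINT_MAX depend on the platform):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int x, y, z;
    x = -5;    /* well-defined: becomes UINT_MAX + 1 - 5 */
    y = 5;
    z = x + y; /* wraps around to 0 */
    printf("%u %u %u %u\n", x, y, z, UINT_MAX);
    return 0;
}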

How does C casting unsigned to signed work?

What language in the standard makes this code work, printing '-1'?
unsigned int u = UINT_MAX;
signed int s = u;
printf("%d", s);
https://en.cppreference.com/w/c/language/conversion
otherwise, if the target type is signed, the behavior is implementation-defined (which may include raising a signal)
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
GCC supports only two’s complement integer types, and all bit patterns are ordinary values.
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3):
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
To me it seems like converting UINT_MAX to an int would therefore mean reducing UINT_MAX modulo 2^(CHAR_BIT * sizeof(int)). For the sake of argument, with 32-bit ints, 0xFFFFFFFF mod 2^32 = 0xFFFFFFFF. So this doesn't really explain how the value '-1' ends up in the int.
Is there some language somewhere else that says after the modulo division we just reinterpret the bits? Or some other part of the standard that takes precedence before the parts I have referenced?
No part of the C standard guarantees that your code shall print -1 in general. As it says, the result of the conversion is implementation-defined. However, the GCC documentation does promise that if you compile with their implementation, then your code will print -1. It's nothing to do with bit patterns, just math.
The clearly intended reading of "reduced modulo 2^N" in the GCC manual is that the result should be the unique number in the range of signed int that is congruent mod 2^N to the input. This is a precise mathematical way of defining the "wrapping" behavior that you expect, which happens to coincide with what you would get by reinterpreting the bits.
Assuming 32 bits, UINT_MAX has the value 4294967295. This is congruent mod 4294967296 to -1. That is, the difference between 4294967295 and -1 is a multiple of 4294967296, namely 4294967296 itself. Moreover, this is necessarily the unique such number in [-2147483648, 2147483647]. (Any other number congruent to -1 would be at least -1 + 4294967296 = 4294967295, or at most -1 - 4294967296 = -4294967297). So -1 is the result of the conversion.
In other words, add or subtract 4294967296 repeatedly until you get a number that's in the range of signed int. There's guaranteed to be exactly one such number, and in this case it's -1.
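If you need the -1 result portably, without relying on the implementation-defined conversion, it can be computed with only well-defined operations. A minimal sketch, assuming a two's complement int (INT_MIN == -INT_MAX - 1); the helper name wrap_to_int is mine:

#include <limits.h>
#include <stdio.h>

static int wrap_to_int(unsigned int u)
{
    if (u <= INT_MAX)
        return (int)u;                        /* representable: unchanged */
    return (int)(u - INT_MAX - 1u) + INT_MIN; /* map upper half to [INT_MIN, -1] */
}

int main(void)
{
    printf("%d\n", wrap_to_int(UINT_MAX)); /* -1 */
    return 0;
}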

C How does signed to signed integer conversion work?

How does the computer convert between differently sized signed integers?
For example, when I convert the long long int value 12000000000 to int, how does it reduce the value? And how does it handle negative numbers?
how does it reduce the value?
From C11 standard 6.3.1.3p3:
When a value with integer type is converted to another integer type [...]
[...]
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
It is not defined how to convert the value; instead, each compiler may have different behavior, but it has to have some documented behavior. Nowadays we live in a two's-complement world, where it is the same everywhere. Let's take a look at the gcc compiler, from the gcc documentation on integers' implementation-defined behavior:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
So we have:
long long int value 12000000000 to int
Let's assume long long int has 64 bits, int has 32 bits, a byte has 8 bits, and we use two's complement, so INT_MIN = -2147483648, INT_MAX = 2147483647, N is 32, and 2^N is 4294967296. Take a peek at modular arithmetic, and we know that:
12000000000 = 3 * 4294967296 + -884901888
So it will be converted to -884901888. That is irrelevant to what format is used to store the number - it can be in any format it wishes.
Now, gcc is smart: while the documentation states the algorithm mathematically in modulo arithmetic, you can note that:
printf("%16llx\n%16x\n", 12000000000ll, (int)12000000000ll);
2cb417800
cb417800
I.e., the mathematical operation "modulo 2^32" is equal, in binary, to an AND mask with all 32 bits set: num & 0xffffffff.
And how does it handle negative numbers?
Exactly the same way; there is just a minus sign. For example, -12000000000ll:
-12000000000ll = -3 * 4294967296 + 884901888
So (int)-12000000000ll will be converted to 884901888. Note that in binary it's just:
printf("%16llx\n%16x\n", -12000000000ll, (int)-12000000000ll);
fffffffd34be8800
34be8800
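The reduction can also be written out with only well-defined operations: reduce in unsigned arithmetic first, then map the upper half of the range down by hand. A sketch, assuming the exact-width types exist (the helper name wrap_to_int32 is mine):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

static int32_t wrap_to_int32(int64_t v)
{
    uint32_t r = (uint32_t)v; /* well-defined: reduction modulo 2^32 */
    if (r <= INT32_MAX)
        return (int32_t)r;
    return (int32_t)(r - INT32_MAX - 1) + INT32_MIN; /* two's complement mapping */
}

int main(void)
{
    printf("%" PRId32 "\n", wrap_to_int32(12000000000LL));  /* -884901888 */
    printf("%" PRId32 "\n", wrap_to_int32(-12000000000LL)); /*  884901888 */
    return 0;
}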
Attempting to convert an integer representation to a smaller, signed type in which it cannot be properly represented (such as your example of trying to convert 12000000000 to a 32-bit int) is implementation-defined behaviour. From this C11 Draft Standard (the third paragraph being relevant here):
6.3.1.3 Signed and unsigned integers
1   When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is
unchanged.
2   Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.60)
3   Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised
I found a solution that works with two's complement. It can convert integers up and down in width, and it works with positive and negative numbers.
a is the number,
oldp is the old position of the sign bit,
newp is the new position of the sign bit.
#include <stdint.h>

uint64_t shsbc(uint64_t a, uint32_t oldp, uint32_t newp) {
    if (!(a >> oldp & 1)) {          /* sign bit clear: value is non-negative */
        if (oldp > newp) {
            return a & UINT64_MAX >> (64 - (newp + 1)); /* truncate to newp+1 bits */
        } else {
            return a & UINT64_MAX >> (64 - (oldp + 1)); /* keep the oldp+1 bits */
        }
    }
    if (oldp > newp) {               /* negative, narrowing */
        a &= UINT64_MAX >> ((oldp - newp) + 1 + (63 - oldp)); /* keep low newp bits */
        a |= (uint64_t)1 << newp;    /* set the new sign bit */
        return a;
    } else {                         /* negative, widening */
        a &= UINT64_MAX >> ((newp - oldp) + 1 + (63 - newp)); /* keep low oldp bits */
        a |= UINT64_MAX >> (63 - (newp - oldp)) << oldp; /* sign-extend: set bits oldp..newp */
        return a;
    }
}
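A quick sanity check of the function above (a hedged sketch; it exercises the 16-bit and 8-bit patterns for -2 and a positive pass-through):

#include <assert.h>

int main(void)
{
    assert(shsbc(0xFFFEu, 15, 7) == 0xFEu); /* int16 -2 -> int8 -2 */
    assert(shsbc(0xFEu, 7, 15) == 0xFFFEu); /* int8 -2 -> int16 -2 */
    assert(shsbc(0x7Eu, 7, 15) == 0x7Eu);   /* positive values pass through */
    return 0;
}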

"Overflow" assertions on various C data type: which ones are guaranteed to be true according to the C Standard?

While all those assertions hold true on my system, I am obviously invoking several undefined and/or implementation-defined behaviors, some of which are apparently not actual overflow.
See this comment for reference: this is the reason why I am asking this question.
num = num + 1 does not cause an overflow. num is automatically promoted to int, and then the addition is performed in int, which yields 128 without overflow. Then the assignment performs a conversion to char.
This is not an overflow but, per C 2018 6.3.1.3, produces an implementation-defined result or signal. This differs from overflow because the C standard does not specify the behavior upon overflow at all, but, in this code, it specifies that the implementation must define the behavior. - Eric Postpischil
I put in comments what I believe to be the actual behavior.
Because I have relied on misconceptions, I prefer not to assume anything.
#include <limits.h>
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
    signed char sc = CHAR_MAX;
    unsigned char uc = UCHAR_MAX;
    signed short ss = SHRT_MAX;
    unsigned short us = USHRT_MAX;
    signed int si = INT_MAX;
    unsigned int ui = UINT_MAX;
    signed long sl = LONG_MAX;
    unsigned long ul = ULONG_MAX;
    size_t zu = SIZE_MAX;

    ++sc;
    ++uc;
    ++ss;
    ++us;
    ++si;
    ++ui;
    ++sl;
    ++ul;
    ++zu;

    assert(sc == CHAR_MIN); //integer promotion, implementation specific ?
    assert(uc == 0); //integer promotion, implementation specific ?
    assert(ss == SHRT_MIN); //integer promotion, implementation specific ?
    assert(us == 0); //integer promotion, implementation specific ?
    assert(si == INT_MIN); //overflow & undefined
    assert(ui == 0); //wrap around: Guaranteed
    assert(sl == LONG_MIN); //overflow & undefined ?
    assert(ul == 0); //wrap around: Guaranteed ?
    assert(zu == 0); //wrap around : Guaranteed ?
    return 0;
}
All citations below are from C 2018, official version.
Signed Integers Narrower Than int, Binary +
Let us discuss this case first since it is the one that prompted this question. Consider this code, which does not appear in the question:
signed char sc = SCHAR_MAX;
sc = sc + 1;
assert(sc == SCHAR_MIN);
6.5.6 discusses the binary + operator. Paragraph 4 says the usual arithmetic conversions are performed on them. This results in the sc in sc + 1 being converted to int (see footnote 1), and 1 is already int. So sc + 1 yields one more than SCHAR_MAX (commonly 127 + 1 = 128), and there is no overflow or representation problem in the addition.
Then we must perform the assignment, which is discussed in 6.5.16.1. Paragraph 2 says “… the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.” So we must convert this value greater than SCHAR_MAX to signed char, and it clearly cannot be represented in signed char.
6.3.1.3 tells us about the conversions of integers. Regarding this situation, it says “… Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.”
Thus, we have an implementation-defined result or signal. This differs from overflow, which is what happens when, during evaluation of an expression, the result is not representable. 6.5 5 says “If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.” For example, if we evaluate INT_MAX + 1, then both INT_MAX and 1 have type int, so the operation is performed with type int, but the mathematical result is not representable in int, so this is an exceptional condition, and the behavior is not defined by the C standard. In contrast, during the conversion, the behavior is partially defined by the standard: The standard requires the implementation to define the behavior, and it must either produce a result it defines or define a signal.
In many implementations, the assertion will evaluate to true. See the “Signed Integers Not Narrower Than int” section below for further discussion.
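The contrast can be made concrete in a few lines (a sketch; the assignment is the implementation-defined case, and the commented-out line is the undefined one):

#include <limits.h>

int main(void)
{
    signed char sc = SCHAR_MAX;
    int sum = sc + 1;           /* well-defined: arithmetic is done in int */
    sc = (signed char)sum;      /* implementation-defined result or signal */
    /* int bad = INT_MAX + 1; *//* undefined: exceptional condition in int */
    (void)sc;
    return 0;
}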
Signed Integers Narrower Than int, Prefix ++
Next, consider this case, extracted from the question, except that I changed CHAR_MAX and CHAR_MIN to SCHAR_MAX and SCHAR_MIN to match the signed char type:
signed char sc = SCHAR_MAX;
++sc;
assert(sc == SCHAR_MIN);
We have unary ++ instead of binary +. 6.5.3.1 2 says “The value of the operand of the prefix ++ is incremented…” This clause does not explicitly say the usual arithmetic conversions or integer promotions are performed, but it does say, also in paragraph 2, “See the discussions of additive operators and compound assignment for information on constraints, types, side effects, and conversions and the effects of operations on pointers.” That tells us it behaves like sc = sc + 1;, and the above section about binary + applies to prefix ++, so the behavior is the same.
Unsigned Integers Narrower Than int, Binary +
Consider this code modified to use binary + instead of prefix ++:
unsigned char uc = UCHAR_MAX;
uc = uc + 1;
assert(uc == 0);
As with signed char, the arithmetic is performed with int and then converted to the assignment destination type. This conversion is specified by 6.3.1.3: “Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.” Thus, from the mathematical result (UCHAR_MAX + 1), one more than the maximum (also UCHAR_MAX + 1) is subtracted until the value is in range. A single subtraction yields 0, which is in range, so the result is 0, and the assertion is true.
Unsigned Integers Narrower Than int, Prefix ++
Consider this code extracted from the question:
unsigned char uc = UCHAR_MAX;
++uc;
assert(uc == 0);
As with the earlier prefix ++ case, the arithmetic is the same as uc = uc + 1, discussed above.
Signed Integers Not Narrower Than int
In this code:
signed int si = INT_MAX;
++si;
assert(si == INT_MIN);
or this code:
signed int si = INT_MAX;
si = si + 1;
assert(si == INT_MIN);
the arithmetic is performed using int. In either case, the computation overflows, and the behavior is not defined by the C standard.
If we ponder what implementations will do, several possibilities are:
In a two’s complement implementation, the bit pattern resulting from adding 1 to INT_MAX overflows to the bit pattern for INT_MIN, and this is the value the implementation effectively uses.
In a one’s complement implementation, the bit pattern resulting from adding 1 to INT_MAX overflows to the bit pattern for INT_MIN, although it is a different value than we are familiar with for INT_MIN (−2^31 + 1 instead of −2^31).
In a sign-and-magnitude implementation, the bit pattern resulting from adding 1 to INT_MAX overflows to the bit pattern for −0.
The hardware detects overflow, and a signal occurs.
The compiler detects the overflow and transforms the code in unexpected ways during optimization.
Unsigned Integers Not Narrower than int
These cases are unremarkable; the behavior is the same as for the narrower-than-int cases discussed above: the arithmetic wraps.
Footnote
1) Per discussion elsewhere on Stack Overflow, it may be theoretically possible for the char (and signed char) type to be as wide as an int. This strains the C standard regarding EOF and possibly other issues and was certainly not anticipated by the C committee. This answer disregards such esoteric C implementations and considers only implementations in which char is narrower than int.
assert(sc == CHAR_MIN); //integer promotion, implementation specific ?
Depends on implementation-defined conversion of CHAR_MAX+1 to char if char is signed; otherwise it's false because CHAR_MIN != SCHAR_MIN. And if CHAR_MAX==INT_MAX (possible, but not viable for meeting other requirements of a hosted implementation; see Can sizeof(int) ever be 1 on a hosted implementation?) then the original sc++ was UB.
assert(uc == 0); //integer promotion, implementation specific ?
Always true.
assert(ss == SHRT_MIN); //integer promotion, implementation specific ?
Same logic as sc case. Depends on implementation-defined conversion of SHRT_MAX+1 to short, or UB if SHRT_MAX==INT_MAX.
assert(us == 0); //integer promotion, implementation specific ?
Always true.
assert(si == INT_MIN); //overflow & undefined
UB.
assert(ui == 0); //wrap around: Guaranteed
Always true.
assert(sl == LONG_MIN); //overflow & undefined ?
UB.
assert(ul == 0); //wrap around: Guaranteed ?
Always true.
assert(zu == 0); //wrap around : Guaranteed ?
Always true.
“Overflow” assertions on various C data type:
True according to the C Standard
assert(uc == 0);
assert(us == 0);
assert(ui == 0);
assert(ul == 0);
assert(zu == 0);
I think you wanted to test signed char sc = SCHAR_MAX; ... assert(sc == SCHAR_MIN);
When the signed type has a narrower range than int:
"result is implementation-defined or an implementation-defined signal is raised" as part of the ++ re-assignment.
When the signed type is as wide or wider range than int:
UB due to signed integer overflow during a ++.
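Keeping only the assertions that the standard itself guarantees, the test reduces to the unsigned cases (a portable sketch):

#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
    unsigned char uc = UCHAR_MAX;
    unsigned short us = USHRT_MAX;
    unsigned int ui = UINT_MAX;
    unsigned long ul = ULONG_MAX;
    size_t zu = SIZE_MAX;
    ++uc; ++us; ++ui; ++ul; ++zu;
    assert(uc == 0); /* promotion to int, then well-defined conversion back */
    assert(us == 0);
    assert(ui == 0); /* unsigned arithmetic wraps */
    assert(ul == 0);
    assert(zu == 0);
    return 0;
}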

What can be assumed about the representation of true?

This program returns 0 on my machine:
#include <stdbool.h>

union U {
    _Bool b;
    char c;
};

int main(void) {
    union U u;
    u.c = 3;
    _Bool b = u.b;
    if (b == true) {
        return 0;
    } else {
        return 1;
    }
}
AFAICT, _Bool is an integer type that can store at least 0 and 1, and true is the integer constant 1. On my machine, sizeof(_Bool) == 1 and CHAR_BIT == 8, which means that a _Bool object has 256 possible representations.
I can't find much in the C standard about the trap representations of _Bool, and I can't find whether creating a _Bool with a representation different from 0 or 1 (on implementations that support more than two representations) is ok, and if it is ok, whether those representations denote true or false.
What I can find in the standard is what happens when a _Bool is compared with an integer, the integer is converted to the 0 representation if it has value 0, and to the 1 representation if it has a value different than zero, such that the snippet above ends up comparing two _Bools with different representations: _Bool[3] == _Bool[1].
I can't find much in the C standard about what the result of such a comparison is. Since _Bool is an integer type, I'd expect the rules for integers to apply, such that the equality comparison only returns true if the representations are equal, which is not the case here.
Since on my platform this program returns 0, it would appear that this rule is not applying here.
Why does this code behave like this ? (i.e. what am I missing? Which representations of _Bool are trap representations and which ones aren't? How many representations can represent true and false ? What role do padding bits play into this? etc. )
What can portable C programs assume about the representation of _Bool ?
Footnote 122 in the C11 standard says:
While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.
So on a compiler where _Bool has only one value bit, only one of the bits of the char will have an effect when you read it from memory as a _Bool. The other bits are padding bits, which are ignored.
When I test your code with GCC, the _Bool member gets a value of 1 when assigning an odd number to u.c and 0 when assigning an even number, suggesting that it only looks at the lowest bit.
Note that the above is true only for type-punning. If you instead convert (implicit or explicit cast) a char to a _Bool, the value will be 1 if the char was nonzero.
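A sketch contrasting type punning with conversion (what the punned read yields is implementation-dependent; under the GCC behavior described above, both lines print 1 because 3 has its low bit set):

#include <stdio.h>
#include <stdbool.h>

union U {
    _Bool b;
    char c;
};

int main(void)
{
    union U u;
    u.c = 3;
    _Bool punned = u.b;         /* type punning: padding bits ignored */
    _Bool converted = (_Bool)3; /* conversion: nonzero always becomes 1 */
    printf("punned=%d converted=%d\n", (int)punned, (int)converted);
    return 0;
}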
