What can be assumed about the representation of true?

This program returns 0 on my machine:
#include <stdbool.h>

union U {
    _Bool b;
    char c;
};

int main(void) {
    union U u;
    u.c = 3;
    _Bool b = u.b;
    if (b == true) {
        return 0;
    } else {
        return 1;
    }
}
AFAICT, _Bool is an integer type that can at least store 0 and 1, and true is the integral constant 1. On my machine, sizeof(_Bool) == 1 and CHAR_BIT == 8, which means that _Bool has 256 object representations.
I can't find much in the C standard about the trap representations of _Bool, and I can't find whether creating a _Bool with a representation different from 0 or 1 (on implementations that support more than two representations) is ok, and if it is ok, whether those representations denote true or false.
What I can find in the standard is what happens when a _Bool is compared with an integer: the integer is converted to the 0 representation if it has the value 0, and to the 1 representation if it has a value different from zero, such that the snippet above ends up comparing two _Bools with different representations: _Bool[3] == _Bool[1].
I can't find much in the C standard about what the result of such a comparison is. Since _Bool is an integer type, I'd expect the rules for integers to apply, such that the equality comparison only returns true if the representations are equal, which is not the case here.
Since on my platform this program returns 0, it would appear that this rule is not applying here.
Why does this code behave like this? (I.e., what am I missing? Which representations of _Bool are trap representations and which ones aren't? How many representations can represent true and false? What role do padding bits play in this? Etc.)
What can portable C programs assume about the representation of _Bool?

Footnote 122 in the C11 standard says:
While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.
So on a compiler where _Bool has only one value bit, only one of the bits of the char will have effect when you read it from memory as a _Bool. The other bits are padding bits which are ignored.
When I test your code with GCC, the _Bool member gets a value of 1 when assigning an odd number to u.c and 0 when assigning an even number, suggesting that it only looks at the lowest bit.
Note that the above is true only for type-punning. If you instead convert (implicit or explicit cast) a char to a _Bool, the value will be 1 if the char was nonzero.
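A small sketch of that difference, for reference; the type-punning result shown is what GCC was observed to do and is not guaranteed by the standard:

#include <stdbool.h>
#include <stdio.h>

union U {
    _Bool b;
    char c;
};

int main(void) {
    union U u;
    u.c = 2;                        /* even value: lowest bit is 0 */
    printf("punned: %d\n", u.b);    /* GCC observed: 0 (only the value bit is read) */

    _Bool converted = (_Bool)2;     /* conversion: any nonzero scalar becomes 1 */
    printf("converted: %d\n", converted);
    return 0;
}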

Related

C - Is reading a _Bool after setting it with memset undefined, implementation defined?

In ISO standard C, my understanding is that there is nothing that actually nails down the representation of a _Bool, but it does say:
"_Bool is large enough to hold the values 0 and 1"
"When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0;
otherwise, the result is 1"
"number of bits in a _Bool is atleast CHAR_BIT the width of a _Bool can be just 1 bit"
I am thinking then (and from other related answers), that the representation of false need not actually be 0 (even though in nearly all implementations, it is). So what happens if you memset a _Bool to 0, then use it somehow? Is this undefined behavior (by default because it is not defined in the standard) or implementation defined behavior? This seems to matter (in my understanding) because in the former case, it's not a well defined C program, in the latter it is. For example is this undefined behavior? Can false have a representation other than 0?
#include <stdbool.h>
//...
bool x = true;
memset(&x, 0, sizeof(bool));
if (x == true)
{
    printf("Zero is true!");
}
else
{
    printf("zero is false!");
}
_Bool is an unsigned integer type. It can represent at least the values 0 and 1. Note there are no separate true and false values. The macro true in stdbool.h expands to the constant 1, and the macro false to the constant 0 (7.18). So x == true is the same as x == 1.
There are two kinds of bits in an unsigned integer type: value bits and padding bits (6.2.6.2p1). Your invocation of memset sets all bits (value and padding) to zero.
For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type (6.2.6.2p5).
Thus, the program fragment as shown has no visible undefined, unspecified or implementation-defined behaviour. A reasonably completed program will print "zero is false!".
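For reference, a minimal completion of the asker's fragment (assuming nothing about the elided context) that can be compiled and run:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    bool x = true;
    memset(&x, 0, sizeof(bool));   /* sets every bit, value and padding, to zero */
    if (x == true)
        printf("Zero is true!\n");
    else
        printf("zero is false!\n");
    return 0;
}

On a conforming implementation this prints zero is false!, per the reasoning above.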

Signed bit field representation

I made a bit field with a field sized 1 bit and used int instead of unsigned. Later on, when I tried to check the value of the field, I found that the value was -1.
I used this code to check the binary representation and the value of my bit field:
#include <stdio.h>
#include <stdlib.h>

union {
    struct {
        int bit:1;
    } field;
    int rep;
} n;

int main() {
    int c, k;
    n.field.bit = 1;
    for (c = 31; c >= 0; c--)
    {
        k = n.rep >> c;
        if (k & 1)
            printf("1");
        else
            printf("0");
    }
    printf("\n %d \n", n.field.bit);
    return 0;
}
The output was:
00000000000000000000000000000001
-1
In that case, why is the value of my bit field -1, and will it always be a negative number when I use signed int instead of unsigned?
You should never use plain int as the bit-field type if you're expecting anything about the value beyond that it can hold n bits; according to the C11 standard it is actually implementation-defined whether int in a bit-field is signed or unsigned (6.7.2p5):
5 Each of the comma-separated multisets designates the same type, except that for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.
In your case the int designates the same type as signed int; this is the default in GCC:
Whether a “plain” int bit-field is treated as a signed int bit-field or as an unsigned int bit-field (C90 6.5.2, C90 6.5.2.1, C99 and C11 6.7.2, C99 and C11 6.7.2.1).
By default it is treated as signed int but this may be changed by the -funsigned-bitfields option.
Thus any sane program always specifies either signed int or unsigned int, depending on which is appropriate for the current use case.
Then it is implementation-defined whether the signed numbers are represented in one's complement, two's complement, or perhaps sign and magnitude. If they're in one's complement or sign-and-magnitude, then the only thing that can be stored in 1 bit is the sign bit, hence only the value 0; so a signed bit field of one bit probably makes sense only with two's complement.
Your system seems to use 2's complement - this is e.g. what GCC always uses:
Whether signed integer types are represented using sign and magnitude, two’s complement, or one’s complement, and whether the extraordinary value is a trap representation or an ordinary value (C99 and C11 6.2.6.2).
GCC supports only two’s complement integer types, and all bit patterns are ordinary values.
and thus the bit values 1 and 0 are interpreted in terms of signed two's complement numbers: the former has the sign bit set, so it is negative (-1), and the latter doesn't have the sign bit set, so it is non-negative (0).
Thus for a signed bit-field of 2 bits, the possible bit patterns and their integer values on a 2's complement machine are
00 - has int value 0
01 - has int value 1
10 - has int value -2
11 - has int value -1
In an n-bit field, the minimum signed number is -2^(n-1) and the maximum is 2^(n-1) - 1.
Now, when arithmetic is performed on a signed integer operand whose rank is less than int, it is converted to an int first, and thus the value -1 is sign-extended to full-width int; the same happens for default argument promotions; the value is sign-extended to a (full-width) int when it is passed in to printf.
Thus if you expect a sensible value from a one-bit bit-field, use either unsigned bit: 1; or alternatively, if it is to be understood as a boolean flag, _Bool bit: 1;
When you call a variadic function (like printf), some arguments are promoted. For example, a bit-field undergoes integer promotion, where it is promoted to an ordinary int value. That promotion brings sign extension with it (because your base type for the bit-field is signed). This sign extension will make it -1.
When using bit-fields, almost always use unsigned types as the base.
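A small sketch of the suggested fix (my own adaptation of the asker's program), using an unsigned base type so the one-bit field can only hold 0 or 1:

#include <stdio.h>

struct flags {
    unsigned int bit : 1;   /* unsigned base type: the field holds 0 or 1 */
};

int main(void) {
    struct flags f;
    f.bit = 1;
    printf("%d\n", f.bit);  /* prints 1, not -1 */
    return 0;
}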

What is the difference between literals and variables in C (signed vs unsigned short ints)?

I have seen the following code in the book Computer Systems: A Programmer's Perspective, 2/E. This works well and creates the desired output. The output can be explained by the difference of signed and unsigned representations.
#include <stdio.h>

int main() {
    if (-1 < 0u) {
        printf("-1 < 0u\n");
    }
    else {
        printf("-1 >= 0u\n");
    }
    return 0;
}
The code above yields -1 >= 0u; however, the following code, which should be the same as above, does not! In other words,
#include <stdio.h>

int main() {
    unsigned short u = 0u;
    short x = -1;
    if (x < u)
        printf("-1 < 0u\n");
    else
        printf("-1 >= 0u\n");
    return 0;
}
yields -1 < 0u. Why did this happen? I cannot explain it.
Note that I have seen similar questions like this, but they do not help.
PS. As @Abhineet said, the dilemma can be solved by changing short to int. However, how can one explain this phenomenon? In other words, -1 in 4 bytes is 0xffffffff and in 2 bytes is 0xffff. Interpreting these two's complement patterns as unsigned gives the corresponding values 4294967295 and 65535. Neither is less than 0, so I think in both cases the output should be -1 >= 0u, i.e. x >= u.
A sample output for it on a little endian Intel system:
For short:
-1 < 0u
u =
00 00
x =
ff ff
For int:
-1 >= 0u
u =
00 00 00 00
x =
ff ff ff ff
The code above yields -1 >= 0u
All integer literals (numeric constants) have a type and therefore also a signedness. By default, they are of type int, which is signed. When you append the u suffix, you turn the literal into unsigned int.
For any C expression where you have one operand which is signed and one which is unsigned, the rule of balancing (formally: the usual arithmetic conversions) implicitly converts the signed type to unsigned.
Conversion from signed to unsigned is well-defined (6.3.1.3):
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
For example, for 32 bit integers on a standard two's complement system, the max value of an unsigned integer is 2^32 - 1 (4294967295, UINT_MAX in limits.h). One more than the maximum value is 2^32. And -1 + 2^32 = 4294967295, so the literal -1 is converted to an unsigned int with the value 4294967295. Which is larger than 0.
When you switch types to short however, you end up with a small integer type. This is the difference between the two examples. Whenever a small integer type is part of an expression, the integer promotion rule implicitly converts it to a larger int (6.3.1.1):
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions. All other types are unchanged by the integer
promotions.
If short is smaller than int on the given platform (as is the case on 32 and 64 bit systems), any short or unsigned short will therefore always get converted to int, because they can fit inside one.
So for the expression if (x < u), you actually end up with if((int)x < (int)u), which behaves as expected (-1 is less than 0).
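A compact sketch of both cases side by side, as my own illustration of the rules above (assuming int is wider than short, as on typical 32- and 64-bit platforms):

#include <stdio.h>

int main(void) {
    unsigned short u = 0u;
    short x = -1;
    /* Integer promotion: both operands become int, so -1 < 0 holds. */
    printf("short variables: %d\n", x < u);   /* prints 1 */
    /* Usual arithmetic conversions: -1 becomes UINT_MAX, so the test fails. */
    printf("int literals:    %d\n", -1 < 0u); /* prints 0 */
    return 0;
}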
You're running into C's integer promotion rules.
Operators on types smaller than int automatically promote their operands to int or unsigned int. See comments for more detailed explanations. There is a further step for binary (two-operand) operators if the types still don't match after that (e.g. unsigned int vs. int). I won't try to summarize the rules in more detail than that. See Lundin's answer.
This blog post covers this in more detail, with a similar example to yours: signed and unsigned char. It quotes the C99 spec:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions. All other types are unchanged
by the integer promotions.
You can play around with this more easily on something like godbolt, with a function that returns one or zero. Just look at the compiler output to see what ends up happening.
#define mytype short

int main() {
    unsigned mytype u = 0u;
    mytype x = -1;
    return (x < u);
}
Contrary to what you seem to assume, this is not a property of the particular width of the types (here 2 bytes versus 4 bytes), but a question of the rules that are to be applied. The integer promotion rules state that short and unsigned short are converted to int on all platforms where the corresponding range of values fits into int. Since this is the case here, both values are preserved and obtain the type int. -1 is perfectly representable in int, as is 0. So the test results in -1 is smaller than 0.
In the case of testing -1 against 0u, the common conversion chooses the unsigned type as the common type to which both are converted. -1 converted to unsigned is the value UINT_MAX, which is larger than 0u.
This is a good example of why you should never use "narrow" types to do arithmetic or comparisons. Only use them if you have a severe size constraint. This will rarely be the case for simple variables, but mostly for large arrays where you can really gain from storing in a narrow type.
0u is not unsigned short, it's unsigned int.
Edit: The explanation of the behavior.
How is the comparison performed?
As answered by Jens Gustedt,
This is called "usual arithmetic conversions" by the standard and
applies whenever two different integer types occur as operands of the
same operator.
In essence what it does:
if the types have different width (more precisely, what the standard calls conversion rank), then it converts to the wider type;
if both types are of the same width, then, besides really weird architectures, the unsigned one of them wins.
Signed-to-unsigned conversion of the value -1 with whatever type always results in the highest representable value of the unsigned type.
A more explanatory blog post written by him can be found here.

Comparing int and unsigned int [duplicate]

I'm trying to understand why the following code doesn't issue a warning at the indicated place.
//from limits.h
#define UINT_MAX 0xffffffff /* maximum unsigned int value */
#define INT_MAX 2147483647  /* maximum (signed) int value */
                            /* = 0x7fffffff */
int a = INT_MAX;
//_int64 a = INT_MAX; // makes all warnings go away
unsigned int b = UINT_MAX;
bool c = false;
if (a < b)  // warning C4018: '<' : signed/unsigned mismatch
    c = true;
if (a > b)  // warning C4018: '<' : signed/unsigned mismatch
    c = true;
if (a <= b) // warning C4018: '<' : signed/unsigned mismatch
    c = true;
if (a >= b) // warning C4018: '<' : signed/unsigned mismatch
    c = true;
if (a == b) // no warning <--- warning expected here
    c = true;
if (((unsigned int)a) == b) // no warning (as expected)
    c = true;
if (a == ((int)b)) // no warning (as expected)
    c = true;
I thought it was to do with background promotion, but the last two seem to say otherwise.
To my mind, the first == comparison is just as much a signed/unsigned mismatch as the others?
When comparing signed with unsigned, the compiler converts the signed value to unsigned. For equality, this doesn't matter: -1 == (unsigned) -1. For other comparisons it matters, e.g. the following is true: -1 > 2U.
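A tiny sketch of those two claims (my own, not from the original answer):

#include <stdio.h>

int main(void) {
    /* Both operands are converted to unsigned before comparing. */
    printf("%d\n", -1 == (unsigned)-1); /* prints 1: equality is unaffected */
    printf("%d\n", -1 > 2U);            /* prints 1: -1 becomes UINT_MAX    */
    return 0;
}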
EDIT: References:
5/9: (Expressions)
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions (4.5) shall be performed on both operands.
Then, if either operand is unsigned long the other shall be converted to unsigned long.
Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
Otherwise, if either operand is long, the other shall be converted to long.
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
4.7/2: (Integral conversions)
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation).]
EDIT2: MSVC warning levels
What is warned about on the different warning levels of MSVC is, of course, a set of choices made by the developers. As I see it, their choices in relation to signed/unsigned equality vs greater/less comparisons make sense; this is entirely subjective, of course:
-1 == -1 means the same as -1 == (unsigned) -1 - I find that an intuitive result.
-1 < 2 does not mean the same as -1 < (unsigned) 2 - This is less intuitive at first glance, and IMO deserves an "earlier" warning.
Why signed/unsigned warnings are important and why programmers must pay heed to them is demonstrated by the following example.
Guess the output of this code?
#include <iostream>

int main() {
    int i = -1;
    unsigned int j = 1;
    if (i < j)
        std::cout << " i is less than j";
    else
        std::cout << " i is greater than j";
    return 0;
}
Output:
i is greater than j
Surprised? Online Demo : http://www.ideone.com/5iCxY
Bottom line: in a comparison, if one operand is unsigned, then the other operand is implicitly converted to unsigned if its type is signed!
The == operator just does a bitwise comparison (by simple division to see if it is 0).
The smaller/bigger than comparisons rely much more on the sign of the number.
4 bit Example:
1111 = 15 ? or -1 ?
so if you have 1111 < 0001 ... it's ambiguous...
but if you have 1111 == 1111 ... It's the same thing although you didn't mean it to be.
In a system that represents the values using two's complement (most modern processors) they are equal even in their binary form. This may be why the compiler doesn't complain about a == b.
And to me it's strange that the compiler doesn't warn you about a == ((int)b). I think it should give you an integer truncation warning or something.
Starting from C++20 we have special functions for correctly comparing signed and unsigned values:
https://en.cppreference.com/w/cpp/utility/intcmp
The line of code in question does not generate a C4018 warning because Microsoft have used a different warning number (i.e. C4389) to handle that case, and C4389 is not enabled by default (i.e. at level 3).
From the Microsoft docs for C4389:
// C4389.cpp
// compile with: /W4
#pragma warning(default: 4389)

int main()
{
    int a = 9;
    unsigned int b = 10;
    if (a == b) // C4389
        return 0;
    else
        return 0;
};
The other answers have explained quite well why Microsoft might have decided to make a special case out of the equality operator, but I find those answers are not super helpful without mentioning C4389 or how to enable it in Visual Studio.
I should also mention that if you are going to enable C4389, you might also consider enabling C4388. Unfortunately there is no official documentation for C4388 but it seems to pop up in expressions like the following:
int a = 9;
unsigned int b = 10;
bool equal = (a == b); // C4388

Both expressions are TRUE

In the first block of code both conditions hold true. In the second, the first holds true and the other holds false.
int8_t i8 = -2;
uint16_t ui16 = i8;
if(ui16 == -2) //this is TRUE
if(ui16 == 65534) //this is TRUE as well
And this is the second scenario:
int8_t i8 = -2;
int16_t i16 = i8;
if(i16 == -2) //this is TRUE
if(i16 == 65534) //this is NOT TRUE !!!
Because -2 fits into int16_t whereas -2 is converted to unsigned in uint16_t.
This is well-defined behaviour.
from ISO/IEC 9899 (C99 standard working draft):
6.3.1.3 Signed and unsigned integers
...
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.49)
...
49) The rules describe arithmetic on the mathematical value, not the value of a given type of expression
So if I do:
uint16_t i = -2;
the compiler should do:
i = -2 + (USHRT_MAX + 1);
or
i = -2 - (USHRT_MAX + 1);
until we get a value storable within 16 bits with no sign bit.
Not dependent on the rank of -2, but the mathematical value.
In your case this should be: 65534
Which it is with gcc.
[C++ follows the same rules for signed conversions]
In your second section of code you are simply assigning a lower rank value to a higher rank variable.
e.g. using more bits of precision to store the same number.
When you check against i16 == 65534 you are invoking this part of the standard from the same section:
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
because 65534 is not storable in 15 value bits plus a sign bit (the maximum is 2^15 - 1).
So this invokes implementation-defined behaviour.
Relying on the return value of this is just as bad as relying on undefined behaviour unless you're a compiler developer.
In C, unsigned integers always behave according to modular (clock-face) arithmetic, but signed integers only sometimes, unreliably do.
Generally speaking, expecting one number to equal a different number is nonsense. You shouldn't write programs that way. If you want a number like -2 to behave like a positive unsigned value, you should explicitly write a cast like (uint16_t) -2. Otherwise, there are many things that could go wrong.
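For illustration, a small sketch (my own, not part of the original answers) of the explicit-cast approach suggested above:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t i8 = -2;
    uint16_t ui16 = i8;              /* -2 converted modulo 2^16: 65534 */
    printf("%u\n", (unsigned)ui16);  /* prints 65534 */

    /* Making the intent explicit with a cast, as suggested above: */
    if (ui16 == (uint16_t)-2)
        printf("matches (uint16_t)-2\n");
    return 0;
}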

Resources