Comparing an enum with an invalid value in C

I would like to understand how the compiler handles comparing an enum with an invalid value, and what the program actually does at run time.
I found some strange source code at work and did not understand the program's behaviour; it was not giving me the expected result.
I wrote the following little program to summarize my problem.
I create an enum E_Number and initialize a variable a with the value -1.
Then I compare a against the bounds of the enum to check whether it is in range.
(I know this is really strange, but it is exactly what I found in the source code!)
I expected the result to be Not in Range because the first condition (a >= FIRST_ENUM) fails.
But it was the failure of the second condition (a < NB_MAX_NUMBER) that gave me the right result (see the printf() output)...
If I cast a to (int) in the if conditions, I get the expected results.
So what is happening during execution? Does the program treat -1 as another possible enum value positioned after NB_MAX_NUMBER? What are the rules for the > and < operators on enums?
#include <stdio.h>

#define FIRST_ENUM 0

typedef enum {
    NUM_1 = FIRST_ENUM,
    NUM_2,
    NUM_3,
    NB_MAX_NUMBER
} E_Number;

int main()
{
    E_Number a = -1;

    if ((a >= FIRST_ENUM) && (a < NB_MAX_NUMBER))
    {
        printf("In Range\n");
    }
    else
    {
        printf("Not in Range\n");
    }

    printf("1st condition = %s\n", (a >= FIRST_ENUM) ? "TRUE" : "FALSE");
    printf("2nd condition = %s\n", (a < NB_MAX_NUMBER) ? "TRUE" : "FALSE");

    return 0;
}
gcc program.c
.\a.exe
Not in Range
1st condition = TRUE
2nd condition = FALSE
I am working with the MinGW compiler (gcc (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.9.2).

In your case the compiler treats E_Number as an unsigned type because all of its enumerator values are non-negative, so -1 is converted to ~0u, which is >= FIRST_ENUM but not < NB_MAX_NUMBER.
I see the same behavior with gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1+deb9u1):
pi@raspberrypi:~ $ ./a.out
Not in Range
1st condition = TRUE
2nd condition = FALSE
But if I change your definitions like this:
#include <stdio.h>

#define FIRST_ENUM -1

typedef enum {
    NUM_1 = FIRST_ENUM,
    NUM_2,
    NUM_3,
    NB_MAX_NUMBER
} E_Number;

int main()
{
    E_Number a = -2;

    if ((a >= FIRST_ENUM) && (a < NB_MAX_NUMBER))
    {
        printf("In Range\n");
    }
    else
    {
        printf("Not in Range\n");
    }

    printf("1st condition = %s\n", (a >= FIRST_ENUM) ? "TRUE" : "FALSE");
    printf("2nd condition = %s\n", (a < NB_MAX_NUMBER) ? "TRUE" : "FALSE");

    return 0;
}
the behavior changes: the enum is now treated as an int, and I get:
pi@raspberrypi:~ $ ./a.out
Not in Range
1st condition = FALSE
2nd condition = TRUE
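As the OP observed, casting to int gives the expected results. Below is a minimal sketch of that workaround, not part of the original answers; it assumes, as on the OP's MinGW build, that converting the stored value back to int yields -1 (that conversion is itself implementation-defined when the value is out of range):

#include <stdio.h>

#define FIRST_ENUM 0

typedef enum {
    NUM_1 = FIRST_ENUM,
    NUM_2,
    NUM_3,
    NB_MAX_NUMBER
} E_Number;

int main(void)
{
    E_Number a = -1;

    /* Comparing through int sidesteps the implementation-defined
       (here unsigned) underlying type of the enum. */
    if (((int)a >= FIRST_ENUM) && ((int)a < NB_MAX_NUMBER))
        printf("In Range\n");
    else
        printf("Not in Range\n");   /* printed: -1 now fails the first condition */

    return 0;
}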

Enumerator constants are of type int. The enumerated type itself is an implementation-defined integer type capable of representing all of the enumerator constants.
6.7.2.2p4:
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined,128) but shall be capable of representing the
values of all the members of the enumeration. The enumerated type is
incomplete until immediately after the } that terminates the list of
enumerator declarations, and complete thereafter.
Since you haven't enumerated any negative values, that type may well be an unsigned type. If it is, then (E_Number)some_integer will always be greater than or equal to zero (0==FIRST_ENUM).
If you expand the enum list to:
typedef enum {
    NUM_NOPE = -1,
    NUM_1 = FIRST_ENUM,
    NUM_2,
    NUM_3,
    NB_MAX_NUMBER
} E_Number;
you'll force the compiler to use a signed type and the results will reverse.

Quote from ISO/IEC 9899:1999, 6.7.2.2p3
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined, 108) but shall be capable of representing the
values of all the members of the enumeration.
So, when you declare an enumeration, you cannot know a priori which integer type the C implementation will choose to store variables of that type. For optimisation reasons, the compiler may not choose a 4-byte integer type if the enumeration constants all fit in [-128, +127]; the implementation may choose char to store an enumerated variable, but you cannot be sure. Any integer type may be chosen as long as it can represent all of the possible values.
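A small sketch to make that concrete; the type names E_Small and E_Big are just illustrative, and the printed sizes are implementation-defined (mainstream gcc, for instance, typically uses a 4-byte int for both unless -fshort-enums is given), so treat the exact output as illustrative only:

#include <stdio.h>

typedef enum { SMALL_MIN = -128, SMALL_MAX = 127 } E_Small;    /* fits in a signed char */
typedef enum { BIG_MIN   = 0,    BIG_MAX   = 65535 } E_Big;    /* needs at least 16 bits */

int main(void)
{
    /* Each printed size may legally differ between implementations. */
    printf("sizeof(E_Small) = %zu\n", sizeof(E_Small));
    printf("sizeof(E_Big)   = %zu\n", sizeof(E_Big));
    return 0;
}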

Related

Casting to _Bool

Traditionally, Boolean values in C were represented with int or char. The new _Bool type makes intent clearer, but it also has another interesting feature: it seems that casting floating-point numbers to it does not truncate toward zero, but instead compares against exact zero:
#include <stdio.h>

int main(int argc, char **argv) {
    double a = 0.1;
    int i = (int)a;
    printf("%d\n", i);

    _Bool b = (_Bool)a;
    printf("%d\n", b);

    return 0;
}
prints
0
1
So this is a semantic difference. And one I'm happy with; it duplicates the effect of using a floating-point number as a conditional.
Is this something that can be depended on across-the-board? Does the new C standard define the result of casting X to _Bool as identical to X ? 1 : 0 for all X for which that is a valid operation?
The C Standard (6.3.1.2 Boolean type) says:
1 When any scalar value is converted to _Bool, the result is 0 if the
value compares equal to 0; otherwise, the result is 1.
That is, during conversion to _Bool the compiler does not try to represent the value of the cast operand as an integer (by truncating toward zero or otherwise). It only checks whether the value compares equal to zero.
Actually this declaration
_Bool b = (_Bool)a;
is equivalent to
_Bool b = a;
It is entirely consistent: a non-zero value converted to _Bool is true. Since _Bool is a true Boolean type and not "faked", it can behave correctly. So:
_Bool b = (_Bool)a ;
is equivalent to:
_Bool b = (a != 0) ;
not:
_Bool b = ((int)a != 0) ;
In the end, interpreting a float as a Boolean is ill-advised, as is comparing it for equality with zero. If you want the semantics you expected, you must code them explicitly:
_Bool b = (_Bool)((int)a);
Semantically that is equivalent to:
_Bool b = (a >= 1.0 || a <= -1.0) ;
that is, b is true only when truncating a toward zero yields a non-zero value.
It is clearer and safer to use a Boolean expression than to force a value to Boolean with a cast.
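A short demonstration sketch of the distinction discussed above (the sample values are only illustrative): conversion to _Bool tests against zero, while the detour through (int) truncates first.

#include <stdio.h>

int main(void)
{
    const double samples[] = { 0.0, 0.1, -0.5, 1.7 };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        double a = samples[i];
        /* (_Bool)a is 1 for any non-zero value; (_Bool)(int)a is 1 only if |a| >= 1 */
        printf("a = %4.1f   (_Bool)a = %d   (_Bool)(int)a = %d\n",
               a, (_Bool)a, (_Bool)(int)a);
    }
    return 0;
}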

comparison operator optimization in C

I just came across an interesting case but I can't find any information about it and I was wondering if anyone here might know the answer.
So I have the macro INT_MAX, which is the largest value an int can store on my system.
The following if statement has some weird behavior:
#include <stdio.h>
#include <limits.h>

int main(int argc, const char* argv[]) {
    int maxValue = INT_MAX;

    printf("INT_MAX: %d\n", maxValue);
    printf("INT_MAX + 1: %d\n", maxValue + 1);

    if (INT_MAX < maxValue + 1) {
        printf("no overflow\n");
    } else {
        printf("overflow\n");
    }

    return 0;
}
Running this program prints the value of INT_MAX and the overflowed value of INT_MAX + 1, followed by overflow.
If I swap INT_MAX for the variable maxValue, the if branch is executed instead and "no overflow" is printed. I assume this means that the if statement or < operator checks whether the left and right values passed to it are the same and, instead of doing the actual calculation, simply returns 1 as it sees that on the right-hand side we're adding a positive value to the same variable.
So is this what is actually happening or is it something else entirely?
Thanks!
edit: INT_MAX not MAX_INT
I assume this means that the if statement or < operator checks whether the left and right values passed to it are the same and, instead of doing the actual calculation, simply returns 1 as it sees that on the right-hand side we're adding a positive value to the same variable.
So is this what is actually happening or is it something else entirely?
That is an optimization compilers commonly make. It is likely what is occurring in your example, although proving that this particular optimization is responsible, rather than some other compiler behavior, would require diving into compiler internals. Godbolt does show that Clang and GCC compile the following code to a constant return value of 1:
int foo(int x)
{
    return x < x + 1;
}
The assembly generated by Clang is:
foo:                    # @foo
        mov     eax, 1
        ret
When the calculation result goes beyond the range that the result type can represent, undefined behavior is invoked and anything is allowed to happen.
Quote from N1570 6.5 Expressions 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
The result of maxValue + 1 when maxValue = INT_MAX will go beyond the range of int, so undefined behavior is invoked here.
On the other hand, if you use an unsigned integer, calculating UINT_MAX + 1 does not invoke undefined behavior, because unsigned integer arithmetic is defined to wrap modulo one more than the maximum value of the type.
Quote from N1570 6.2.5 Types 9:
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
Therefore, this code will print overflow even after replacing maxValue < maxValue + 1 with UINT_MAX < maxValue + 1.
#include <stdio.h>
#include <limits.h>

int main(int argc, const char* argv[]) {
    unsigned int maxValue = UINT_MAX;

    printf("UINT_MAX: %u\n", maxValue);
    printf("UINT_MAX + 1: %u\n", maxValue + 1);

    if (maxValue < maxValue + 1) {
        printf("no overflow\n");
    } else {
        printf("overflow\n");
    }

    return 0;
}
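For completeness, here is a sketch (not part of the original answer) of how the signed case can be checked without ever performing the overflowing addition: the test is rearranged so every intermediate value stays within the range of int.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int maxValue = INT_MAX;

    /* maxValue + 1 would overflow, so compare against INT_MAX - 1 instead. */
    if (maxValue > INT_MAX - 1) {
        printf("overflow would occur\n");
    } else {
        printf("no overflow: %d\n", maxValue + 1);
    }
    return 0;
}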

What advantage does _Bool give?

If the _Bool type acts like an integer and doesn't enforce that a value is true/false or 1/0, for example:
_Bool bools[] = {0,3,'c',0x17};
printf("%d", bools[2]);
> 1
What is the advantage of having that there? Is it just a simple way to coerce things to see how they would evaluate for 'truth-ness', for example:
printf("%d\n", (_Bool) 3);
> 1
Or how is this helpful or useful in the C language?
What advantage does _Bool give?
The value of a _Bool is either 0 or 1. Nothing else, unlike an int.
Conversion to a _Bool always converts non-zero to 1 and only 0 to 0.
When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.
Examples:
#include <math.h>
#include <stdlib.h>
_Bool all_false[] = { 0, 0.0, -0.0, NULL };
_Bool all_true[] = { 13, 0.1, 42.0, "Hello", NAN };
Notice the difference between conversion/casting to int vs. _Bool: (int) 0.1 --> 0, yet (_Bool) 0.1 --> 1.
Notice the difference between conversion/casting to unsigned vs. _Bool (with a 32-bit unsigned): (unsigned) 0x100000000 --> 0, yet (_Bool) 0x100000000 --> 1.
_Bool adds clarity to boolean operations.
_Bool is a distinctive type from int, char, etc. when used with _Generic.
Prior to C99, C lacked _Bool. Much early code defined its own types: bool, Bool, boolean, bool8, bool_t, .... Creating a new type _Bool brought uniformity to this common yet non-uniform practice. <stdbool.h> is available to spell it bool and to use true and false. This allows older code, which does not include <stdbool.h>, to keep working, while newer code can use the cleaner names.
The OP's example of "doesn't enforce that a value is true/false or 1/0" does in fact enforce that bools[2] holds the value 1. It did not require the initializer 'c', an int, to be in the range [0...1] nor to have type _Bool, much as int x = 12.345; is allowed. In both cases a conversion occurred, although the second often draws a warning.
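As a side note on the _Generic point above, here is a small sketch showing that _Bool really is a distinct type for the purposes of generic selection; the TYPE_NAME macro is just an illustrative helper, not a standard facility:

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), _Bool: "_Bool", int: "int", default: "other")

int main(void)
{
    _Bool b = 1;
    int   i = 1;

    printf("%s %s\n", TYPE_NAME(b), TYPE_NAME(i));   /* prints: _Bool int */
    return 0;
}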
The advantage is legibility, nothing more. For example:
bool rb() {
    if (cond && f(y)) {
        return true;
    }
    return false;
}
Versus:
int rb() {
    if (cond && f(y)) {
        return 1;
    }
    return 0;
}
There's really no other benefit to it. For those that are used to working in C code without bool, it's largely cosmetic, but for those used to C++ and its bool it may make coding feel more consistent.
As always, an easy way to "cast to a boolean value" is just double negation, like:
!!3
which will reduce it to a 0 or 1 value.
Consider this:
(bool) 0.5 -> 1
( int) 0.5 -> 0
As you can see, _Bool does not act like an integer.

Counter in C Language

I'm mainly an electronics hardware guy, but I've learned the basics of D from a friend, so I decided I'd pick up a more standard language like C. Any advice would be great.
Basically, I'm using the integer phew as a counter to invert each digit of a binary number. Super basic, but I'm having trouble finding the way to do this in C. My code:
#include <stdio.h>

int main()
{
    int input;
    int phew;

    printf("Binary Number: ");
    scanf("%d", &input);

    while (phew < sizeof(input))
    {
        if (input[phew] == 0)
            printf("1");
        else
            printf("0");
        phew++;
    }

    return 0;
}
And the compiler error was:
helloworld.c: In function ‘main’:
helloworld.c:11:11: error: subscripted value is neither array nor pointer nor vector
if(input[phew] == 0)
^
First of all, the statement below
while (phew < sizeof(input))
invokes undefined behavior, because phew is an automatic local variable and is not initialized. You need to initialize it, for example int phew = 0;
Also note that sizeof(input) is valid here, but it gives the size of input in bytes, which is not what you want.
That said, you can only use the [] operator on an array or pointer type. In your case, input is an int, so you are not allowed to write input[phew].
To elaborate, quoting the C11 standard, chapter §6.5.2.1:
Syntax
postfix-expression [ expression ]
and the description
One of the expressions shall have type "pointer to complete object type", the other
expression shall have integer type, and the result has type "type".
So, clearly, in your case input is not a "pointer-to-type", hence the error.
To solve the issues:
- change the type of input to char input[32] = {0};
- change the scan statement to scanf("%29s", input);
- use while (phew < strlen(input)) as the loop condition.
You'll need the string.h header for strlen().
You then have to change the loop like this:
while (phew < strlen(input))
{
    if (input[phew] == '0')   // the character '0', ASCII 48 decimal
        printf("1");
    else
        printf("0");
    phew++;
}
You cannot use [] to get bits of an integer in C; that operator only works on pointer values and an int is not a pointer. You may be familiar with Verilog, where bit vectors and integers are the same thing; but C doesn't have a syntax for accessing bits of an object and [] is only for accessing members of an array.
One correct way to do this is using a bit mask. For example:
for (size_t i = 0; i < (sizeof input) * CHAR_BIT; i++) {
    unsigned mask = 1u << i;   /* unsigned 1 so the shift never overflows a signed int */
    if (input & mask) {
        putchar('1');
    } else {
        putchar('0');
    }
}
putchar('\n');
What this code does on each iteration of the loop is create an unsigned integer mask by shifting the number 1 into the ith place of the integer. Then mask is bitwise-ANDed with input to give a value that is 0 if that bit is 0 in input (and otherwise some nonzero value equivalent to the place value of the bit). Don't forget to print out a newline at the end or the output may not be visible.
I'm using unsigned arithmetic here because << can have implementation-defined or undefined behavior on signed numbers in C. input can remain a regular (signed) int because it will be converted to unsigned by the usual arithmetic conversions when it is ANDed with the unsigned mask.
Also notice that I had to multiply sizeof input by CHAR_BIT (which is defined in <limits.h>) to get the number of bits; sizeof always measures sizes in bytes.
Also see this C FAQ (20.7) about bit manipulations.
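Putting the pieces together, here is a complete sketch (my own assembly of the fragments above, not code from the answer) of the OP's original goal: print each bit of the input inverted, least significant bit first, using the mask approach.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int input = 0;

    printf("Number: ");
    if (scanf("%d", &input) != 1)
        return 1;

    /* Walk every bit of input, least significant first, and print it inverted. */
    for (size_t i = 0; i < sizeof input * CHAR_BIT; i++) {
        unsigned mask = 1u << i;
        putchar((input & mask) ? '0' : '1');
    }
    putchar('\n');

    return 0;
}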

Is TRUE/FALSE part of C standard?

TRUE and FALSE are usually defined in C as below. Are these definitions part of the C standard, and are they supported by the various compiler implementations?
#define TRUE 1
#define FALSE 0
No for TRUE or FALSE. Yes for true and false in C99 or later if you include <stdbool.h>.
C99 and C11 define an integral type with Boolean semantics called _Bool, but no actual true/false keywords exist.
The _Bool type is only capable of storing the values 1 and 0. Any value that doesn't compare equal to 0 is converted to 1, and any value comparing equal to 0 is converted to 0.
By including <stdbool.h>, the _Bool type is allowed to be written as bool as with some other languages, and true and false C preprocessor macros are defined to be 1 and 0 respectively.
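A minimal sketch of the C99 spellings from <stdbool.h>:

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    bool done = false;              /* bool expands to _Bool */

    done = 42;                      /* any non-zero value converts to 1 */
    printf("%d %d\n", done, true);  /* prints: 1 1 */

    return 0;
}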
Before that, it was somewhat of a convenience for someone to define constants like
#define TRUE 1
#define FALSE 0
typedef char BOOL; /* or #define BOOL char */
or sometimes
typedef enum {False, True} BOOL;
Unlike C99, however, both have at least one flaw:
BOOL bClicked = False;

++bClicked, ++bClicked;

if (bClicked == False)
    printf ("False\n");
else if (bClicked == True)
    printf ("True\n");
else
    printf ("Unknown: %d\n", bClicked);
That will print "Unknown: 2" because the defined BOOL type isn't a true Boolean type.
The C99 version:
_Bool bClicked = 0;

++bClicked, ++bClicked;

if (bClicked == 0)
    printf ("False\n");
else if (bClicked == 1)
    printf ("True\n");
else
    printf ("Unknown: %d\n", bClicked);
That will print "True" because _Bool can only store 0 and 1, so incrementing 1 to get 2, which compares as not equal to 0, results in 1.
Of course, most people just use the language to their advantage rather than actually comparing against True/False constants:
if (bClicked)
{
    /* True if bClicked does not compare equal to 0 */
}
else
{
    /* False */
}
Because of that behavior, there isn't any real need for a Boolean type or true/false constants; they exist purely for indication of intent.
I vaguely remember someone ranting that some Windows API functions return a value of type BOOL, but TRUE and FALSE weren't the only possible return values, so despite returning a value that should have allowed a simple test as in that last bit of code, more comparisons were needed to handle all possible cases. Had there been an actual Boolean type back then, most likely those functions, whatever they were, would have returned a value of type int instead. A BOOL return type suggests only two values can be returned, but apparently that wasn't the case with those functions, perhaps because there was a third (error) return value.

Resources