C: gcc implicitly converts signed char to unsigned char and vice versa? - c

I'm trying to learn C and got stuck on data type sizes at the moment.
Have a look at this code snippet:
#include <stdio.h>
#include <limits.h>

int main() {
    char a = 255;
    char b = -128;
    a = -128;
    b = 255;
    printf("size: %lu\n", sizeof(char));
    printf("min: %d\n", CHAR_MIN);
    printf("max: %d\n", CHAR_MAX);
}
The printf-output is:
size: 1
min: -128
max: 127
How is that possible? The size of char is 1 byte and the default char seems to be signed (-128...127). So how can I assign a value > 127 without getting an overflow warning (which I do get when I try to assign -129 or 256)? Is gcc automatically converting to unsigned char? And then, when I assign a negative value, does it convert back? Why would it do that? I mean, all this implicitness doesn't make it easier to understand.
EDIT:
Okay, it's not converting anything:
char a = 255;
char b = 128;
printf("%d\n", a); /* -1 */
printf("%d\n", b); /* -128 */
So it starts counting from the bottom up. But why doesn't the compiler give me a warning here, and why does it when I try to assign 256?

See 6.3.1.3/3 in the C99 Standard
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So, if you don't get a signal (if your program doesn't stop) read the documentation for your compiler to understand what it does.
gcc documents the behaviour (in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation) as
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
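For instance, here is a small test program illustrating that documented modulo reduction (a sketch assuming plain char is signed and 8 bits wide, as on typical GCC targets):
#include <stdio.h>

int main(void) {
    char a = 255;   /* out of range for an 8-bit signed char          */
    char b = 300;   /* also out of range (gcc does warn about this)   */
    /* With the documented rule, the values are reduced modulo 2^8:   */
    /* 255 - 256 = -1 and 300 - 256 = 44                              */
    printf("%d %d\n", a, b);   /* typically prints: -1 44             */
    return 0;
}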

how can I assign a value > 127
The result of converting an out-of-range integer value to a signed integer type is either an implementation-defined result or an implementation-defined signal (6.3.1.3/3). So your code is legal C, it just doesn't have the same behavior on all implementations.
without getting an overflow warning
It's entirely up to GCC to decide whether to warn or not about valid code. I'm not quite sure what its rules are, but I get a warning for initializing a signed char with 256, but not with 255. I guess that's because a warning for code like char a = 0xFF would normally not be wanted by the programmer, even when char is signed. There is a portability issue, in that the same code on another compiler might raise a signal or result in the value 0 or 23.
-pedantic enables a warning for this (thanks, pmg), which makes sense since -pedantic is intended to help write portable code. Or arguably doesn't make sense, since as R.. points out it's beyond the scope of merely putting the compiler into standard-conformance mode. However, the man page for gcc says that -pedantic enables diagnostics required by the standard. This one isn't, but the man page also says:
Some users try to use -pedantic to check programs for strict ISO C
conformance. They soon find that it does not do quite what they want:
it finds some non-ISO practices, but not all---only those for which
ISO C requires a diagnostic, and some others for which diagnostics
have been added.
This leaves me wondering what a "non-ISO practice" is, and suspecting that char a = 255 is one of the ones for which a diagnostic has been specifically added. Certainly "non-ISO" means more than just things for which the standard demands a diagnostic, but gcc obviously is not going so far as to diagnose all non-strictly-conforming code of this kind.
I also get a warning for initializing an int with ((long long)UINT_MAX) + 1, but not with UINT_MAX. Looks as if by default gcc consistently gives you the first power of 2 for free, but after that it thinks you've made a mistake.
Use -Wconversion to get a warning about all of those initializations, including char a = 255. Beware that it will give you a boatload of other warnings that you may or may not want.
all this implicitness wouldn't make it easier to understand
You'll have to take that up with Dennis Ritchie. C is weakly-typed as far as arithmetic types are concerned. They all implicitly convert to each other, with various levels of bad behavior when the value is out of range depending on the types involved. Again, -Wconversion warns about the dangerous ones.
There are other design decisions in C that mean the weakness is quite important to avoid unwieldy code. For example, the fact that arithmetic is always done in at least an int means that char a = 1, b = 2; a = a + b involves an implicit conversion from int to char when the result of the addition is assigned to a. If you use -Wconversion, or if C didn't have the implicit conversion at all, you'd have to write a = (char)(a+b), which wouldn't be too popular. For that matter, char a = 1 and even char a = 'a' are both implicit conversions from int to char, since C has no literals of type char. So if it weren't for all those implicit conversions, either various other parts of the language would have to be different, or else you'd have to absolutely litter your code with casts. Some programmers want strong typing, which is fair enough, but you don't get it in C.
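To see what that paragraph describes in compilable form, here is a minimal sketch (the comments assume a typical implementation where plain char is 8-bit and signed):
#include <stdio.h>

int main(void) {
    char a = 1, b = 2;   /* the int constants 1 and 2 are converted to char       */
    a = a + b;           /* a and b are promoted to int, the addition happens in  */
                         /* int, and the int result is implicitly narrowed back   */
                         /* to char; -Wconversion warns about that narrowing      */
    printf("%d\n", a);   /* 3 */
    return 0;
}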

Simple explanation:
A signed char can hold values from -128 to 127. When you assign 129 to such a char, the value wraps around: counting two past 127 (129 = 127 + 2) lands on -127, so char a = 129; printed with %d gives -127.
You can picture the char values as a cycle:
..., 126, 127, -128, -127, -126, ..., -1, 0, 1, 2, ...
Whatever you assign, the stored value is found by walking around this cycle. (Strictly speaking the result of such an out-of-range conversion is implementation-defined, but this wrap-around is what gcc does.)

Related

Operator "<<= " : What does it it mean?

I need help getting this straight in my mind, so if anyone has run into a similar problem, their input would help me.
Here's my code:
char c=0xAB;
printf("01:%x\n", c<<2);
printf("02:%x\n", c<<=2);
printf("03:%x\n", c<<=2);
Why the program prints:
01:fffffeac
02:ffffffac
03:ffffffb0
What I expected to print, that is, what I got on paper is:
01:fffffeac
02:fffffeac
03:fffffab0
I obviously realized I didn't know what the operator <<= was doing; I thought it meant c = c << 2.
If anyone can clarify this, I would be grateful.
You're correct in thinking that
c <<= 2
is equivalent to
c = c << 2
But you have to remember that c is a single byte (on almost all systems): it can only hold eight bits, while a value like 0xeac requires 12 bits.
When the value 0xeac is assigned back to c then the value will be truncated and the top bits will simply be ignored, leaving you with 0xac (which when promoted to an int becomes 0xffffffac).
<<= means shift and assign. It's the compound assignment version of c = c << 2;.
There are several problems here:
char c=0xAB; is not guaranteed to give a positive result, since char could be an 8-bit signed type. See Is char signed or unsigned by default?. In that case 0xAB will get converted to a negative number in an implementation-defined way. Avoid this bug by always using uint8_t when dealing with raw binary bytes.
c<<2 is subject to Implicit type promotion rules - specifically c will get promoted to a signed int. If the previous issue occurred and your char got a negative value, c now holds a negative int.
Left-shifting negative values in C invokes undefined behavior - it is always a bug. Shifting signed operands in general is almost never correct.
%x isn't a suitable format specifier to print the int you ended up with, nor is it suitable for char.
As for how to fix the code, it depends on what you wish to achieve. It's commonly recommended to cast to an unsigned type such as uint32_t before shifting.
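One possible rewrite, as a sketch assuming the goal is to shift the raw byte value without any sign extension:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t c = 0xAB;                      /* raw byte, never negative            */
    printf("01:%x\n", (unsigned)(c << 2)); /* c promotes to a non-negative int:   */
                                           /* prints 2ac, no sign extension       */
    c = (uint8_t)(c << 2);                 /* keep only the low byte: 0xac        */
    printf("02:%x\n", (unsigned)c);        /* ac                                  */
    c = (uint8_t)(c << 2);                 /* 0xac << 2 = 0x2b0, low byte: 0xb0   */
    printf("03:%x\n", (unsigned)c);        /* b0                                  */
    return 0;
}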

C: How to best handle unsigned integers modulo arithmetic ("wrap-around") when using unsigned operands in calculations

There was this range checking function that required two signed integer parameters:
range_limit(long int lower, long int upper)
It was called with range_limit(0, controller_limit). I needed to expand the range check to also include negative numbers up to the 'controller_limit' magnitude.
I naively changed the call to
range_limit(-controller_limit, controller_limit)
Although it compiled without warnings, this did not work as I expected.
I missed that controller_limit was an unsigned integer.
In C, simple integer calculations can lead to surprising results. For example, these calculations
0u - 1;
or more relevant
unsigned int ui = 1;
-ui;
result in 4294967295 of type unsigned int (aka UINT_MAX). As I understand it, this is due to the integer conversion rules and the modulo arithmetic of unsigned operands (see here).
By definition, unsigned arithmetic does not overflow but rather "wraps-around". This behavior is well defined, so the compiler will not issue a warning (at least not gcc) if you use these expressions calling a function:
#include <stdio.h>

void f_l(long int li) {
    printf("%li\n", li); // outputs: 4294967295
}

int main(void)
{
    unsigned int ui = 1;
    f_l(-ui);
    return 0;
}
Try this code for yourself!
So instead of passing a negative value I passed a ridiculously high positive value to the function.
My fix was to cast from unsigned integer into int:
range_limit(-(int)controller_limit, controller_limit);
Obviously, integer modulo behavior in combination with the integer conversion rules allows for subtle mistakes that are hard to spot, especially as the compiler does not help in finding them.
As the compiler does not emit any warnings and you can come across these kind of calculations any day, I'd like to know:
If you have to deal with unsigned operands, how do you best avoid the unsigned integers modulo arithmetic pitfall?
Note:
While gcc does not provide any help in detecting integer modulo arithmetic (at the time of writing), clang does. The compiler flag "-fsanitize=unsigned-integer-overflow" enables detection of modulo arithmetic (using "-Wconversion" is not sufficient), though not at compile time but at runtime. Try it for yourself!
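For example, a sanitizer build along these lines should report the wrap-around at run time (a sketch; the exact diagnostic text depends on the clang version):
/* compile with: clang -fsanitize=unsigned-integer-overflow wrap.c */
#include <stdio.h>

int main(void) {
    unsigned int ui = 1;
    unsigned int wrapped = ui - 2u;   /* 1 - 2 wraps around to UINT_MAX;     */
                                      /* the sanitizer flags this at runtime */
    printf("%u\n", wrapped);
    return 0;
}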
Further reading:
Seacord: Secure Coding in C and C++, Chapter 5, Integer Security
Using signed integers does not change the situation at all.
A C implementation is under no obligation to raise a run-time warning or error as a response to Undefined Behaviour. Undefined Behaviour is undefined, as it says; the C standard provides absolutely no requirements or guidance about the outcome. A particular implementation can choose any mechanism it sees fit in response to Undefined Behaviour, including explicitly defining the result. (If you rely on that explicit definition, your program is no longer portable to other compilers with different or undocumented behaviour. Perhaps you don't care.)
For example, GCC defines the result of out-of-bounds integer conversions and some bitwise operations in Implementation-defined behaviour section of its manual.
If you're worried about integer overflow (and there are lots of times you should be worried about it), it's up to you to protect yourself.
For example, instead of allowing:
unsigned_counter += 5;
to overflow, you could write:
if (unsigned_counter > UINT_MAX - 5) {
    /* Handle the error */
}
else {
    unsigned_counter += 5;
}
And you should do that in cases where integer overflow will get you into trouble. A common example, which can lead (and has led!) to buffer-overflow exploits, comes from checking whether a buffer has enough room for an addition:
if (buffer_length + added_length >= buffer_capacity) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
If buffer_length + added_length overflows -- in either signed or unsigned arithmetic -- the necessary reallocation (or failure) won't trigger and the memcpy will overwrite memory or segfault or do something else you weren't expecting.
It's easy to fix, so it's worth getting into the habit:
if (added_length >= buffer_capacity
        || buffer_length >= buffer_capacity - added_length) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
Another similar case where you can get into serious trouble is when you are using a loop and your increment is more than one.
This is safe:
for (i = 0; i < limit; ++i) ...
This could lead to an infinite loop:
for (i = 0; i < limit; i += 2) ...
The first one is safe -- assuming i and limit are the same type -- because i + 1 cannot overflow if i < limit. The most it can be is limit itself. But no such guarantee can be made about i + 2, since limit could be INT_MAX (or whatever is the maximum value for the integer type being used). Again, the fix is simple: compare the difference rather than the sum.
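One way to write the step-of-two loop with that fix, as a sketch (assuming i and limit have the same integer type and limit is non-negative):
i = 0;
while (i < limit) {
    /* ... process element i ... */
    if (limit - i <= 2)   /* the next step would reach or pass limit, */
        break;            /* and computing i + 2 might overflow       */
    i += 2;
}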
If you're using GCC and you don't care about full portability, you can use the GCC overflow-detection builtins to help you. They're also documented in the GCC manual.
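As a sketch, the first buffer check above could use them like this (append_ok is a hypothetical helper name; __builtin_add_overflow is available in GCC 5+ and Clang):
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append only when the length arithmetic cannot overflow
   and the result still fits in the buffer (leaving room for the terminator). */
static bool append_ok(char *buffer, size_t *buffer_length, size_t buffer_capacity,
                      const char *add_characters, size_t added_length)
{
    size_t needed;
    if (__builtin_add_overflow(*buffer_length, added_length, &needed)
            || needed >= buffer_capacity)
        return false;                 /* caller reallocates or fails */
    memcpy(buffer + *buffer_length, add_characters, added_length);
    *buffer_length = needed;
    buffer[*buffer_length] = 0;
    return true;
}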

Weird results with type conversion of literals and return values in C

Two questions regarding this simple code:
#include <stdio.h>

float foo(){
    return 128.0;
}

int main(){
    char x = (char) 128;
    char y = (char) 128.0;
    char z = (char) foo();
    printf("x: %d \ny: %d \nz: %d", x, y, z);
}
My output:
x: -128
y: 127
z: -128
Why is 128.0 prevented from overflowing when converted to a char and why isn't it when it's the return-value of a function rather than a literal? I'm happy to really get into detail if an adequate answer requires it :)
(I'm using gcc-4.8.1 and don't use any options for compiling, just gcc file.c)
Your C implementation appears to have a signed eight-bit char type. 128 cannot be represented in this type.
In char x = (char) 128;, there is a conversion from the int value 128 to the char type. For this integer-to-integer conversion, C 2018 6.3.1.3 3 says:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
“Implementation-defined” means a conforming C implementation is required to document how it behaves for this conversion and must abide by that. Your implementation apparently wraps modulo 256, with the result that converting 128 to char produces −128.
In char y = (char) 128.0;, there is a conversion from the double value 128 to the char type. For this floating-point-to-integer conversion, C 2018 6.3.1.4 2 says:
If the value being converted is outside the range of values that can be represented, the behavior is undefined.
“Undefined” means the C standard does not impose any requirements; a C implementation does not have to document its behavior and does not have to abide by any particular rules. In particular, the behavior may be different in different situations.
We see this in that char y = (char) 128.0; and char z = (char) foo(); produced different results (127 and −128). A common reason for this is that, since a constant is used, the compiler may have evaluated (char) 128.0 itself using internal software that clamps out-of-range results to the limits of the type, so the out-of-range 128 resulted in the maximum possible 127, and y was initialized to that. On the other hand, for (char) foo(), the compiler may have generated instructions to perform the conversion at run time, and those instructions behaved differently from the internal compiler evaluation and wrapped modulo 256, producing −128.
Since these behaviors are not specified by the C standard or, likely, your compiler’s documentation, you should not rely on them.
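If you really need such a conversion, one portable approach is to check the range yourself before converting, so that neither the implementation-defined nor the undefined case can ever arise. A sketch (double_to_char is a hypothetical helper; it saturates to CHAR_MIN/CHAR_MAX and omits NaN handling):
#include <limits.h>

static char double_to_char(double d)
{
    if (d >= (double)CHAR_MAX) return CHAR_MAX;   /* clamp large values  */
    if (d <= (double)CHAR_MIN) return CHAR_MIN;   /* clamp small values  */
    return (char)d;   /* the value is now in range, so the conversion is well defined */
}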
The result of y is undefined behaviour:
https://godbolt.org/z/3x68ze
As you can see, different compilers give different results.
z is also undefined behaviour, but without optimisations enabled even this very trivial example is evaluated at run time. If you enable optimizations, the result will be the same as for y (because the compiler chooses the result):
https://godbolt.org/z/WaPE4r

Defending "U" suffix after Hex literals

There is some debate between my colleague and me about the U suffix after hexadecimally represented literals. Note, this is not a question about the meaning of this suffix or about what it does. I have found several of those topics here, but I have not found an answer to my question.
Some background information:
We're trying to come to a set of rules that we both agree on, to use that as our style from that point on. We have a copy of the 2004 Misra C rules and decided to use that as a starting point. We're not interested in being fully Misra C compliant; we're cherry picking the rules that we think will most increase efficiency and robustness.
Rule 10.6 from the aforementioned guidelines states:
A “U” suffix shall be applied to all constants of unsigned type.
I personally think this is a good rule. It takes little effort, looks better than explicit casts and more explicitly shows the intention of a constant. To me it makes sense to use it for all unsigned constants, not just numerics, since enforcing a rule doesn't happen by allowing exceptions, especially for a commonly used representation of constants.
My colleague, however, feels that the hexadecimal representation doesn't need the suffix. Mostly because we almost exclusively use it to set micro-controller registers, and signedness doesn't matter when setting registers to hex constants.
My Question
My question is not one about who is right or wrong. It is about finding out whether there are cases where the absence or presence of the suffix changes the outcome of an operation. Are there any such cases, or is it a matter of consistency?
Edit, for clarification: this is specifically about setting micro-controller registers by assigning hexadecimal values to them. Would there be a case where the suffix could make a difference there? I feel like it wouldn't. As an example, the Freescale Processor Expert generates all register assignments as unsigned.
Appending a U suffix to all hexadecimal constants makes them unsigned as you already mentioned. This may have undesirable side-effects when these constants are used in operations along with signed values, especially comparisons.
Here is a pathological example:
#define MY_INT_MAX 0x7FFFFFFFU // blindly applying the rule
if (-1 < MY_INT_MAX) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}
The C rules for signed/unsigned conversions are precisely specified, but somewhat counter-intuitive: in the comparison, -1 is converted to unsigned int and becomes UINT_MAX, which is not less than MY_INT_MAX, so the above code will indeed print OOPS.
The MISRA-C rule is precise as it states A “U” suffix shall be applied to all constants of unsigned type. The word unsigned has far reaching consequences and indeed most constants should not really be considered unsigned.
Furthermore, the C Standard makes a subtle difference between decimal and hexadecimal constants:
A hexadecimal constant is considered unsigned if its value can be represented by the unsigned integer type and not the signed integer type of the same size for types int and larger.
This means that on 32-bit 2's complement systems, 2147483648 is a long or a long long whereas 0x80000000 is an unsigned int. Appending a U suffix may make this more explicit in this case but the real precaution to avoid potential problems is to mandate the compiler to reject signed/unsigned comparisons altogether: gcc -Wall -Wextra -Werror or clang -Weverything -Werror are life savers.
Here is how bad it can get:
if (-1 < 0x8000) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}
The above code should print OK on 32-bit systems and OOPS on 16-bit systems. To make things even worse, it is still quite common to see embedded projects use obsolete compilers which do not even implement the Standard semantics for this issue.
For your specific question, the defined values for micro-processor registers used specifically to set them via assignment (assuming these registers are memory-mapped), need not have the U suffix at all. The register lvalue should have an unsigned type and the hex value will be signed or unsigned depending on its value, but the operation will proceed the same. The opcode for setting a signed number or an unsigned number is the same on your target architecture and on any architectures I have ever seen.
With all integer constants
Appending u/U ensures the integer constant will be of some unsigned type.
Without a u/U
For a decimal constant, the integer constant will be of some signed type.
For a hexadecimal/octal constant, the integer constant will be of a signed or unsigned type, depending on its value and the ranges of the integer types.
Note: all integer constants have non-negative values; a leading minus sign is a separate unary operator, not part of the constant.
// +-------- unary operator
// |+-+----- integer-constant
int x = -123;
absence or presence of the suffix changes the outcome of an operation?
When is this important?
With various expressions, the signedness and width of the arithmetic need to be controlled, and preferably should not be surprising.
// Examples: assume 32-bit `unsigned`, `int`, `long`, 64-bit `long long`

// Bad: signed int overflow (UB)
unsigned a = 4000 * 1000 * 1000;
// OK
unsigned b = 4000u * 1000 * 1000;

// Undefined behavior: shifting into the sign bit of an `int`
unsigned c = 1 << 31;
// OK
unsigned d = 1u << 31;

// A decimal constant never becomes unsigned without a suffix:
printf("Size %zu\n", sizeof(4294967295));  // 8: type is `long long`
printf("Size %zu\n", sizeof(4294967295u)); // 4: type is `unsigned`

// 2 ** 63
long long e = -9223372036854775808;     // C99: bad, "9223372036854775808" is not representable
long long f = -9223372036854775807 - 1; // OK
long long g = -9223372036854775808u;    // implementation-defined behavior **

some_unsigned_type h_max = -1;  // OK, the maximum value for the target type
some_unsigned_type i_max = -1u; // OK, but not the max value for wide unsigned types

// when negating a negative `int`
unsigned j = 0 - INT_MIN;  // typically int overflow or UB
unsigned k = 0u - INT_MIN; // never UB
** or an implementation-defined signal is raised.
For the specific question, which was about loading registers: the U suffix makes the constant an unsigned value, but whether the compiler treats the n-bit word pattern as signed or unsigned, it will move the same bit pattern, assuming there isn't any size extension that would propagate an MSB.
The difference that might matter is whether the register-load operation sets any processor condition flags based on a signed or unsigned load. As an overall guide, if the processor supports storing a constant directly to a configuration register or memory address, then loading a peripheral register is unlikely to set the processor's NEG condition flag. Loading a general-purpose register connected to an ALU, one that can be the target of an arithmetic operation like add, increment or decrement, might set a negative flag on loading, so that e.g. a trailing "branch (if) negative" opcode would take the branch. You would want to check the processor's references to be sure, and the processor's errata (the boo-boo list) if you need a specific flag state.
Small instruction sets tend to have only a plain load-register instruction, while larger instruction sets are more likely to have a "load unsigned" variant that doesn't set the NEG bit in the processor's flags; again, check the processor's references.
All of this only tends to come up when an optimizing compiler rearranges code around an inline assembly instruction, and in other uncommon situations. Examine the generated assembly code, and turn off some or all compiler optimizations for the module when needed.

How is {int i=999; char c=i;} different from {char c=999;}?

My friend says he read on some page on SO that they are different, but how could the two possibly be different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In the first case, we initialize the integer i to 999, then initialize c with i, which is in fact 999. In the second case, we initialize c directly with 999. The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says: "It's not the same thing. The first is an assignment, the second is an initialization."
So isn't it a lot more than only a question of optimization by the compiler?
They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i is created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.
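If you want to observe the implementation-defined results described above on your own system, a small test program like this sketch will do:
#include <stdio.h>

int main(void) {
    int i = 999;
    char c1 = i;     /* conversion of a non-constant value                 */
    char c2 = 999;   /* conversion of a constant expression                */
    /* On a typical system with an 8-bit, two's-complement signed char,    */
    /* both print -25.                                                     */
    printf("%d %d\n", c1, c2);
    return 0;
}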
Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)
It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.
Aside from the fact that the first also defines an object i of type int, the semantics are identical.
i, which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.
