Increment operation behaviour on signed char in C

For signed char in C, the range is from -128 to 127. How can a+1 (as in the code) produce 128 and not -128? The subsequent increment is normal (-127 for both a and i).
#include <stdio.h>

int main()
{
    char i = 127, a = 127;
    printf("%d\n", ++i);   // output is -128
    printf("%d\n", a + 1); // output is 128
    // subsequent increment after this is -127 for a and i.
    return 0;
}

When used with arithmetic operators, all small integer types (bool, char, short) are subjected to integer promotions before any arithmetic is performed on them. In the a + 1 expression your a is promoted to int, meaning that this expression is actually interpreted as (int) a + 1. The range of char plays no role here. All calculations are performed in the domain of int and the result has int type. 127 + 1 is 128, which is what you see printed.
Your ++i is equivalent to i = i + 1 (except that i is evaluated only once). For the same reasons the right-hand side is calculated as (int) i + 1, meaning that in this case the addition is performed in the domain of int as well. Afterwards the result is converted back to char and stored back into i. The addition itself does not overflow and produces 128, but the subsequent conversion back to char "overflows", which has an implementation-defined result. In your case that implementation-defined behavior produced the value of -128 as the char result.
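A quick way to observe the promotion is to compare the size of a with the size of a + 1; sizeof reports the type of the expression, so the second one is the size of int. A minimal sketch (the exact sizes are implementation-dependent; 1 and 4 are typical):

#include <stdio.h>

int main(void)
{
    char a = 127;
    /* a has type char, but a + 1 has been promoted to int,
       and sizeof reports the size of the expression's type. */
    printf("sizeof a       = %zu\n", sizeof a);        /* typically 1 */
    printf("sizeof (a + 1) = %zu\n", sizeof (a + 1));  /* typically 4 */
    return 0;
}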

Related

Is a char value set to CHAR_MAX guaranteed to wrap around to CHAR_MIN?

My code:
#include <stdio.h>
#include <limits.h>

int main()
{
    char c = CHAR_MAX;
    c += 1;
    printf("CHAR_MIN=%d CHAR_MAX=%d c=%d (%c)\n", CHAR_MIN, CHAR_MAX, c, c);
}
Output:
CHAR_MIN=-128 CHAR_MAX=127 c=-128 ()
We see that when we increment a char variable set to CHAR_MAX, it wraps around to CHAR_MIN. Is this behavior guaranteed? Or is it going to be undefined behavior or implementation-specified behavior? What does the C99 standard say about this?
[Note: "What happen when give value bigger than CHAR_MAX (127) to char" and "C - why char c=129 will convert into -127?" do not address this question, because they talk about assigning an out-of-range value, not incrementing a value to an out-of-range value.]
The question is twofold: Firstly, is
char c = CHAR_MAX;
c += 1;
evaluated differently from
char c = CHAR_MAX;
c = c + 1;
and the answer is no, it is not, because of C11/C18 6.5.16.2p3:
A compound assignment of the form E1 op = E2 is equivalent to the simple assignment expression E1 = E1 op (E2) except that the lvalue E1 is evaluated only once, and with respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. If E1 has an atomic type, compound assignment is a read-modify-write operation with memory_order_seq_cst memory order semantics. 113)
Then, the question is what happens in c = c + 1. Here the operands of + undergo the usual arithmetic conversions, and c and 1 are therefore promoted to int, unless a really wacky architecture requires that char be promoted to unsigned int. The addition is then evaluated, and the result, of type int (or unsigned int), is converted back to char and stored in c.
There are 3 implementation-defined ways in which this can then be evaluated:
CHAR_MIN is 0, and therefore char is unsigned.
char is then promoted to either int or unsigned int: if it is promoted to int, CHAR_MAX + 1 will necessarily fit into an int too and will not overflow; if it is promoted to unsigned int, it may fit, or it may wrap around to zero. When the resulting value, which is numerically either CHAR_MAX + 1 or 0, is converted back to c, it is reduced modulo CHAR_MAX + 1 and becomes 0, i.e. CHAR_MIN.
Otherwise char is signed; then, if CHAR_MAX is smaller than INT_MAX, the result of CHAR_MAX + 1 will fit in an int, and the standard C11/C18 6.3.1.3p3 applies to the conversion that happens upon assignment:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Or, iff sizeof (int) == 1 and char is signed, then char is promoted to an int, and CHAR_MAX == INT_MAX, so CHAR_MAX + 1 will cause an integer overflow and the behaviour will be undefined.
I.e. the possible results are:
If char is an unsigned integer type, the result is always 0, i.e. CHAR_MIN.
Otherwise char is a signed integer type, and the behaviour is implementation-defined/undefined:
CHAR_MIN or some other implementation-defined value,
an implementation-defined signal is raised, possibly terminating the program,
or the behaviour is undefined on some platforms where sizeof (char) == sizeof (int).
All increment operations c = c + 1, c += 1, c++ and ++c have the same side effects on the same platform. The evaluated value of the expression c++ will be the value of c before the increment; for the other three, it will be the value of c after the increment.
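As a quick check of that equivalence, here is a minimal sketch; on a typical two's-complement platform with signed char all four forms wrap to CHAR_MIN, but as discussed above that outcome is implementation-defined, not guaranteed:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    char c1 = CHAR_MAX, c2 = CHAR_MAX, c3 = CHAR_MAX, c4 = CHAR_MAX;
    c1 = c1 + 1;  /* all four have the same side effect on the variable */
    c2 += 1;
    c3++;
    ++c4;
    printf("%d %d %d %d\n", c1, c2, c3, c4);  /* commonly: -128 -128 -128 -128 */
    return 0;
}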

Different Data Types - Signed and Unsigned

I just executed the following code
#include <stdio.h>

int main()
{
    char a = 0xfb;
    unsigned char b = 0xfb;
    printf("a=%c,b=%c", a, b);
    if (a == b) {
        printf("\nSame");
    }
    else {
        printf("\nNot Same");
    }
    return 0;
}
For this code I got the output
a=?,b=?
Not Same
Why don't I get Same, and what is the value for a and b?
The line if (a == b)... promotes the characters to integers before comparison, so the signedness of the character affects how that happens. The unsigned character 0xFB becomes the integer 251; the signed character 0xFB becomes the integer -5. Thus, they are unequal.
There are 2 cases to consider:
if the char type is unsigned by default, both a and b are assigned the value 251 and the program will print Same.
if the char type is signed by default, which is alas the most common case, the definition char a = 0xfb; has implementation-defined behavior, as 0xfb (251 in decimal) is probably out of range for the char type (typically -128 to 127). Most likely the value -5 will be stored into a, and a == b evaluates to 0 because both arguments are promoted to int before the comparison, and -5 == 251 is false.
The behavior of printf("a=%c,b=%c", a, b); is also system dependent, as the non-ASCII character codes -5 and 251 may print in unexpected ways, if at all. Note however that both will print the same, as the %c format specifies that the argument is converted to unsigned char before printing. It would be safer and more explicit to try printf("a=%d, b=%d\n", a, b);
With gcc or clang, you can try recompiling your program with -funsigned-char to see how the behavior will differ.
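To see what the == actually compares, you can print the promoted int values directly; a minimal sketch (on a platform where plain char is signed and 8 bits wide, this prints -5 and 251):

#include <stdio.h>

int main(void)
{
    char a = 0xfb;           /* implementation-defined result if char is signed */
    unsigned char b = 0xfb;
    /* %d receives the values after promotion to int, i.e. exactly
       the operands that a == b would compare. */
    printf("a promotes to %d, b promotes to %d\n", a, b);
    return 0;
}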
According to the C Standard (6.5.9 Equality operators)
4 If both of the operands have arithmetic type, the usual arithmetic
conversions are performed....
The usual arithmetic conversions include the integer promotions.
From the C Standard (6.3.1.1 Boolean, characters, and integers)
2 The following may be used in an expression wherever an int or
unsigned int may be used:
...
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
So in this equality expression
a == b
both operands are converted to the type int. The signed operand (provided that the type char behaves as the type signed char) is converted to the type int by sign extension, i.e. by propagating the sign bit.
As a result the operands have different values due to the difference in the binary representation.
If the type char behaves as the type unsigned char (for example by setting a corresponding option of the compiler) then evidently the operands will be equal.
A signed char stores values from -128 to 127 and an unsigned char stores values from 0 to 255,
and 0xfb represents 251 in decimal, which is beyond the limit of the signed char a.
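If the intent is to compare the underlying byte values regardless of whether plain char is signed, one portable approach is to convert both operands to unsigned char first; a sketch:

#include <stdio.h>

int main(void)
{
    char a = 0xfb;
    unsigned char b = 0xfb;
    /* Casting a to unsigned char makes both sides promote to
       the same int value (251), so the bytes compare equal. */
    if ((unsigned char)a == b)
        printf("Same\n");
    else
        printf("Not Same\n");
    return 0;
}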

Execution output of program

Why does the program below print -128?
#include <stdio.h>

int main()
{
    char i = 0;
    for (; i >= 0; i++)
        ;
    printf("%d", i);
}
Also, can I assign an int value to a char without type-casting it? And if I put a print statement inside the for loop it prints up to 127, which is correct, but this program prints -128. Why?
If char is a signed type on your platform then the behaviour of the program is undefined: overflowing a signed type is undefined behaviour in C.
An 8-bit two's-complement number with a value of -128 has the same bit pattern as an unsigned 8-bit number with value +128. It seems that this is what is happening in your case. And -128 is, of course, a termination condition for your loop. (You could even call it "wraparound to the smallest negative".) But don't rely on this.
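Assuming the goal is simply to count through the positive range of an 8-bit signed char, a well-defined version uses an int counter with an explicit bound instead of relying on wraparound; a sketch:

#include <stdio.h>

int main(void)
{
    /* int comfortably holds 128, so there is no signed overflow and
       no implementation-defined narrowing conversion involved. */
    int i = 0;
    for (; i <= 127; i++)
        ;
    printf("%d\n", i);  /* prints 128 */
    return 0;
}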
According to N1570, whether a char is signed is implementation-defined (emphasis mine):
6.2.5 Types
15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char.
If it's unsigned, it will never overflow:
9 The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same. A
computation involving unsigned operands can never overflow, because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
For example, suppose, purely for illustration, UCHAR_MAX == 127 (in reality it must be at least 255): then 127 + 1 = (127 + 1) % (UCHAR_MAX + 1) = (127 + 1) % (127 + 1) = 0.
But if it's signed, the behavior is undefined, which means anything can happen. CHAR_MAX + 1 can be equal to CHAR_MIN, 0, or whatever. What's more, "undefined behavior" means the program may even crash, although that's not very likely in practice.
In your case, it seems that char is signed, and CHAR_MAX + 1 == CHAR_MIN. Why? Just because your implementation defines it so, and you are lucky enough to miss a crash this time. But this is not portable or reliable at all.
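If you want to know which variant your implementation chose, <limits.h> answers it at compile time, since CHAR_MIN is 0 exactly when plain char is unsigned; a minimal sketch:

#include <stdio.h>
#include <limits.h>

int main(void)
{
/* CHAR_MIN is 0 when plain char is unsigned, negative when signed. */
#if CHAR_MIN == 0
    printf("char is unsigned: 0..%d\n", CHAR_MAX);
#else
    printf("char is signed: %d..%d\n", CHAR_MIN, CHAR_MAX);
#endif
    return 0;
}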

What's wrong with this C code?

My source code:
#include <stdio.h>

int main()
{
    char myArray[150];
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++)
    {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]);
    }
    return 0;
}
I'm using Ubuntu 14 and gcc to compile it; what it prints out is:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?
The value of a char can range from 0 to 255 (unsigned) or from -128 to 127 (signed), depending on the implementation.
Therefore, once the value reaches 127 in your case, it wraps around and you get negative values as output.
The signedness of a plain char is implementation-defined.
In your case, char is a signed char, which can hold values in the range -128 to +127.
As you increment i beyond the limit a signed char can hold and assign the result to myArray[i], you run into implementation-defined behaviour.
To quote C11, chapter §6.3.1.3,
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
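One way to make the program count all the way to 150 as expected is to use unsigned char, which can hold 0 to 255, so every value in the loop fits and the conversion from int is well defined; a sketch:

#include <stdio.h>

int main()
{
    unsigned char myArray[150];  /* 0..255 covers all 150 values */
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++)
    {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]);
    }
    return 0;
}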
Because a char is a SIGNED BYTE. That means its value range is -128 -> 127.
EDIT: Due to all the comments below suggesting this is wrong / not the issue / signedness / what not...
Running this code:
#include <stdio.h>

int main()
{
    char a, b;
    unsigned char c, d;
    int si, ui, t;
    t = 200;
    a = b = t;
    c = d = t;
    si = a + b;
    ui = c + d;
    printf("Signed:%d | Unsigned:%d", si, ui);
    return 0;
}
Prints: Signed:-112 | Unsigned:400
The reason is the same. a and b are signed chars (signed byte-sized variables, 8 bits). c and d are unsigned. Assigning 200 to the signed variables goes out of range and they get the value -56. In memory, a, b, c and d all hold the same bit pattern, but when used, the signedness of their type dictates how the value is interpreted, and in this case it makes a big difference.
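To check the arithmetic: on a two's-complement platform, 200 does not fit in a signed 8-bit char and is commonly converted as 200 - 256 = -56, so si = -56 + (-56) = -112; in the unsigned case the values fit as-is, so ui = 200 + 200 = 400, matching the output above.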
Note about standard
It has been noted (in the comments to this answer, as well as in other answers) that the standard doesn't mandate that char is signed. That is true. However, in the case presented by the OP, as well as in the code above, char IS signed.
It seems that your compiler by default treats type char as type signed char. In this case CHAR_MIN is equal to SCHAR_MIN, and in turn equal to -128, while CHAR_MAX is equal to SCHAR_MAX, and in turn equal to 127. (See header <limits.h>.)
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are
collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char
For signed types one bit is used as the sign bit. So for the type signed char the maximum value corresponds to the following representation in the hexadecimal notation
0x7F
and equal to 127. The most significant bit is the sign bit and is equal to 0.
For negative values the sign bit is set to 1, and for example -128 is represented like
0x80
When in your program the value stored in the char reaches its positive maximum 0x7F and is incremented, it becomes equal to 0x80, which in decimal notation is equal to -128.
You should explicitly use type unsigned char instead of char if you want the result of the program not to depend on compiler settings.
Or in the printf statement you could explicitly cast type char to type unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);

Printing declared char value in C

I understand that a character variable holds values from -128 to 127 (signed) or from 0 to 255 (unsigned).
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function; only its first argument has an exact declared type.
That means the default argument promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and all float values are promoted to double.
If your implementation has CHAR_BIT of 8, and plain char is signed, and you have an obliging 2's-complement implementation, you thus get
128 (literal) to -128 (char/signed char) to -128 (int) printed as int => -128
If all the listed conditions except the obliging 2's-complement implementation are fulfilled, you get a signal or some implementation-defined value.
Otherwise you get an output of 128, because 128 fits in a char that behaves like unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, then x = 128; is performing an out-of-range assignment. The behaviour of this is implementation-defined, meaning that the compiler may choose an action, but it must document what it does (and therefore do it reliably). This action is allowed to include raising a signal.
The usual behaviour of modern compilers for an out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary representation, 128 is 0...0 1000 0000.
Truncating this into a signed char gives the signed char with binary representation 1000 0000. In two's-complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiosity: in one's complement this is -127, and in sign-magnitude this is -0, which may be a trap representation and thus raise a signal.)
Finally, printf accurately prints out this char's value of -128. The %d modifier works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX; this behaviour is guaranteed except on systems that have plain char as unsigned and sizeof(int) == 1 (which do exist, but you'd know about it if you were on one).
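If you want the two's-complement truncation result without relying on the implementation-defined conversion, a common portable idiom is to mask off the low 8 bits and sign-extend by hand; a sketch:

#include <stdio.h>

int main(void)
{
    int v = 128;
    /* Keep the low 8 bits, then sign-extend manually:
       (128 & 0xFF) = 128, 128 ^ 0x80 = 0, 0 - 0x80 = -128. */
    int wrapped = ((v & 0xFF) ^ 0x80) - 0x80;
    printf("%d\n", wrapped);  /* prints -128 */
    return 0;
}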
Let's look at the binary representation of 128 when stored in 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
With your current setup char appears to be a signed char (note that this isn't fixed by the C standard; it is implementation-defined), and thus when you assign the value 128 to x you're storing the bit pattern 1000 0000, so when you compile and print it out, it prints the signed value of that binary representation (meaning -128).
My environment likewise treats char as signed char. As expected, if I cast x to unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>
int main() {
char x;
x = 128;
printf("%d %d\n", x, (unsigned char)x);
return 0;
}
gives me the output of -128 128
Hope this helps!
