What's wrong with this C code? - c

My source code:
#include <stdio.h>

int main()
{
    char myArray[150];
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++)
    {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]);
    }
    return 0;
}
I'm using Ubuntu 14 with gcc to compile it, and this is what it prints out:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?

The value of a char can range from 0 to 255 or from -128 to 127, depending on the implementation.
Therefore, once the value reaches 127 in your case, it wraps around and you get a negative value as output.
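As a quick check, a minimal sketch like this (using only the standard <limits.h> macros) prints the actual range your implementation uses:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_MIN/CHAR_MAX reveal whether plain char is signed or unsigned here */
    printf("char: %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("signed char: %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 to %d\n", UCHAR_MAX);
    return 0;
}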

The signedness of a plain char is implementation-defined.
In your case, char is a signed char, which can hold values in the range -128 to +127.
As you increment i beyond the limit a signed char can hold and assign it to myArray[i], you run into implementation-defined behaviour.
To quote C11, chapter §6.3.1.4,
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
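If the goal is simply to print 1 through 150, one way to sidestep the implementation-defined conversion is to make the array unsigned char, since 150 fits in the guaranteed 0..255 range; a minimal sketch of the corrected program:

#include <stdio.h>

int main(void)
{
    unsigned char myArray[150];  /* unsigned char holds 0..255, so 1..150 fits */
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++)
    {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]);  /* promoted to int; prints 1 through 150 */
    }
    return 0;
}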

Because a char is a SIGNED BYTE. That means its value range is -128 -> 127.
EDIT Due to all the comments below suggesting this is wrong / not the issue / signedness / what not...
Running this code:
#include <stdio.h>

int main(void)
{
    char a, b;
    unsigned char c, d;
    int si, ui, t;

    t = 200;
    a = b = t;  /* out of range for a signed char */
    c = d = t;  /* fits in an unsigned char */
    si = a + b;
    ui = c + d;
    printf("Signed:%d | Unsigned:%d", si, ui);
    return 0;
}
Prints: Signed:-112 | Unsigned:400
Try it yourself.
The reason is the same. a and b are signed chars (signed byte-sized variables, 8 bits); c and d are unsigned. Assigning 200 to the signed variables overflows them and they get the value -56. In memory, a, b, c and d all hold the same bit pattern, but when they are used, each variable's signedness dictates how that value is interpreted, and in this case it makes a big difference.
Note about standard
It has been noted (in the comments to this answer, as well as in other answers) that the standard doesn't mandate that char is signed. That is true. However, in the case presented by the OP, as well as in the code above, char IS signed.
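To see that the bytes are identical while the interpreted values differ, here is a small sketch (assuming an 8-bit char, as on the OP's platform; the -56 is the common two's complement result, not a guarantee):

#include <stdio.h>

int main(void)
{
    signed char s = (signed char)200;  /* out of range: commonly wraps to -56 */
    unsigned char u = 200;             /* in range: exactly 200 */
    /* both raw bytes print as C8; only the interpretation differs */
    printf("s byte: %02X, value: %d\n", (unsigned char)s, s);
    printf("u byte: %02X, value: %d\n", u, u);
    return 0;
}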

It seems that your compiler by default treats type char as type signed char. In this case CHAR_MIN is equal to SCHAR_MIN, which is -128, while CHAR_MAX is equal to SCHAR_MAX, which is 127 (see header <limits.h>).
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are
collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char
For signed types one bit is used as the sign bit. So for the type signed char the maximum value corresponds to the following representation in hexadecimal notation
0x7F
and is equal to 127. The most significant bit is the sign bit and is equal to 0.
For negative values the sign bit is set to 1; for example, -128 is represented as
0x80
When the value stored in the char in your program reaches its positive maximum 0x7F and is increased, it becomes equal to 0x80, which in decimal notation is equal to -128.
You should explicitly use type unsigned char instead of char if you want the result of the program not to depend on compiler settings.
Alternatively, in the printf statement you could explicitly cast the char to unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);

Related

typecasting unsigned char and signed char to int in C

#include <stdio.h>

int main()
{
    char ch1 = 128;
    unsigned char ch2 = 128;
    printf("%d\n", (int)ch1);
    printf("%d\n", (int)ch2);
}
The first printf statement outputs -128 and the second 128. As I understand it, both ch1 and ch2 will have the same binary representation of the stored number: 10000000. So when I cast both values to int, how do they end up being different values?
First of all, a char can be signed or unsigned; that depends on the compiler implementation. But since you got different results, your compiler treats char as signed.
A signed char can only hold values from -128 to 127, so converting 128 to a signed char wraps around to -128 here (strictly, the result is implementation-defined).
But an unsigned char can hold values from 0 to 255. So, a value of 128 remains the same.
An unsigned char can have a value of 0 to 255. A signed char can have a value of -128 to 127. Setting a signed char to 128 in your compiler probably wrapped around to the lowest possible value, which is -128.
Your fundamental error here is a misunderstanding of what a cast (or any conversion) does in C. It does not reinterpret bits. It's purely an operation on values.
Assuming plain char is signed, ch1 has value -128 and ch2 has value 128. Both -128 and 128 are representable in int, and therefore the cast does not change their value. (Moreover, writing it is redundant since the default promotions automatically convert variadic arguments of types lower-rank than int up to int.) Conversions can only change the value of an expression when the original value is not representable in the destination type.
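A sketch of that distinction: a cast converts the value, while memcpy into an unsigned char reinterprets the stored byte (assuming plain char is signed and 8 bits wide, as on the OP's platform):

#include <stdio.h>
#include <string.h>

int main(void)
{
    signed char ch1 = -128;
    int as_value = (int)ch1;  /* value conversion: still -128 */
    unsigned char raw;
    memcpy(&raw, &ch1, 1);    /* byte reinterpretation: 0x80, i.e. 128 */
    printf("value: %d, raw byte: %d\n", as_value, raw);
    return 0;
}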
For starters, these casts
printf("%d\n", (int)ch1);
printf("%d\n", (int)ch2);
are redundant. You could just write
printf("%d\n", ch1);
printf("%d\n", ch2);
because, due to the default argument promotions, integer types whose rank is less than the rank of int are promoted to int, provided int can represent all values of the original type.
The type char can behave either as the type signed char or unsigned char depending on compiler options.
From the C Standard (5.2.4.2.1 Sizes of integer types <limits.h>)
2 If the value of an object of type char is treated as a signed
integer when used in an expression, the value of CHAR_MIN shall be the
same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same
as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and
the value of CHAR_MAX shall be the same as that of UCHAR_MAX. 20) The
value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
So it seems that by default your compiler treats the type char as signed char.
As a result in the first declaration
char ch1 = 128;
unsigned char ch2 = 128;
the internal representation 0x80 of the value 128 was interpreted as a signed value, because the sign bit is set; this value is equal to -128.
So the first call of printf printed the value -128
printf("%d\n", (int)ch1);
while the second call of printf, where an object of type unsigned char is used,
printf("%d\n", (int)ch2);
printed the value 128.

Converting non-ASCII characters to int in C, the extra bits are filled with 1 rather than 0

When coding in C, I accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1s rather than 0s. (For ASCII characters, the extra bits are filled with 0s.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing right-shift operations >> on a non-ASCII character, the extra bits on the left end are also filled with 1s rather than 0s, even when the char variable is explicitly converted to unsigned int (whereas for signed int, the extra bits are filled with 1s on my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused by these results... Are they related to the representations of char, int and unsigned int, to my operating system (Ubuntu 14.04), or to the ANSI C requirements? I have tried compiling this program with both gcc (4.8.4) and clang (3.4), and there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers char is customarily a signed integer type, and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to an unsigned integer will use modulo arithmetic to wrap the signed value into the range of the unsigned type, as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in the range -127..127 (or -128..127 on two's complement implementations), use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
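A short sketch of this advice, reusing the 0xC4 byte from the question (the first result assumes a platform where plain char is signed, like the OP's x86 Linux):

#include <stdio.h>

int main(void)
{
    char c = '\xC4';                       /* -60 if plain char is signed */
    printf("%x\n", (int)c);                /* sign-extended: ffffffc4 */
    printf("%x\n", (int)(unsigned char)c); /* zero-extended: c4 */
    return 0;
}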
Likewise, for the assignment
unsigned int u_i = c[0];
since -0x3C, or -60, is not in the range of an unsigned type, the actual value is the value (mod the maximum of the unsigned type + 1) that falls within its range; in other words, we add or subtract the maximum value of the unsigned type + 1 (notice that the integer promotions can play a trick here, so you might need casts in C code) until the value is in range. Illustrating with a 16-bit type for brevity: UINT16_MAX is naturally always 0xFFFF; add 1 to it to get 0x10000, and 0x10000 - 0x3C is 0xFFC4, the 16-bit counterpart of the 0xFFFFFFC4 that you saw with a 32-bit unsigned int.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
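A small sketch contrasting the two shifts (the first result is what GCC documents, not something a strictly-conforming program may rely on):

#include <stdio.h>

int main(void)
{
    int i = -60;                  /* 0xFFFFFFC4 on 32-bit two's complement */
    unsigned int u = 0xFFFFFFC4u;
    printf("%x\n", i >> 1);       /* implementation-defined; GCC sign-extends: ffffffe2 */
    printf("%x\n", u >> 1);       /* well-defined logical shift: 7fffffe2 */
    return 0;
}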

Execution output of program

Why does the program given below print -128?
#include <stdio.h>

int main(void)
{
    char i = 0;
    for (; i >= 0; i++)
        ;
    printf("%d", i);
}
Also, can I assign an int value to a char without casting it? And if I put a print statement inside the for loop, it prints up to 127, which is correct, but this program prints -128. Why?
If char is a signed type on your platform then the result is implementation-defined: i++ is performed on the promoted int value, and converting the out-of-range result 128 back to a signed char gives an implementation-defined result (or raises an implementation-defined signal) in C.
A 2's complement 8-bit number with a value of -128 has the same bit pattern as an unsigned 8-bit number with value +128. It seems that this is what is happening in your case, and -128 is, of course, a termination condition for your loop. (You could even call it "wraparound to the smallest negative".) But don't rely on this.
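If the intent is to find the largest value a char can hold, a portable sketch avoids relying on the wraparound entirely:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* count with an int so the counter itself can never wrap */
    int i;
    for (i = 0; i < CHAR_MAX; i++)
        ;
    printf("%d\n", i);  /* 127 on platforms with a signed 8-bit char */
    return 0;
}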
According to N1570, whether a char is signed is implementation-defined:(Emphasis mine)
6.2.5 Types
15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char.
If it's unsigned, it will never overflow:
9 The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same. A
computation involving unsigned operands can never overflow, because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
For example, suppose UCHAR_MAX == 127 (usually it'll be 255, though): 127 + 1 = (127 + 1) % (UCHAR_MAX + 1) = (127 + 1) % (127 + 1) = 0.
But if it's signed, the result of converting the out-of-range value back to char is implementation-defined, or an implementation-defined signal is raised, which means almost anything can happen: CHAR_MAX + 1 can come out as CHAR_MIN, as 0, or as anything else, and the signal even means the program could crash, although that's not very likely in practice.
In your case, it seems that char is signed, and CHAR_MAX + 1 == CHAR_MIN. Why? Just because your implementation defines it so, and you are lucky enough not to get a crash this time. But this is not portable or reliable at all.
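The unsigned case, by contrast, can be demonstrated portably, because the modulo reduction quoted above is guaranteed; a minimal sketch:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned char u = UCHAR_MAX;  /* usually 255 */
    u++;  /* guaranteed: reduced modulo UCHAR_MAX + 1, so it becomes 0 */
    printf("%u\n", u);  /* prints 0 */
    return 0;
}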

Byte to signed int in C

I have a binary value stored in a char in C; I want to transform this byte into a signed int.
Currently I have something like this:
char a = 0xff;
int b = a;
printf("value of b: %d\n", b);
The result on standard output will be "255"; the desired output is "-1".
According to the C99 standard,
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
You need to cast your char to a signed char before assigning to an int, because any value a char can take is directly representable as an int, so the conversion to int alone will never change the value.
#include <stdio.h>

int main(void) {
    char a = 0xff;
    int b = (signed char) a;
    printf("value of b: %d\n", b);
    return 0;
}
Quickly testing shows it works here:
C:\dev\scrap>gcc -std=c99 -oprint-b print-b.c
C:\dev\scrap>print-b
value of b: -1
Be wary that the C99 standard leaves it implementation-defined whether char is treated as signed or unsigned.
6.2.5 Types
An object declared as type char is large enough to store any member of the basic
execution character set. If a member of the basic execution character set is stored in a
char object, its value is guaranteed to be positive. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.
...
The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char.
Replace:
char a = 0xff;
by
signed char a = 0xff; // or more explicitly: = -1
to have printf print -1.
If you don't want to change the type of a, as @veer added in the comments, you can simply cast a to (signed char) before assigning its value to b.
Note that in both cases, this integer conversion is implementation-defined but this is the commonly seen implementation-defined behavior.
You are already wrong from the start:
char a = 0xff;
If char is signed, which you seem to assume, here you already have a value that is out of range: 0xFF is an unsigned quantity with the value 255. If you want to treat chars as signed numbers, use signed char and assign -1 to it. If you want to treat them as bit patterns, use unsigned char and assign 0xFF to it. Your initialization of the int will then do what you expect it to do.
char, signed char and unsigned char are, by definition of the standard, three different types. Reserve plain char for characters and for printing human-readable text.
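A sketch of that convention, one declaration per role:

#include <stdio.h>

int main(void)
{
    char letter = 'A';          /* plain char: text and characters */
    signed char small = -1;     /* signed char: small signed numbers */
    unsigned char byte = 0xFF;  /* unsigned char: raw bit patterns */
    printf("%c %d %d\n", letter, small, byte);
    return 0;
}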

C Programming on storage of bits in byte

#include <stdio.h>

int main()
{
    char a = 128;
    char b = -128;
    printf("a is %d -- b is %d \n", a, b);
    return 0;
}
The output is :
a is -128 -- b is -128
As the signed char range is from -128 to 127, from the above code can you please explain how the out-of-range value 128 is assigned?
Thanks in advance.
The range of the char type depends on the implementation. If it is a signed type its range is at least from -127 to 127 (commonly -128 to 127 on two's complement machines), and if it is an unsigned type its range is at least from 0 to 255 (these are the minimum ranges the type must support; the actual range may be larger depending on the implementation).
Also note that when you assign an integer to a signed type that cannot hold that value, the result is implementation-defined (the C standard says the result is implementation-defined or an implementation-defined signal is raised). So assigning 128 to a signed char that cannot hold 128 (e.g. when 128 is greater than CHAR_MAX) has an implementation-defined result. In this case, it has wrapped around to -128 because it shares the same byte representation as an unsigned char holding 128, but you cannot guarantee that this will be the case on all implementations.
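To make the out-of-range and in-range cases explicit, a minimal sketch (the -128 shown for the first line is the typical two's complement outcome, not a guarantee):

#include <stdio.h>

int main(void)
{
    signed char a = 128;    /* out of range: implementation-defined result, commonly -128 */
    unsigned char b = 128;  /* in range for unsigned char: exactly 128 */
    printf("a is %d -- b is %d\n", a, b);
    return 0;
}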
