Clarifications about unsigned type in C

Hi, I'm currently learning C and there's something that I don't quite understand.
First of all I was told that if I did this:
unsigned int c2 = -1;
printf("c2 = %u\n", c2);
It would output 255, according to this table:
But I get a weird result: c2 = 4294967295
Now what's weirder is that this works:
unsigned char c2 = -1;
printf("c2 = %d\n", c2);
But I don't understand because since a char is, well, a char why does it even print anything? Since the specifier here is %d and not %u as it should be for unsigned types.

The following code:
unsigned int c2 = -1;
printf("c2 = %u\n", c2);
Will never print 255. The table you are looking at refers to an 8-bit unsigned integer. An int in C needs to be at least 16 bits wide to comply with the C standard (§5.2.4.2.1 requires UINT_MAX to be at least 2^16 - 1). Therefore the value you will see is going to be much larger than 255. The most common implementations use 32 bits for an int, and in that case you'll see 4294967295 (2^32 - 1).
You can check how many bits are used for any kind of variable on your system by doing sizeof(type_or_variable) * CHAR_BIT (CHAR_BIT is defined in limits.h and represents the number of bits per byte, which is again 8 most of the time).
The correct code to obtain 255 as output is:
unsigned char c = -1;
printf("c = %hhu\n", c);
Where the hh prefix specifier means (from man 3 printf):
hh: A following integer conversion corresponds to a signed char or unsigned char argument, or a following n conversion corresponds to a pointer to a signed char argument.
Anything else is just implementation defined or even worse undefined behavior.

In this declaration
unsigned char c2 = -1;
the internal representation of -1 is truncated to one byte and interpreted as unsigned char. That is all bits of the object c2 are set.
In this call
printf("c2 = %d\n", c2);
the argument that has the type unsigned char is promoted to the type int preserving its value that is 255. This value is outputted as an integer.
In this declaration
unsigned int c2 = -1;
there is no truncation. The integer value -1 that usually occupies 4 bytes (according to the size of the type int) is interpreted as an unsigned value with all bits set.
So in this call
printf("c2 = %u\n", c2);
the maximum value of the type unsigned int is output. It is the maximum value because all bits in the internal representation are set. The conversion from a signed integer type to a larger unsigned integer type preserves the sign, propagating it across the width of the unsigned integer object.

In C, integers come in several types with different storage sizes and value ranges;
refer to the table below for more details.

Related

How can I confirm the range of unsigned long integer in C?

unsigned long has 8 bytes on my Linux gcc.
unsigned long long has 8 bytes on my Linux gcc, too.
So I think the range of integers they can show is from 0 min to (2^64 - 1)max.
Now I want to confirm if I'm correct.
Here is my code:
#include <stdio.h>
int main(void)
{
printf("long takes up %d bytes:\n", sizeof(long));
printf("long long takes up %d bytes:\n", sizeof(long long));
unsigned long a = 18446744073709551615;
a++;
printf("a + 1 = %lu\n", a);
unsigned long long b = 18446744073709551615;
b++;
printf("b + 1 = %llu\n", b);
return 0;
}
However, the code cannot be compiled and I get the following warning:
warning: integer constant is so large that it is unsigned
Where did I do wrong? How can I modify the code ?

When you initialize num, you can append the "UL" for unsigned long and ULL for unsigned long long.
For example:
unsigned long a = 18446744073709551615UL;
unsigned long long b = 18446744073709551615ULL;
Also, use %zu instead of %d because sizeof returns a size_t.
According to cppreference:
integer-suffix, if provided, may contain one or both of the following (if both are provided, they may appear in any order):
unsigned-suffix (the character u or the character U)
long-suffix (the character l or the character L) or the long-long-suffix (the character sequence ll or the character sequence LL) (since C99)
C standard 5.2.4.2.1 Sizes of integer types <limits.h> :
1 The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for
CHAR_BIT and MB_LEN_MAX, the following shall be replaced by
expressions that have the same type as would an expression that is an
object of the corresponding type converted according to the integer
promotions. Their implementation-defined values shall be equal or
greater in magnitude (absolute value) to those shown, with the same
sign.

You find some useful definitions in <limits.h>.

Initialize unsigned variables with -1. The conversion to the unsigned type yields the maximum value of that type.
#include <stdio.h>
int main(void)
{
printf("long takes up %zu bytes:\n", sizeof(long));
printf("long long takes up %zu bytes:\n", sizeof(long long));
unsigned long a = -1;
printf("a = %lu\n", a);
unsigned long long b = -1;
printf("b = %llu\n", b);
return 0;
}
Update: Changed the code based on comments :)

How can I confirm the range of unsigned long integer in C?
Best, just use the macros from <limits.h>. That documents the code's intent better.
unsigned long long b_max = ULLONG_MAX;
Alternatively, assign -1 to the unsigned type. As -1 is not in the range of an unsigned type, it will get converted to the target type by adding the MAX value of that type plus 1. This works even on rare machines that have padding.
... if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. C11dr §6.3.1.3 2
The min value is of course 0 for an unsigned type.
unsigned long long b_min = 0;
unsigned long long b_max = -1;
printf("unsigned long long range [%llu %llu]\n", b_min, b_max);
Note that picky compilers will complain about assigning an out-of-range value with b_max = -1;. Use ULLONG_MAX.
Where did I do wrong?
The warning "warning: integer constant is so large that it is unsigned" appears because 18446744073709551615 is a decimal integer constant outside the long long range on your platform. Unadorned decimal constants are limited to that. Append a U or u. Then the compiler will consider unsigned long long.
unsigned long long b = 18446744073709551615u;
Further, there is no C spec that says 18446744073709551615 is the max value of unsigned long long. It must be at least that. It could be larger. So assigning b = 18446744073709551615u may not assign the max value.
How can I modify the code ?
Shown above

As rsp stated you can specify the type of the literal with UL and ULL.
But this won't lead to a conclusive result in your code for the arithmetics.
The value you print will always be 0 because
2^64 % 2^64 = 0 // 64 bits = 8 bytes
2^64 % 2^32 = 0 // 32 bits = 4 bytes
2^64 % 2^16 = 0 // 16 bits = 2 bytes
As you can see, the variable size always doubles, so if you use a number that wraps for 8 bytes, it just wraps multiple times on the smaller sizes and yields the same result.
The sizeof will show you the right values.
But generally you want to check for these things in code and not on output so you could use limits.h as suggested by Arndt Jonasson.
or you can use static_assert to check at compile time.

Converting non-Ascii characters to int in C, the extra bits are supplemented by 1 rather than 0

When coding in C, I accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1 rather than 0. (For ASCII characters, the extra bits are filled with 0.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing right shift operations >> on a non-Ascii character, the extra bits on the left end are also supplemented by 1 rather than 0, even though the char variable is explicitly converted to unsigned int (for as for signed int, the extra bits are supplemented by 1 in my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused with these results... Are they concerned with the data structures of char, int and unsigned int, or related to my operating system (ubuntu 14.04), or related to the ANSI C requirements? I have tried to compile this program with both gcc(4.8.4) and clang(3.4), but there is no difference.
Thank you so much!

It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in range -127..127/128, use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for the assignment
unsigned int u_c; u_c = (uint8_t)c[0];,
Since -0x3c or -60 is not in the range of uint16_t, then the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; iow, we add or subtract UINT16_MAX + 1 (notice that the integer promotions could trick here so you might need casts if in C code) until the value is in the range. UINT16_MAX is naturally always 0xFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is 0xFFC4 that you saw. And then the uint16_t value is zero-extended to the uint32_t value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.

Left shifting operation on int

On IndiaBix.com I came across the following question. At my experience level (beginner in C), the output of the code below should be 0 (10000000 << 1 is 00000000), but it came out to be 256. After digging deeper I found that we are printing using %d, which supports 4 bytes, so the output is 256 instead of 0.
#include<stdio.h>
int main()
{
unsigned char i = 128;
printf("%d \n", i << 1);
return 0;
}
Now Consider the following Example
#include<stdio.h>
int main()
{
unsigned int i = 2147483648; // bit 31 = 1 and b0 to b30 are 0
printf("%d \n", i<<1);
return 0;
}
When I left shift the above I get 0 as the output. As %d supports the value of an int, the output should be 0, but when I changed %d to %ld the output is still 0. As %ld supports values up to long int, the output should not be 0. Why am I getting 0 as the output?

In the first case, i is promoted to int, which can store at least 32767, then shift is calculated. As a result, the result became 256.
In the second case, if your unsigned int is 32-bit long, it is calculated in unsigned int and the result wraps. As a result, the result became 0.
You have to cast to larger type to get what you want.
#include<stdio.h>
#include<inttypes.h>
int main()
{
unsigned int i = 2147483648;//(bit 31 = 1 and b0 to b30 are 0)
printf("%"PRIu64" \n", (uint64_t)i<<1);
return 0;
}

It's not the %d or %ld that matters here.
It's probably because the size of unsigned int on your machine is 4 bytes.
Also, you can't use %ld with unsigned int. It is undefined behavior.

The first problem is that 2147483648 (80000000h) will fit inside your 32 bit unsigned int, but it will not fit inside the signed int that printf expects when you use the %d specifier. Instead, use the %u specifier.
When that is fixed, note that 0x80000000 << 1 is supposed to be 0 if unsigned int is 32 bits.
Changing the format specifier of printf to %ld does not change the type of the expression! You need to change both the format specifier and the expression, if you want a larger type.
You are getting tricked by the behavior of the first char print. The reason why %d works there is because printf (like any variadic function) internally promotes all arguments to be at least the size of int. So the expression got implicitly promoted to int and as it happens, that matches with %d.
Though in the case of my_char << n, the << operator already applies the integer promotions to both operands, and the result of a shift always has the type of the possibly promoted left operand.
As you can tell, the various implicit type promotion rules in C aren't trivial and can therefore easily create bugs. It is one of many well-known flaws of the language.

According to the C Standard (6.5.7 Bitwise shift operators)
3 The integer promotions are performed on each of the operands.
The type of the result is that of the promoted left operand....
and (6.3.1.1 Boolean, characters, and integers)
...If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to
an int; otherwise, it is converted to an unsigned int. These are
called the integer promotions. All other types are unchanged by
the integer promotions.
This means that an operand of type unsigned char is promoted to type int because type int can represent all values of type unsigned char.
So in expression
i << 1
operand i is promoted to type int and you will get (let's assume that type int has 32 bits)
0x00000080 << 1
After the operation you will get result
0x00000100
that corresponds to decimal
256
and this statement
printf("%d \n", i << 1);
outputs this result.
Now consider this code snippet
unsigned int i = 2147483648; // bit 31 = 1 and b0 to b30 are 0
printf("%d \n", i<<1);
Here i can be represented like
0x80000000
The integer promotions are applied to types that have the conversion rank less than the rank of type int.
So the integer promotions will not be applied to variable i because it does not have a rank less than the rank of int.
After the operations you will get
0x00000000
that is you will get decimal
0
and this statement
printf("%d \n", i<<1);
will correctly output this zero because its representation is the same for unsigned and signed integer objects.
Even if you write for example
printf( "%lld \n", ( long long int )( i<<1 ));
you will get the same result because the type of expression i << 1 in any case is unsigned int
However if you will write
printf( "%lld \n", ( long long int )i << 1 );
then operand i will be converted to type long long int and you will get
0x0100000000

Changing %d to %ld will not be effective.
Change the data type to a wider unsigned type instead; that will raise your integer limit.

Whats wrong with this C code?

My sourcecode:
#include <stdio.h>
int main()
{
char myArray[150];
int n = sizeof(myArray);
for(int i = 0; i < n; i++)
{
myArray[i] = i + 1;
printf("%d\n", myArray[i]);
}
return 0;
}
I'm using Ubuntu 14 and gcc to compile it, what it prints out is:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?

The int value of a char typically ranges from 0 to 255 or from -128 to 127, depending on whether the implementation makes char unsigned or signed.
Therefore once the value reaches 127 in your case, it overflows and you get negative value as output.

The signedness of a plain char is implementation defined.
In your case, char behaves as signed char, which can hold values in the range -128 to +127.
As you're incrementing the value of i beyond the limit signed char can hold and trying to assign the same to myArray[i] you're facing an implementation-defined behaviour.
To quote C11, chapter §6.3.1.3,
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Because a char is a SIGNED BYTE. That means its value range is -128 -> 127.
EDIT Due to all the below comment suggesting this is wrong / not the issue / signdness / what not...
Running this code:
char a, b;
unsigned char c, d;
int si, ui, t;
t = 200;
a = b = t;
c = d = t;
si = a + b;
ui = c + d;
printf("Signed:%d | Unsigned:%d", si, ui);
Prints: Signed:-112 | Unsigned:400
Try yourself
The reason is the same. a & b are signed chars (signed variables of size byte - 8bits). c & d are unsigned. Assigning 200 to the signed variables overflows and they get the value -56. In memory, a, b, c & d all hold the same value, but when used, their signedness dictates how the value is interpreted, and in this case it makes a big difference.
Note about standard
It has been noted (in the comments to this answer, as well as other answers) that the standard doesn't mandate that char is signed. That is true. However, in the case presented by OP, as well the code above, char IS signed.

It seems that your compiler by default considers type char like type signed char. In this case CHAR_MIN is equal to SCHAR_MIN and in turn equal to -128 while CHAR_MAX is equal to SCHAR_MAX and in turn equal to 127 (See header <limits.h>)
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are
collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char
For signed types one bit is used as the sign bit. So for the type signed char the maximum value corresponds to the following representation in the hexadecimal notation
0x7F
and equal to 127. The most significant bit is the signed bit and is equal to 0.
For negative values the signed bit is set to 1 and for example -128 is represented like
0x80
When the value stored in the char in your program reaches its positive maximum 0x7F and is increased, it becomes 0x80, which is -128 in decimal notation.
You should explicitly use type unsigned char instead of char if you want the result of the program not to depend on the compiler settings.
Or in the printf statement you could explicitly cast type char to type unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);

Range of unsigned char in C language

As per my knowledge range of unsigned char in C is 0-255. but when I executed the below code its printing the 256 as output. How this is possible? I have got this code from "test your C skill" book which say char size is one byte.
main()
{
unsigned char i = 0x80;
printf("\n %d",i << 1);
}

Because the operands to <<* undergo integer promotion. It's effectively equivalent to (int)i << 1.
* This is true for most operators in C.

Several things are happening.
First, the expression i << 1 has type int, not char; the integer promotions convert i to int before the shift, and 0x100 is well within the range of a signed integer.
Secondly, the %d conversion specifier expects its corresponding argument to have type int. So the argument is being interpreted as an integer.
If you want to print the numeric value of a signed char, use the conversion specifier %hhd. If you want to print the numeric value of an unsigned char, use %hhu.

For arithmetical operations, char is promoted to int before the operation is performed. See the standard for details. Simplified: the "smaller" type is first brought to the "larger" type before the operation is performed. For the shift operators, the resulting type is that of the left side operand, while for e.g. + and other "combining" operators it is the larger of both, but at least int. The latter means that char and short (and their unsigned counterparts) are always promoted to int, with the result being int, too. (Simplified; for details please read the standard.)
Note also that %d takes an int argument, not a char.
Additional notes:
unsigned char has not necessarily the range 0..255. Check limits.h, you will find UCHAR_MAX there.
char and "byte" are synonymously used in the standard, but neither are necessarily 8 bits wide (just very likely for modern general purpose CPUs).

As others have already explained, the statement "printf("\n %d",i << 1);" does integer promotion. So left shifting the integer value 128 by one results in 256. You could try the following code to print the maximum value of "unsigned char". The maximum value of "unsigned char" has all bits set. So a bitwise NOT operation using "~" on 0 should give you the maximum value, which is 255 for an 8-bit char.
int main()
{
unsigned char ch = ~0;
printf("ch = %d\n", ch);
return 0;
}
Output:-
M-40UT:Desktop$ ./a.out
ch = 255