Is char signed or unsigned by default? - c

In the book "Complete Reference of C" it is mentioned that char is unsigned by default.
But when I try to verify this with GCC as well as Visual Studio, both treat it as signed by default.
Which one is correct?

The book is wrong. The standard does not specify if plain char is signed or unsigned.
In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out whether plain char is signed or unsigned (CHAR_MIN will be less than 0 if it is signed, and equal to 0 if it is unsigned), but even then, the three types are distinct as far as the standard is concerned.
Do note that char is special in this way. If you declare a variable as int it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
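For example, here is a minimal sketch (not from the question, just an illustration using the standard CHAR_MIN macro) that reports what the implementation chose:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 if plain char is unsigned, and SCHAR_MIN (negative) if it is signed. */
    if (CHAR_MIN < 0)
        printf("plain char is signed here\n");
    else
        printf("plain char is unsigned here\n");
    return 0;
}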

As Alok points out, the standard leaves that up to the implementation.
For gcc, the default is signed, but you can change that with -funsigned-char. Note: for gcc in the Android NDK, the default is unsigned. You can also explicitly ask for signed characters with -fsigned-char.
On MSVC, the default is signed, but you can change that with /J.
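If you want code to react to those settings at compile time, a small sketch like the one below can work. It assumes GCC's predefined __CHAR_UNSIGNED__ macro and MSVC's _CHAR_UNSIGNED macro (set when /J is given); both are compiler extensions, not standard C:
#include <stdio.h>

int main(void)
{
/* GCC predefines __CHAR_UNSIGNED__ when plain char is unsigned (e.g. with
   -funsigned-char); MSVC defines _CHAR_UNSIGNED when /J is used. */
#if defined(__CHAR_UNSIGNED__) || defined(_CHAR_UNSIGNED)
    printf("plain char is unsigned with the current compiler options\n");
#else
    printf("plain char is signed with the current compiler options\n");
#endif
    return 0;
}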

C99 N1256 draft 6.2.5/15 "Types" has this to say about the signedness of type char:
The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
and in a footnote:
CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

According to The C Programming Language by Kernighan and Ritchie, the de facto standard book for ANSI C, whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive.

According to the C standard, the signedness of plain char is "implementation-defined".
In general, implementors chose whichever was more efficient to implement on their architecture. On x86 systems, char is generally signed. On ARM systems, it is generally unsigned (Apple iOS is an exception).

Now, we know the standard leaves that up to the implementation.
But how can you check whether a given type, such as char, is signed or unsigned?
I wrote a macro to do this:
#define IS_UNSIGNED(t) ((t)~1 > 0)
and tested it with gcc, clang, and cl, but I am not sure it is always safe for other cases.
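For what it's worth, a small test program built around that macro might look like this (the main function and the printed messages are just an illustration, not part of the original question):
#include <stdio.h>

/* ~1 is a negative int, so casting it to the type and comparing with 0
   reveals the signedness: for an unsigned type the cast wraps to a large
   positive value, for a signed type the value stays negative. */
#define IS_UNSIGNED(t) ((t)~1 > 0)

int main(void)
{
    printf("char is %s\n", IS_UNSIGNED(char) ? "unsigned" : "signed");
    printf("signed char is %s\n", IS_UNSIGNED(signed char) ? "unsigned" : "signed");
    printf("unsigned char is %s\n", IS_UNSIGNED(unsigned char) ? "unsigned" : "signed");
    return 0;
}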

Related

Printing char using integer format specifier [duplicate]

In the book "Complete Reference of C" it is mentioned that char is by default unsigned.
But I am trying to verify this with GCC as well as Visual Studio. It is taking it as signed by default.
Which one is correct?
The book is wrong. The standard does not specify if plain char is signed or unsigned.
In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out if plain char is signed or unsigned (if CHAR_MIN is less than 0 or equal to 0), but even then, the three types are distinct as far as the standard is concerned.
Do note that char is special in this way. If you declare a variable as int it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
As Alok points out, the standard leaves that up to the implementation.
For gcc, the default is signed, but you can modify that with -funsigned-char. note: for gcc in Android NDK, the default is unsigned. You can also explicitly ask for signed characters with -fsigned-char.
On MSVC, the default is signed but you can modify that with /J.
C99 N1256 draft 6.2.5/15 "Types" has this to say about the signed-ness of type char:
The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
and in a footnote:
CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
According to The C Programming Language book by Dennis Ritchie which is the de-facto standard book for ANSI C, plain chars either signed or unsigned are machine dependent, but printable characters are always positive.
According to the C standard the signedness of plain char is "implementation defined".
In general implementors chose whichever was more efficient to implement on their architecture. On x86 systems char is generally signed. On arm systems it is generally unsigned (Apple iOS is an exception).
Now, we known the standard leaves that up to the implementation.
But how to check a type is signed or unsigned, such as char?
I wrote a macro to do this:
#define IS_UNSIGNED(t) ((t)~1 > 0)
and test it with gcc, clang, and cl. But I do not sure it's always safe for other cases.

char data type seems to be unsigned on raspberry pi 4 [duplicate]

In the book "Complete Reference of C" it is mentioned that char is by default unsigned.
But I am trying to verify this with GCC as well as Visual Studio. It is taking it as signed by default.
Which one is correct?
The book is wrong. The standard does not specify if plain char is signed or unsigned.
In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out if plain char is signed or unsigned (if CHAR_MIN is less than 0 or equal to 0), but even then, the three types are distinct as far as the standard is concerned.
Do note that char is special in this way. If you declare a variable as int it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
As Alok points out, the standard leaves that up to the implementation.
For gcc, the default is signed, but you can modify that with -funsigned-char. note: for gcc in Android NDK, the default is unsigned. You can also explicitly ask for signed characters with -fsigned-char.
On MSVC, the default is signed but you can modify that with /J.
C99 N1256 draft 6.2.5/15 "Types" has this to say about the signed-ness of type char:
The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
and in a footnote:
CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
According to The C Programming Language book by Dennis Ritchie which is the de-facto standard book for ANSI C, plain chars either signed or unsigned are machine dependent, but printable characters are always positive.
According to the C standard the signedness of plain char is "implementation defined".
In general implementors chose whichever was more efficient to implement on their architecture. On x86 systems char is generally signed. On arm systems it is generally unsigned (Apple iOS is an exception).
Now, we known the standard leaves that up to the implementation.
But how to check a type is signed or unsigned, such as char?
I wrote a macro to do this:
#define IS_UNSIGNED(t) ((t)~1 > 0)
and test it with gcc, clang, and cl. But I do not sure it's always safe for other cases.

Assignment of a character to an unsigned int gives 3 all 1s bytes appended [duplicate]

In the book "Complete Reference of C" it is mentioned that char is by default unsigned.
But I am trying to verify this with GCC as well as Visual Studio. It is taking it as signed by default.
Which one is correct?
The book is wrong. The standard does not specify if plain char is signed or unsigned.
In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out if plain char is signed or unsigned (if CHAR_MIN is less than 0 or equal to 0), but even then, the three types are distinct as far as the standard is concerned.
Do note that char is special in this way. If you declare a variable as int it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
As Alok points out, the standard leaves that up to the implementation.
For gcc, the default is signed, but you can modify that with -funsigned-char. note: for gcc in Android NDK, the default is unsigned. You can also explicitly ask for signed characters with -fsigned-char.
On MSVC, the default is signed but you can modify that with /J.
C99 N1256 draft 6.2.5/15 "Types" has this to say about the signed-ness of type char:
The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
and in a footnote:
CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
According to The C Programming Language book by Dennis Ritchie which is the de-facto standard book for ANSI C, plain chars either signed or unsigned are machine dependent, but printable characters are always positive.
According to the C standard the signedness of plain char is "implementation defined".
In general implementors chose whichever was more efficient to implement on their architecture. On x86 systems char is generally signed. On arm systems it is generally unsigned (Apple iOS is an exception).
Now, we known the standard leaves that up to the implementation.
But how to check a type is signed or unsigned, such as char?
I wrote a macro to do this:
#define IS_UNSIGNED(t) ((t)~1 > 0)
and test it with gcc, clang, and cl. But I do not sure it's always safe for other cases.

Is char possibly faster than unsigned char? [duplicate]

#include <stdbool.h>

int main(void)
{
    char c = 0xff;
    bool b = 0xff == c;
    // Under most C/C++ compilers' default options, b is FALSE!!!
    return 0;
}
Neither the C nor the C++ standard specifies whether char is signed or unsigned; it is implementation-defined.
Why does the C/C++ standard not explicitly define char as signed or unsigned, to avoid dangerous misuses like the code above?
Historical reasons, mostly.
Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.
On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, char pretty much has to be unsigned.
The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:
Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.
Going back even further, an early version of the C Reference Manual from 1975 says:
A char object may be used anywhere an int may be. In all cases the char is converted to an int by propagating its sign through the upper 8 bits of the resultant integer. This is consistent with the two's complement representation used for both characters and integers. (However, the sign-propagation feature disappears in other implementations.)
This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keyword.)
C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page.
As for what's happening in your code (applying modern C rules):
char c = 0xff;
bool b = 0xff == c;
If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char -- but since 0xff exceeds CHAR_MAX (assuming CHAR_BIT==8), the result is implementation-defined. In most implementations, the result is -1. In the comparison 0xff == c, both operands are converted to int, making it equivalent to 0xff == -1, or 255 == -1, which is of course false.
Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))
Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code, and to a lesser extent existing implementations.
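As a small, hedged sketch (not from the answer above) of how to make that comparison behave the same regardless of the signedness of plain char:
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    char c = 0xff;                      /* implementation-defined result if plain char is signed */
    bool b1 = (unsigned char)c == 0xff; /* compare the raw byte value */
    bool b2 = c == (char)0xff;          /* or apply the same conversion to the constant */
    printf("%d %d\n", b1, b2);          /* prints "1 1" on typical implementations, signed or unsigned */
    return 0;
}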
char was originally meant to store characters, so whether it is signed or unsigned is not important. What really matters is how to perform math on char efficiently, so depending on the system, the compiler will choose what is most appropriate.
Prior to ARMv4, ARM had no native support for loading halfwords and signed bytes. To load a signed byte you had to LDRB and then sign-extend the value (LSL it up, then ASR it back down). This is painful, so char is unsigned by default.
Why are unsigned types more efficient in ARM CPUs?
In fact, a lot of ARM compilers still use unsigned char by default, because even if you can load a byte with sign extension on modern ARM ISAs, that instruction is still less flexible than the zero-extension version.
Is char signed or unsigned by default on iOS?
char is unsigned by default in the Android NDK
And most modern compilers also allow you to change char's signedness instead of relying on the default setting.

Declaring fixed-size integer typedef in Standard C

Is there a reliable way to declare typedefs for integer types of fixed 8-, 16-, 32-, and 64-bit length in ISO Standard C?
When I say ISO Standard C, I mean that strictly:
ISO C89/C90, not C99.
No headers not defined in the ISO standard.
No preprocessor symbols not defined in the ISO standard.
No type-size assumptions not specified in the ISO standard.
No proprietary vendor symbols.
I see other questions similar to this in StackOverflow, but no answers yet that do not violate one of the above constraints. I'm not sure it's possible without resorting to platform symbols.
Yes, you can.
The header file limits.h is part of C90. Then I would test the values of USHRT_MAX, UINT_MAX, and ULONG_MAX through preprocessor directives and set the typedefs accordingly.
Example:
#include <limits.h>
#if USHRT_MAX == 0xFFFFFFFF
typedef unsigned short uint32_t;
#elif UINT_MAX == 0xFFFFFFFF
typedef unsigned int uint32_t;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long uint32_t;
#else
#error "Cannot find a 32-bit unsigned integer type."
#endif
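If you want a compile-time sanity check to go with the typedef, one C90-compatible trick (C90 has no static_assert; the typedef name below is only an illustration) is the negative-array-size idiom:
/* Fails to compile (negative array size) if uint32_t cannot hold 0xFFFFFFFF. */
typedef char uint32_t_check[((uint32_t)-1 >= 0xFFFFFFFF) ? 1 : -1];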
Strictly speaking, ISO 9899:1999 superseded ISO 9899:1990, so it is the only current ISO standard C language specification.
As exact width typedef names for integer types were only introduced into the standard in the 1999 version, what you want is not possible using only the 1990 version of the standard.
There is none. However, there is a reliable way to declare individual integer variables up to 32 bits in size, if you're willing to live with some restrictions: use long bit-fields (long is guaranteed to be at least 32 bits wide, and you're allowed to use as many bits in a bit-field as would fit in the variable if the bit-field declarator were omitted). So:
struct {
    unsigned long foo : 32;
} bar;
Obviously, you get all the limitations that come with that, such as the inability to take pointers to such variables. The only thing this really buys you is guaranteed wraparound at the specified boundary on overflow/underflow, and even then only for unsigned types, since signed overflow is undefined.
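A small sketch of that wraparound guarantee (the printf and the wrap test are mine; note that bit-field types other than int and unsigned int are strictly an implementation-defined extension, though most compilers accept them):
#include <stdio.h>

struct {
    unsigned long foo : 32;   /* 32-bit unsigned field, as in the answer above */
} bar;

int main(void)
{
    bar.foo = 0xFFFFFFFF;     /* maximum value of a 32-bit unsigned field */
    bar.foo += 1;             /* unsigned bit-fields wrap around modulo 2^32 */
    printf("%lu\n", (unsigned long)bar.foo);   /* prints 0 */
    return 0;
}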
Aside from that, there's no portable way to do this in pure C90. Among other things, a conformant C90 implementation need not even have an 8-bit integer; for example, it would be entirely legal to have a platform in which sizeof(char) == sizeof(short) == sizeof(int) == 1 and CHAR_BIT == 16 (i.e. it has a 16-bit machine word and cannot address individual bytes). I've heard that such platforms do in fact exist in practice, in the form of some DSPs.
A danger with such approaches when using modern compilers is that it has become fashionable for compilers to assume that a pointer of one integer type will not be used to access values of another, even when both types have the same size and representation. If two types have the same size and representation, and two parts of the same program each choose one of them, applying link-time optimizations to programs that share pointers to such data could result in improper behavior. For some implementations on many systems, there will be at least one size of integer for which it will be impossible to declare a pointer which can be safely used to access all integer values of that size; e.g. on systems where both long and long long are 64 bits, there will be no way to declare a pointer which can be used reliably to access data of either type interchangeably.
No, you can't do that.
Now, if you want to count a multi-stage configuration process like Gnu configure as a solution, you can do that and stick to C89. And there are certainly various types you can use that are in C89, and that will DTRT on almost every implementation that's around today, so you get the sizes you want and stick with pure conforming C89. But the bit widths, while what you want, will not in general be specified by the standard.
