Difference between signed / unsigned char [duplicate] - c

So I know that the difference between a signed int and an unsigned int is that one bit is used to signify whether the number is positive or negative, but how does this apply to a char? How can a character be positive or negative?

There's no dedicated "character type" in C language. char is an integer type, same (in that regard) as int, short and other integer types. char just happens to be the smallest integer type. So, just like any other integer type, it can be signed or unsigned.
It is true that (as the name suggests) char is mostly intended to be used to represent characters. But characters in C are represented by their integer "codes", so there's nothing unusual in the fact that an integer type char is used to serve that purpose.
The only general difference between char and other integer types is that plain char is not synonymous with signed char, while with other integer types the signed modifier is optional/implied.

I slightly disagree with the above. unsigned char simply means: use the most significant bit as part of the value instead of treating it as a +/- sign flag when performing arithmetic operations.
It matters if you use char as a number, for instance:
typedef char BYTE1;
typedef unsigned char BYTE2;
BYTE1 a;
BYTE2 b;
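For variable a, if plain char is signed on your implementation (this is implementation-defined), only 7 bits are available for the magnitude, and the range is typically -128 to 127 (older standards only guaranteed -127 to 127, i.e. ±(2^7 - 1)).
For variable b all 8 bits are available and the range is 0 to 255 (2^8 - 1).
If you only use char to hold characters whose codes are in the 0 to 127 (ASCII) range, "unsigned" makes no practical difference, because those values fit either way.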

There are three char types: (plain) char, signed char and unsigned char. A char is usually an 8-bit integer*, and in that sense signed char and unsigned char have a useful meaning (generally equivalent to int8_t and uint8_t respectively). When used as a character in the sense of text, use a char (also referred to as a plain char). Whether plain char is signed or unsigned is up to the implementation: it is typically signed on x86, for example, and unsigned on most ARM Linux targets.
* Technically, sizeof(char) is always 1 by definition, but a char may have more than 8 bits (CHAR_BIT is at least 8); in practice it is almost always exactly 8.

The representation is the same; the meaning is different. For example, the bit pattern 0xFF is stored as "FF" in both cases. Treated as a signed char (on a two's-complement machine) it is the negative number -1, but as an unsigned char it is 255. This makes a big difference for right shifts: shifting 255 right by 1 bit gives 127, whereas right-shifting -1 is implementation-defined and, on most compilers (which use an arithmetic shift that copies the sign bit), leaves it at -1.

A signed char is a signed value which is typically smaller than, and is guaranteed not to be bigger than, a short. An unsigned char is an unsigned value which is typically smaller than, and is guaranteed not to be bigger than, a short. A type char without a signed or unsigned qualifier may behave as either a signed or unsigned char; the choice is implementation-defined, but there are a couple of cases where the standard constrains it:
If, in the target platform's character set, any of the characters required by standard C would map to a code higher than the maximum `signed char`, then `char` must be unsigned.
If `char` and `short` are the same size, then `char` must be signed.
Part of the reason there are two dialects of "C" (those where char is signed, and those where it is unsigned) is that there are some implementations where char must be unsigned, and others where it must be signed.

The same way: e.g. if you have an 8-bit char, 7 bits can be used for magnitude and 1 for sign. So an unsigned char might range from 0 to 255, whilst a signed char might range from -128 to 127 (for example).

This is because a char is, for all practical purposes, stored as an 8-bit number. Speaking about a negative or positive char doesn't make sense if you consider it an ASCII code (ASCII codes are 0 to 127, which fit in a signed char*), but it makes sense if you use that char to store a number, which could be in the range 0..255 or -128..127 according to the two's-complement representation.
*: plain char can also be unsigned; it depends on the implementation. In that case values 128 to 255 are also non-negative, giving you access to the extended (8-bit) character set provided by the encoding in use.

The same way an int can be positive or negative; there is no difference. In fact, on many platforms unqualified char is signed.

Related

C and ASCII codes: how can 128 represent a char if its range is from -128 to 127?

I'm asking this because I can't understand how ASCII characters from 0 to 255 can be represented with a signed char if its range is from -128 to 127.
Since sizeof(char) is 1 byte, it is also reasonable to think that it can easily represent values up to a maximum of 255.
So why is there nothing wrong with the assignment char a = 128;, and why shouldn't I use unsigned char for it?
Thank you in advance!
char c = 128; by itself is correct C. The standard says that a char contains CHAR_BIT bits, which can be greater than 8. Also, whether char is signed or unsigned is implementation-defined, and an unsigned char has to cover at least the range [0, 255].
So on an implementation where char is wider than 8 bits, or where char is unsigned by default, this line is valid and stores 128 unchanged.
Even on a common 8-bit signed char implementation, this is not undefined behavior: converting the out-of-range 128 to char gives an implementation-defined result (GCC and Clang wrap it to -128), so there is no problem.
In practice, compilers will often warn about this; Clang, for example:
warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion].
Signed or unsigned, it takes 8 bits (on typical platforms). 8 bits can hold 256 distinct values; the only question is how we interpret them.

Output of C code in which character is assigned an octal number

#include <stdio.h>

int main(void)
{
    char a = 01212;
    printf("%d\n", a);
    return 0;
}
On compiling I get a warning, and the output is -118. How? I know any integer constant starting with 0 in C is octal. The decimal equivalent of octal 01212 is 650, so why is the output -118?
On most systems the value in char a = 01212; is out of range for char, and the result is implementation-defined. A system with an 8-bit, two's-complement signed char will print -118.
For details, please read the explanation below.
Unlike int, plain char is not necessarily signed; there are three distinct char types in C: char, signed char and unsigned char.
A char has a range from CHAR_MIN to CHAR_MAX. For a particular compiler, char will use either an underlying signed or unsigned representation. You can check these values in <limits.h> on your system.
Here is the text from the C99 standard, 6.2.5 Types, paragraph 15:
The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.35)
And footnote 35 says:
35) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
Having said this, the value in char a = 01212; (650) needs more than 8 bits. The C standard allows char to be wider than 8 bits, but I think almost all computers today use an 8-bit char.
So if char is implemented as unsigned char and the value is more than CHAR_MAX, the value is converted modulo CHAR_MAX + 1.
On an 8-bit system, the converted value is 650 modulo 256, which is 650 - 512 = 138.
If char is implemented as signed char, the conversion is implementation-defined. If it's an 8-bit char system that implements two's complement, 138 is above 127 and wraps to 138 - 256 = -118, as you have seen in your result. Note that on such a system the range of char is -128 to +127.
The value 650 is most likely out of range for your char type. In C, the result of an out-of-range conversion to a signed integer type is implementation-defined. That is, you will certainly not get 650 in your char, and what exactly you get depends on your compiler. Consult your compiler documentation to figure out why you got -118.
A char occupies one byte, usually 8 bits, so the maximum number an unsigned char can hold is 2^8 - 1, which is 255, and a signed char has a maximum of 127. When a signed char is assigned a number greater than that, the result is implementation-defined (not undefined), and a negative number may appear.

What does (int)(unsigned char)(x) do in C?

In ctype.h, line 20, __ismask is defined as:
#define __ismask(x) (_ctype[(int)(unsigned char)(x)])
What does (int)(unsigned char)(x) do? I guess it casts x to unsigned char (to retrieve the first byte only regardless of x), but then why is it cast to an int at the end?
(unsigned char)(x) effectively computes an unsigned char with the value of x % (UCHAR_MAX + 1). This has the effect of giving a positive value (between 0 and UCHAR_MAX). With most implementations UCHAR_MAX has a value of 255 (although the standard permits an unsigned char to support a larger range, such implementations are uncommon).
Since the result of (unsigned char)(x) fits in an int on all common implementations (where int is wider than char), the conversion to int will not change the value.
The net effect is the least significant byte, as a non-negative value.
Some compilers give a warning when using a char (signed or not) type as an array index. The conversion to int shuts the compiler up.
The cast to unsigned char makes sure the value is within the range 0..255; the resulting value is then used as an index into the _ctype array, which has 256 entries (see ctype.h in Linux).
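A cast to unsigned char safely extracts the least significant CHAR_BIT bits of x, due to the wraparound (modulo) properties of unsigned types. (A cast to char could instead give an implementation-defined result if char is signed on that platform and x is out of range. Note that overflow in signed arithmetic is undefined behaviour in C, but an out-of-range conversion like this is not.) CHAR_BIT is usually 8.
The cast to int then converts the unsigned char. The standard does not strictly guarantee that an int can hold every value an unsigned char can take (it can fail on unusual implementations where char and int have the same width), but on all mainstream platforms it can.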
A better alternative, if you want to extract the 8 least significant bits, would be to apply & 0xFF and cast the result to an unsigned type.
Whether plain char is signed or unsigned is implementation-defined, so you need to be explicit and write unsigned char in order not to cast to a negative number. Then cast to int.

Unsigned char. C

So, where can unsigned char be useful?
If I understood right, unsigned char can represent numbers from -128 to 127. But every encoding table uses positive numbers. So, unsigned char can't be used for representing characters. Am I right?
No, unsigned char is 0 to 255.
It can be useful in representing binary data (a single byte), although, like any primitive data type, the possibilities are endless.
First of all, what you are describing is signed char; unsigned char ranges from 0 to 255.
To answer your question about negative-valued characters: you are right that character encodings use non-negative values.
From a different point of view, just think of signed and unsigned char as small integer types.
unsigned char is used to represent bytes. If you need just one byte of memory in a variable, you use unsigned char and assign an integer to it.
For example, uint8_t is used to represent bytes, and on typical platforms it is nothing more than a typedef for unsigned char.
A signed char can represent numbers from -128 to +127, and unsigned char from 0 to 255.
Although unsigned is more convenient in many use cases, everything binary-related can be done with signed too:
0=0, 1=1 ... 127=127, -128=128, -127=129, -126=130 ... -1=255
Such conversions happen automatically (or, better to say, it's just a different interpretation of the same bits).
("Binary-related" means bit-level work; a mathematical -2 * 2 would be possible with unsigned too, but would make even less sense.)
Regarding "So, where can unsigned char be useful?": perhaps here (a very simple example that tests for an ASCII digit):
#include <stdbool.h>

bool isDigit(unsigned char c)
{
    if ((c >= '0') && (c <= '9')) return true;
    return false;
}
By virtue of its argument type, unsigned char guarantees the input is a single byte value from 0 to UCHAR_MAX (there are 128 encoded ASCII possibilities; with an 8-bit extended ASCII set there are 256). So, in this function, all that remains is to test the input value against specific criteria (in this case, whether it is a digit); there is no need for the function to test for negative numbers. A plain char, if it is signed, cannot hold the upper half (128 to 255) of an 8-bit extended character set as positive values. The size of unsigned char is also significant in that it is only 1 byte, as opposed to 4 bytes (typically, but not always) for, say, an int.

What is the need for signed and unsigned characters in C

Is there some special reason for having a signed and unsigned char in C? Or was it simply added for completeness so that the compiler does not have to check the data type before adding a signed/unsigned modifier?
I am not asking about signed and unsigned variables. My doubt is about the special cases where an unsigned character variable will not be sufficient such that you have to depend on a signed character variable.
A char can be either signed or unsigned depending on what is most efficient for the underlying hardware. The keywords signed and unsigned allow you to explicitly specify that you want something else.
A quote from the C99 rationale:
Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned depending upon the implementation, as in prior practice. The type signed char was introduced in C89 to make available a one-byte signed integer type on those systems which implemented plain char as unsigned char. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integer types.
Information #1: char in C is just a small int, which typically uses 8 bits.
Information #2: The difference between signed and unsigned is that one bit in the representation is used as the sign bit for a signed variable.
Information #3: As a result of (#2), signed variables hold different ranges (-128 to 127 in the 8-bit char case) compared to unsigned ones (0 to 255 in the char case).
Q-A #1: why do we need unsigned?
In most cases (for instance representing a pointer) we do not need signed variables. By convention all locations in the memory are exposed to the program as a contiguous array of unsigned addresses.
Q-A #2: why do we need signed?
Generally, to do signed arithmetic.
I assume you are using a char to hold numbers, not characters.
So:
signed char gives you at least the -128 to 127 range.
unsigned char gives you at least the 0 to 255 range.
A char is required by the standard to be AT LEAST 8 bits, which is why I say "at least". It is possible for these ranges to be larger.
Anyway, to answer your question, making a char unsigned frees the first (most significant) bit from being the 'sign' bit, allowing you to hold a maximum value nearly double that of a signed char.
The thing you have to understand is that the datatype "char" is actually just an integer, typically 8 bits wide. You can use it like any other integer datatype, as long as you respect the reduced value limits. There is no reason to limit "char" to characters.
On a 32/64-bit processor, there is typically no need to use such small integer fields, but on an 8-bit processor such as the 8051, 8-bit integers are not only much faster to process but also use less of the (limited) memory.
