What is the need for signed and unsigned characters in C

Is there some special reason for having both signed and unsigned char in C? Or were they simply added for completeness, so that the compiler does not have to check the data type before applying a signed/unsigned modifier?
I am not asking about signed and unsigned variables in general. My question is about the special cases where an unsigned character variable is not sufficient, so that you have to depend on a signed character variable.

A char can be either signed or unsigned depending on what is most efficient for the underlying hardware. The keywords signed and unsigned allow you to explicitly specify that you want something else.
A quote from the C99 rationale:
Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned depending upon the implementation, as in prior practice. The type signed char was introduced in C89 to make available a one-byte signed integer type on those systems which implemented plain char as unsigned char. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integer types.

Information #1: char in C is just a small int, which usually uses 8 bits (exactly CHAR_BIT bits, which is at least 8).
Information #2: The difference between signed and unsigned is that one bit in the representation is used as the sign bit for a signed variable.
Information #3: As a result of #2, signed and unsigned variables hold different ranges: typically -128 to 127 for signed char, compared to 0 to 255 for unsigned char.
Q-A #1: why do we need unsigned?
In most cases (for instance, when representing a memory address) we do not need signed values. By convention, all locations in memory are exposed to the program as a contiguous array of bytes with unsigned addresses.
Q-A #2: why do we need signed?
Generally, to do signed arithmetic.

I assume you are using a char to hold numbers, not characters.
So:
signed char gives you at least the -127 to 127 range (-128 to 127 on two's-complement machines, which is nearly all of them).
unsigned char gives you at least the 0 to 255 range.
The standard requires a char to be AT LEAST 8 bits, which is why I say "at least": these ranges can be larger.
Anyway, to answer your question: making a char unsigned frees the first (most significant) bit from being the 'sign' bit, so an unsigned char can hold a maximum value nearly double that of a signed char.

The thing you have to understand is that the datatype "char" is actually just an integer, typically 8 bits wide. You can use it like any other integer datatype, as long as you respect its reduced value limits. There is no reason to limit "char" to characters.
On a 32/64-bit processor, there is typically no need to use such small integer fields, but on an 8-bit processor such as the 8051, 8-bit integers are not only much faster to process but also use less of the (limited) memory.

Related

Difference between uint8_t and unsigned char

I'm using mplabX 4.20 and the xc8 compiler. I'm trying to understand the difference between uint8_t and unsigned char. Both of them have a range from 0 to 255.
Both of them can hold characters and numbers. But which is better to use, and in which case?
For example, if I want to create a buffer for holding a string:
uint8_t buffer[20]="Hello World";
unsigned char buffer[20]="Hello World";
In most cases I need to hold characters. What is the best practice for this?
unsigned char is the unsigned integer type corresponding to signed char. Its representation does not use any padding bits. Both of these occupy the same amount of storage as type char, which is at least 8 bits, but may be more. The macro CHAR_BIT tells you how many it comprises in your implementation. Every conforming C implementation provides all of these types.
uint8_t, if available, is an unsigned integer data type exactly 8 bits wide and with no padding bits. On an implementation having CHAR_BIT defined as 8, this is the same type as unsigned char. On such systems you may use the two types interchangeably wherever the declarations provided by stdint.h are in scope. On other systems, uint8_t will not be declared at all.
For example, if I want to create a buffer for holding a string:
If you want to declare a buffer for holding a string then as a matter of style, you should use type char, not either of the other two:
char buffer[20] = "Hello World";
Although either of the other two, or signed char, can also be used for string data (provided in the case of uint8_t that the type is defined at all), type char is the conventional one to use for character data. Witness, for example, that that's the type in terms of which all the string.h functions are declared.
You should use uint8_t where and only where you need an integer type with exactly its properties: unsigned, 8 value bits, no padding bits.
You should use unsigned char where you want the smallest unsigned integer type available, but you don't care whether it is exactly 8 bits wide, or where you want to emphasize that it is the same size as a char -- the smallest discrete unit of storage available.
You should use signed char where you want the smallest signed integer type available but don't care about the exact size or representation.
You should use int8_t where you want a signed integer type with exactly 7 value bits, one sign bit, and no padding bits, expressed in two's complement representation.
You should remain mindful that uint8_t and int8_t are not guaranteed to be available from every C implementation, and that where they are available, their use requires inclusion of stdint.h. Furthermore, this header and these types were not part of C90 at all, so you should not use them if compatibility with legacy C implementations is important to you.
difference between uint8_t and unsigned char
If you're on some exotic system where CHAR_BIT > 8, then uint8_t isn't going to be defined at all.
Otherwise (if CHAR_BIT == 8) there is no difference between unsigned char and uint8_t.
I need to hold characters
Then use plain char.
Functions that operate on strings usually have (const) char * parameters, and you won't be able to pass your unsigned char arrays to them without a cast.

assigning 128 to char variable in c

The output turns out to be 4294967168, the 32-bit two's complement of 128. How?
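#include <stdio.h>

int main(void)
{
    char a;
    a = 128;                /* 128 may not fit in plain char */
    if (a == -128)
    {
        printf("%u\n", a);  /* a is promoted to int here, then printed with %u */
    }
    return 0;
}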
Compiling your code with warnings turned on gives:
warning: overflow in conversion from 'int' to 'char' changes value from '128' to '-128' [-Woverflow]
which tells you that the assignment a=128; doesn't keep the value 128 on your platform; the result of that conversion is implementation-defined.
The standard says:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So we can't know what is going on as it depends on your system.
However, if we do some guessing (and note this is just a guess):
128 as 8 bit would be 0b1000.0000
so when you call printf, where the char is converted to int, there will be a sign extension like:
0b1000.0000 ==> 0b1111.1111.1111.1111.1111.1111.1000.0000
which, printed as unsigned, represents the number 4294967168.
The sequence of steps that got you there is something like this:
You assign 128 to a char.
On your implementation, char is signed char and has a maximum value of 127, so 128 overflows.
Your implementation interprets 128 as 0x80. It uses two’s-complement math, so (int8_t)0x80 represents (int8_t)-128.
For historical reasons (relating to the instruction sets of the DEC PDP minicomputers on which C was originally developed), C promotes signed types shorter than int to int in many contexts, including variadic arguments to functions such as printf(), which aren’t bound to a prototype and still use the old argument-promotion rules of K&R C instead.
On your implementation, int is 32 bits wide and also two’s-complement, so (int)-128 sign-extends to 0xFFFFFF80.
When you make a call like printf("%u", x), the runtime interprets the int argument as an unsigned int.
As an unsigned 32-bit integer, 0xFFFFFF80 represents 4,294,967,168.
The "%u\n" format specifier prints this out without commas (or other separators) followed by a newline.
This is all legal, but so are many other possible results. The code is buggy and not portable.
Make sure you don’t overflow the range of your type! (Or if that’s unavoidable, overflow for unsigned scalars is defined as modular arithmetic, so it’s better-behaved.) The workaround here is to use unsigned char, which has a range from 0 to (at least) 255, instead of char.
First of all, as I hope you understand, the code you've posted is full of errors, and you would not want to depend on its output. If you were trying to perform any of these manipulations in a real program, you would want to do so in some other, more well-defined, more portable way.
So I assume you're asking only out of curiosity, and I answer in the same spirit.
Type char on your machine is probably a signed 8-bit quantity. So its range is from -128 to +127. So +128 won't fit.
When you try to jam the value +128 into a signed 8-bit quantity, you probably end up with the value -128 instead. And that seems to be what's happening for you, based on the fact that your if statement is evidently succeeding.
So next we try to take the value -128 and print it as if it was an unsigned int, which on your machine is evidently a 32-bit type. It can hold numbers in the range 0 to 4294967295, which obviously does not include -128. But unsigned integers typically behave pretty nicely modulo their range, so if we add 4294967296 to -128 we get 4294967168, which is precisely the number you saw.
Now that we've worked through this, let's resolve in future not to jam numbers that won't fit into char variables, or to print signed quantities with the %u format specifier.

what is the use of signed char and unsigned char

Since C uses char as an integer internally (the corresponding ASCII code is stored), we can use signed and unsigned char for internal calculations.
Other than that, is there any other use?
signed and unsigned char are first and foremost just small integers. Do you need to store a large quantity of small numbers (in the range [-127, +127]¹ or [0, 255])? You can use an array of signed or unsigned chars and save memory compared to pretty much any other type. That's what is done for e.g. images - a grayscale image is generally stored as an array of unsigned char (and an RGB image is generally stored as an array of 3 unsigned char components).
The second usage of char is for character strings, which you probably already saw; notice that char is a distinct type from both signed char and unsigned char, and its signedness is implementation defined. This is stupid and inconvenient in many situations - and leads to sad stuff such as the mandatory cast to unsigned char when calling functions of the toupper/isupper family.
Finally, char & co. are defined as the "underlying storage" of the C abstract machine. sizeof(char) == 1 by definition, and any type can be aliased through a (signed|unsigned)? char pointer to access its underlying bit representation.
¹ Yes, -127: [-127, +127] is the minimum range the standard allows for signed char, since it still permits sign-and-magnitude representation; more realistically, on any real-world machine of this century it will be at least [-128, 127].

Difference between signed / unsigned char

So I know that the difference between a signed int and an unsigned int is that a bit is used to signify whether the number is positive or negative, but how does this apply to a char? How can a character be positive or negative?
There's no dedicated "character type" in C language. char is an integer type, same (in that regard) as int, short and other integer types. char just happens to be the smallest integer type. So, just like any other integer type, it can be signed or unsigned.
It is true that (as the name suggests) char is mostly intended to be used to represent characters. But characters in C are represented by their integer "codes", so there's nothing unusual in the fact that an integer type char is used to serve that purpose.
The only general difference between char and other integer types is that plain char is not synonymous with signed char, while with other integer types the signed modifier is optional/implied.
I slightly disagree with the above. The unsigned char simply means: Use the most significant bit instead of treating it as a bit flag for +/- sign when performing arithmetic operations.
It matters if you use char as a number, for instance:
typedef char BYTE1;
typedef unsigned char BYTE2;
BYTE1 a;
BYTE2 b;
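For variable a (assuming plain char is signed on your platform, which is not guaranteed), only 7 bits are available for the magnitude, so its range is at least -127 to 127 = (+/-)(2^7 - 1), and -128 to 127 on two's-complement machines.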
For variable b all 8 bits are available and the range is 0 to 255 (2^8 -1).
If you use the variable only as a character, the "unsigned" makes no visible difference, since every character of the basic character set has a non-negative value in either type.
There are three char types: (plain) char, signed char and unsigned char. Any char is usually an 8-bit integer*, and in that sense signed and unsigned char have a useful meaning (generally equivalent to int8_t and uint8_t respectively). When used as a character in the sense of text, use a char (also referred to as a plain char). Plain char is typically signed, but the compiler can implement it either way.
* Technically, a char can be any size as long as sizeof(char) is 1, but it is usually an 8-bit integer.
The representation is the same; the meaning is different. For example, 0xFF is represented as "FF" in both cases. When it is treated as a signed char, it is the negative number -1, but as unsigned it is 255. When it comes to bit shifting, this is a big difference, because right-shifting a negative value typically copies the sign bit in from the left (strictly, the result is implementation-defined). For example, shifting 255 right by 1 bit gives 127, whereas shifting -1 right typically has no effect.
A signed char is a signed value which is typically smaller than, and is guaranteed not to be bigger than, a short. An unsigned char is an unsigned value which is typically smaller than, and is guaranteed not to be bigger than, a short. A type char without a signed or unsigned qualifier may behave as either a signed or unsigned char; this is usually implementation-defined, but there are a couple of cases where it is not:
If, in the target platform's character set, any of the characters required by standard C would map to a code higher than the maximum `signed char`, then `char` must be unsigned.
If `char` and `short` are the same size, then `char` must be signed.
Part of the reason there are two dialects of "C" (those where char is signed, and those where it is unsigned) is that there are some implementations where char must be unsigned, and others where it must be signed.
The same way -- e.g. if you have an 8-bit char, 7 bits can be used for magnitude and 1 for sign. So an unsigned char might range from 0 to 255, whilst a signed char might range from -128 to 127 (for example).
This is because a char is, for all intents and purposes, stored as an 8-bit number. Speaking about a negative or positive char doesn't make sense if you consider it an ASCII code (ASCII codes 0 to 127 fit even in a signed char*), but it makes sense if you use that char to store a number, which could be in the range 0 to 255, or -128 to 127 in two's-complement representation.
*: It can also be unsigned; I think it depends on the implementation. In that case you also have access to the extended ASCII characters provided by the encoding in use.
The same way how an int can be positive or negative. There is no difference. Actually on many platforms unqualified char is signed.

C: char to int conversion

From The C Programming Language (Brian W. Kernighan), 2.7 TYPE CONVERSIONS, p. 43:
"There is one subtle point about the conversion of characters to integers. ... On some machines a char whose leftmost bit is 1 will be converted to a negative integer. On others, ... is always positive. For portability, specify signed or unsigned if non-character data is to be stored in char variables."
My questions are:
1) Why would anyone want to store non-char data in char? (An example where this is necessary would be really nice.)
2) Why does the integer value of a char change when it is converted to int?
3) Can you elaborate more on this portability issue?
In regards to 1)
People often use char arrays when they really want a byte buffer for a data stream. It's not great practice, but plenty of projects do it, and if you're careful, no real harm is done. There are probably other cases as well.
In regards to 2)
Signed integers are often sign extended when they are moved from a smaller data type. Thus
11111111b (-1 in base 10) becomes 11111111 11111111 11111111 11111111 when expanded to 32 bits. However, if the char was intended to be unsigned +255, then the signed integer may end up being -1.
About portability 3)
Some machines regard chars as signed integers, while others interpret them as unsigned. It could also vary based on compiler implementation. Most of the time you don't have to worry about it. Kernighan is just trying to help you understand the details.
Edit
I know this is a dead issue, but you can use the following code to check whether chars on your system are signed or unsigned:
#include <limits.h> // implementation-specific constants (INT_MAX, CHAR_MAX, etc.)
#include <stdio.h>

int main(void)
{
#if CHAR_MAX == SCHAR_MAX
    puts("Plain \"char\" is signed");
#else
    puts("Plain \"char\" is unsigned");
#endif
    return 0;
}
1) char is the size of a single byte in C, and is therefore used for storing any sort of data. For example, when loading an image into memory, the data is represented as an array of char. In modern code, typedefs such as uint8_t are used to indicate the purpose of a buffer more usefully than just char.
2 & 3) Whether or not char is signed or unsigned is platform dependent, so if a program depends on this behavior then it's best to specify one or the other explicitly.
The char type is defined to hold one byte, i.e. sizeof(char) is defined to be 1. This is useful for serializing data, for instance.
char is implementation-defined as either unsigned char or signed char. Now imagine that char means smallint. You are simply converting a small integer to a larger integer when you go from smallint to int. The problem is, you don't know whether that smallint is signed or unsigned.
I would say it's not really a portability issue as long as you follow The Bible (K&R).
unsigned char is often used to process binary data one byte at a time. A common example is UTF-8 strings, which are not strictly made up of "chars."
If a signed char is 8 bits and the top bit is set, that indicates that it's negative. When this is converted to a larger type, the sign is kept by extending the high bit to the high bit of the new type. This is called a "sign-extended" assignment.
1) Char is implemented as one byte across all systems so it is consistent.
2) The bit mentioned in your question is the one that single-byte integers use for their signedness. When int on a system is larger than one byte, the sign flag is not affected when you convert char to int; otherwise it is. (There are also signed and unsigned chars.)
3) Because the char implementation is consistent, lots of libraries use chars, such as the Intel IPP (Integrated Performance Primitives) libraries and their cousin, OpenCV.
Usually, in C, char to int conversion and vice versa is an issue because the standard APIs for reading character input and writing character output use ints for the character arguments and return values. See getchar(), getc() and putchar() for example.
Also, since the size of a char is 1 byte, it is a convenient way to deal with arbitrary data as a byte stream.
