So, where can unsigned char be useful?
If I understood right, unsigned char can represent numbers from -128 to 127. But every encoding table uses positive numbers. So, unsigned char can't be used for representing characters. Am I right?
No, unsigned char is 0 to 255.
It can be useful in representing binary data (a single byte), although, like any primitive data type, the possibilities are endless.
First of all, the range you describe is that of signed char; unsigned char ranges from 0 to 255.
To answer your questions about negative valued character, you are right that character encoding is done using positive values.
Looked at another way, just think of signed and unsigned char as integer types.
Unsigned char is used to represent bytes. If you need just one byte of memory in a variable, you use unsigned char and assign an integer to it.
For example, uint8_t is often used to represent bytes; where it exists it is typically just a typedef for unsigned char and nothing more than that.
A signed char can represent number from -128 to +127
and unsigned char is from 0 to 255.
Although unsigned is more convenient in many use cases,
everything binary-related can be done with signed too:
0=0, 1=1 ... 127=127, -128=128, -127=129, -126=130 ... -1=255
Such conversions happen automatically (or, better said,
it's just a different interpretation of the same bits).
("Binary-related" means that a mathematical -2 * 2 would be possible with unsigned too,
but would make even less sense.)
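The mapping listed above can be checked with a cast. This is a minimal sketch that assumes 8-bit chars; the helper name as_unsigned is made up for illustration.

```c
/* Hypothetical helper: convert a signed char to unsigned char.
   The C conversion rule adds UCHAR_MAX + 1 (256 for 8-bit chars) to
   negative values, which on two's complement machines leaves the bit
   pattern unchanged. */
unsigned char as_unsigned(signed char value)
{
    return (unsigned char)value;
}
```

With 8-bit chars, as_unsigned(-1) gives 255 and as_unsigned(-128) gives 128, matching the table.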
Regarding "So, where can unsigned char be useful?"
Here perhaps?: (a very simple example to test for ASCII digit)
#include <stdbool.h>

/* Returns true if c is one of the digits '0'..'9'. */
bool isDigit(unsigned char c)
{
    return (c >= '0') && (c <= '9');
}
Because the argument type is unsigned char, the value the function receives can never be negative (ASCII defines 128 codes, 0 to 127; the various 8-bit "extended ASCII" sets use all 256 byte values). So all that remains in this function is to test the input against the specific criterion (here: is it a digit); there is no need to test for negative numbers. A plain char, where it is signed, can hold every ASCII character but not the values 128 to 255 used by the extended sets. The size of unsigned char is also significant: it is always 1 byte, as opposed to typically (but not always) 4 bytes for, say, an int.
Related
I have been reading "The C Programming Language" by K&R, and I've come across this statement:
"plain chars are signed or unsigned"
So my question is, what is a plain char and how is it any different from
signed char and unsigned char?
In the below code how is 'myPlainChar' - 'A' different from
'mySignChar' - 'A' and 'myUnsignChar' - 'A'?
Can someone please explain the statement "Printable char's are always positive"?
Note: Please write examples and explain. Thank you.
{
char myChar = 'A';
signed char mySignChar = 'A';
unsigned char myUnsignChar = 'A';
}
There are signed char and unsigned char. Whether char is signed or unsigned by default depends on compiler and its settings. Usually it is signed.
There is only one char type, just like there is only one int type.
But like with int you can add a modifier to tell the compiler if it's an unsigned or a signed char (or int):
signed char x1; // x1 can hold values from -128 to +127 (typically)
unsigned char x2; // x2 can hold values from 0 to +255 (typically)
signed int y1; // y1 can hold values from -2147483648 to +2147483647 (typically)
unsigned int y2; // y2 can hold values from 0 to +4294967295 (typically)
The big difference between plain unmodified char and int is that int without a modifier will always be signed, but it's implementation defined (i.e. it's up to the compiler) if char without a modifier is signed or unsigned:
char x3; // Could be signed, could be unsigned
int y3; // Will always be signed
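To find out which choice a given compiler made, one option is to look at CHAR_MIN from limits.h. A small sketch; the function name plain_char_is_signed is made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* CHAR_MIN is 0 when plain char is unsigned and negative when it is
   signed, so a single comparison tells the two cases apart. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Some compilers also let you switch the default (GCC, for example, has -fsigned-char and -funsigned-char), so portable code should not rely on either answer.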
Plain char is the type spelled char without signed or unsigned prefix.
Plain char, signed char and unsigned char are three distinct integral types (yes, character values are (small) integers), even though plain char is represented identically to one of the other two. Which one is implementation defined. This is distinct from say int : plain int is always the same as signed int.
There's a subtle point here: if plain char is for example signed, then it is a signed type, and we say "plain char is signed on this system", but it's still not the same type as signed char.
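One way to see that the three types really are distinct is C11's _Generic, which rejects a selection that lists the same type twice. The sketch below assumes a C11 compiler; the enum and the macro CHAR_KIND are names invented for this example.

```c
/* Hypothetical labels for the result of the selection. */
enum char_kind { KIND_PLAIN_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR, KIND_OTHER };

/* _Generic selects on the exact type of its argument. Listing char,
   signed char and unsigned char side by side compiles only because
   they are three different types. */
#define CHAR_KIND(x) _Generic((x),            \
    char:          KIND_PLAIN_CHAR,           \
    signed char:   KIND_SIGNED_CHAR,          \
    unsigned char: KIND_UNSIGNED_CHAR,        \
    default:       KIND_OTHER)
```

Applied to a variable declared as plain char, the macro yields KIND_PLAIN_CHAR even on a system where plain char is signed. Note that a character constant such as 'A' has type int in C, so CHAR_KIND('A') yields KIND_OTHER.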
The difference between these two lines
signed char mySignChar = 'A';
unsigned char myUnsignChar = 'A';
is exactly the same as the difference between these two lines:
signed int mySignInt = 42;
unsigned int myUnsignInt = 42;
The statement "Printable char's are always positive" means exactly what it says. On some systems some plain char values are negative. On all systems some signed char values are negative. On all systems there is a character of each kind that is exactly zero. But none of those are printable. Unfortunately the statement is not necessarily correct (it is correct about all characters in the basic execution character set, but not about the extended execution character set).
How many char types are there in C?
There is one char type. There are 3 small character types: char, signed char, unsigned char. They are collectively called character types in C.
char has the same range/size/ranking/encoding as signed char or unsigned char, yet is a distinct type.
what is a plain char and how is it any different from signed char and unsigned char?
They are 3 different types in C. A plain char will match the same range/size/ranking/encoding as either signed char or unsigned char. In all cases the size is 1.
How is myPlainChar - 'A' different from mySignChar - 'A' and myUnsignChar - 'A'?
myPlainChar - 'A' will match one of the other two.
Typically mySignChar has a value in the range [-128...127] and myUnsignChar in the range [0...255]. So a subtraction of 'A' (typically a value of 65) will result in a different range of potential answers.
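A short sketch of the subtraction, assuming 8-bit chars. Both operands are promoted to int before the subtraction, so the result is an int and cannot wrap around. The function names are made up for illustration.

```c
/* Both operands are promoted to int, so the result type is int. */
int diff_signed(signed char c)
{
    return c - 'A';
}

int diff_unsigned(unsigned char c)
{
    return c - 'A';
}
```

With 'A' equal to 65, diff_signed covers -193 to 62 and diff_unsigned covers -65 to 190.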
Can someone please explain me the statement "Printable char's are always positive".
Portable C source code characters (the basic execution character set) are positive, so printing a source code file only prints characters of non-negative values.
When printing data with printf("%c", some_character_type) or putchar(some_character_type), the value, whether positive or negative, is converted to an unsigned char before printing. Thus it is a character associated with a non-negative value that is printed.
C has isprint(int c) which "tests for any printing character including space". That function is only valid for values in the unsigned char range and the negative EOF. isprint(EOF) reports 0. So only non-negative values pass the isprint(int c) test.
C really has no way to print negative values as characters without undergoing a conversion to unsigned char.
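Because the ctype.h functions are only defined for unsigned char values and EOF, a plain char should be cast before it is passed in. A minimal sketch; the wrapper name is_printable is made up for illustration.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical wrapper: cast to unsigned char first, so that a negative
   plain char value is never passed to isprint (that would be undefined
   behavior for anything other than EOF). */
bool is_printable(char c)
{
    return isprint((unsigned char)c) != 0;
}
```

In the default "C" locale this reports true for 'A' and the space character, and false for control characters such as '\n'.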
I think it means char without 'unsigned' in front of it, i.e.:
unsigned char a;
as opposed to
char a; // plain char: signed or unsigned, depending on the implementation
So basically an integer variable is signed unless you use the keyword 'unsigned'; plain char is the exception, because its signedness is implementation-defined.
That should answer the second question as well.
The third question: Characters in the ASCII set have non-negative codes, i.e. the number -60 doesn't represent a character, but 65 does: 'A'.
I just started learning C and am rather confused over declaring characters using int and char.
I am well aware that any characters are made up of integers in the sense that the "integers" of characters are the characters' respective ASCII decimals.
That said, I learned that it's perfectly possible to declare a character using int without using the ASCII decimals. Eg. declaring variable test as a character 'X' can be written as:
char test = 'X';
and
int test = 'X';
And for both declaration of character, the conversion characters are %c (even though test is defined as int).
Therefore, my question is/are the difference(s) between declaring character variables using char and int and when to use int to declare a character variable?
The difference is the size in byte of the variable, and from there the different values the variable can hold.
A char is required to accept all values between 0 and 127 (inclusive). By definition it occupies exactly one byte, which is 8 bits in common environments. Whether it is signed (-128 to 127) or unsigned (0 to 255) is implementation-defined.
An int is required to be at least 16 bits wide and to accept all values between -32767 and 32767. On common systems that means an int can hold every value of a char, whether the latter is signed or unsigned.
If you want to store only characters in a variable, you should declare it as char. Using an int would just waste memory, and could mislead a future reader. One common exception to that rule is when you want to process a wider value for special conditions. For example the function fgetc from the standard library is declared as returning int:
int fgetc(FILE *fd);
because the special value EOF (for End Of File) is defined as a negative int value, commonly -1 (all bits set to one in a two's complement system). fgetc returns each byte as an unsigned char converted to int, so on a common system with 8-bit chars no valid byte can compare equal to the EOF constant. If the function were declared to return a plain char, nothing could distinguish the EOF value from the (valid) byte 0xFF.
That's the reason why the following code is bad and should never be used:
char c; // a terrible memory saving...
...
while ((c = fgetc(stdin)) != EOF) { // NEVER WRITE THAT!!!
...
}
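Inside the loop a char would be enough, but the variable tested against EOF needs to be an int. With a signed char the loop ends prematurely on reading the byte 0xFF; with an unsigned char the test never matches EOF and the loop never ends.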
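For comparison, here is a sketch of the same kind of loop with the variable declared as int. The function name count_bytes is made up for illustration; it simply counts what is left in a stream.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream. */
long count_bytes(FILE *stream)
{
    long count = 0;
    int c;  /* int, not char: must hold every unsigned char value plus EOF */

    while ((c = fgetc(stream)) != EOF) {
        count++;  /* the byte 0xFF arrives as 255 and does not end the loop */
    }
    return count;
}
```

If a char is wanted inside the loop body, the value can be assigned to one there, after the comparison with EOF has been made on the int.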
The char type has multiple roles.
The first is that it is simply part of the chain of integer types, char, short, int, long, etc., so it's just another container for numbers.
The second is that its underlying storage is the smallest unit, and all other objects have a size that is a multiple of the size of char (sizeof returns a number that is in units of char, so sizeof(char) == 1).
The third is that it plays the role of a character in a string, certainly historically. When seen like this, the value of a char maps to a specified character, for instance via the ASCII encoding, but it can also be used with multi-byte encodings (one or more chars together map to one character).
Size of an int is 4 bytes on most architectures, while the size of a char is 1 byte.
Usually you should declare characters as char and use int for integers that need to hold bigger values. On most systems a char occupies a byte, which is 8 bits. Depending on your system this char might be signed or unsigned by default, so it will be able to hold values from 0 to 255 or from -128 to 127.
An int might be 32 bits long, but if you really want exactly 32 bits for your integer you should declare it as int32_t or uint32_t instead.
I think there's no difference, but you're allocating extra memory you're not going to use. You can also do const long a = 1;, but it will be more suitable to use const char a = 1; instead.
So I know that the difference between a signed int and unsigned int is that a bit is used to signify if the number is positive or negative, but how does this apply to a char? How can a character be positive or negative?
There's no dedicated "character type" in C language. char is an integer type, same (in that regard) as int, short and other integer types. char just happens to be the smallest integer type. So, just like any other integer type, it can be signed or unsigned.
It is true that (as the name suggests) char is mostly intended to be used to represent characters. But characters in C are represented by their integer "codes", so there's nothing unusual in the fact that an integer type char is used to serve that purpose.
The only general difference between char and other integer types is that plain char is not synonymous with signed char, while with other integer types the signed modifier is optional/implied.
I slightly disagree with the above. The unsigned char simply means: Use the most significant bit instead of treating it as a bit flag for +/- sign when performing arithmetic operations.
This matters if you use char as a number, for instance:
typedef char BYTE1;
typedef unsigned char BYTE2;
BYTE1 a;
BYTE2 b;
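For variable a (assuming plain char is signed and 8 bits wide), only 7 bits are available for the magnitude and its range is typically -128 to 127, i.e. -2^7 to 2^7 - 1.
For variable b all 8 bits are available and the range is 0 to 255 (2^8 - 1).
If you use char purely as a character (storing and printing ASCII text), the signedness makes no practical difference, because ASCII codes fit in either type. The compiler does not ignore "unsigned", though: it still affects comparisons and conversions of values above 127.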
There are three char types: (plain) char, signed char and unsigned char. Any char is usually an 8-bit integer* and in that sense, a signed and unsigned char have a useful meaning (generally equivalent to int8_t and uint8_t, respectively). When used as a character in the sense of text, use a char (also referred to as a plain char). This is typically a signed char but can be implemented either way by the compiler.
* Technically, a char can be wider than 8 bits (CHAR_BIT must be at least 8) while sizeof(char) is still 1, but it is usually an 8-bit integer.
The representation is the same; the meaning is different. Take 0xFF: in both cases the stored bits are "FF". Treated as a signed char it is the negative number -1 (in two's complement), but as an unsigned char it is 255. When it comes to bit shifting, this makes a big difference. If you shift 255 right by 1 bit, you get 127. Right-shifting a negative value is implementation-defined; most compilers use an arithmetic shift that copies the sign bit, so shifting -1 right has no effect.
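A small sketch of the two shifts, assuming 8-bit chars. The function names are made up for illustration. The operand is promoted to int before the shift in both cases, so the signedness of the char type decides which int value gets shifted.

```c
/* 0xFF as unsigned char is 255; 255 >> 1 is 127 on every implementation. */
unsigned char shift_right_unsigned(unsigned char value)
{
    return (unsigned char)(value >> 1);
}

/* For negative inputs the result is implementation-defined. Most
   compilers use an arithmetic shift that copies the sign bit, which
   would leave -1 unchanged, but the standard does not promise it. */
int shift_right_signed(signed char value)
{
    return value >> 1;
}
```

For non-negative inputs the two functions agree; they differ only once the top bit of the char is set.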
A signed char is a signed value which is typically smaller than, and is guaranteed not to be bigger than, a short. An unsigned char is an unsigned value which is typically smaller than, and is guaranteed not to be bigger than, a short. A type char without a signed or unsigned qualifier may behave as either a signed or unsigned char; this is usually implementation-defined, but there are a couple of cases where it is not:
If, in the target platform's character set, any of the characters required by standard C would map to a code higher than the maximum `signed char`, then `char` must be unsigned.
If `char` and `short` are the same size, then `char` must be signed.
Part of the reason there are two dialects of "C" (those where char is signed, and those where it is unsigned) is that there are some implementations where char must be unsigned, and others where it must be signed.
The same way -- e.g. if you have an 8-bit char, 7 bits can be used for magnitude and 1 for sign. So an unsigned char might range from 0 to 255, whilst a signed char might range from -128 to 127 (for example).
This is because a char is, to all intents and purposes, stored as an 8-bit number. Talking about a negative or positive char doesn't make sense if you consider it an ASCII code (which always fits in the non-negative part of a signed char*), but it does make sense if you use the char to store a number, which can then be in the range 0 to 255 or -128 to 127 in two's complement representation.
*: char can also be unsigned, depending on the implementation; in that case the values 128 to 255 can hold the characters of whatever extended ASCII set the encoding provides.
The same way an int can be positive or negative. There is no difference. In fact, on many platforms unqualified char is signed.
So my code has in it the following:
unsigned short num=0;
num=*(cra+3);
printf("> char %u\n",num);
cra is a char*
The problem is that it is getting odd output, sometimes outputting numbers such as 65501 (clearly not within the range of a char). Any ideas?
Thanks in advance!
Apparently *(cra+3) is a char of value '\xdd'. Since a char is signed, it actually means -35 (0xdd in 2's complement), i.e. 0x...fffffdd. Restricting this to 16-bit gives 0xffdd, i.e. 65501.
You need to make it an unsigned char so it gives a number in the range 0–255:
num = (unsigned char)cra[3];
Note:
1. the signedness of char is implementation defined, but usually (e.g. in OP's case) it is signed.
2. the ranges of signed char, unsigned char and unsigned short are implementation defined, but again commonly they are -128–127, 0–255 and 0–65535 respectively.
3. the conversion from signed char to unsigned short works out as -35 + 65536 = 65501.
char is allowed to be either signed or unsigned - apparently, on your platform, it is signed.
This means that it can hold values like -35. Such a value is not within the range representable by unsigned short. When a number out of range is converted to an unsigned type, it is brought into range by repeatedly adding or subtracting one more than the maximum value representable in that type.
In this case, your unsigned short can represent values up to 65535, so -35 is brought into range by adding 65536, which gives 65501.
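The arithmetic can be reproduced in a few lines. This sketch uses signed char and uint16_t so that the result does not depend on the platform's choice for plain char or the width of unsigned short; the function names are made up for illustration.

```c
#include <stdint.h>

/* Direct conversion: -35 is brought into range by adding 65536. */
uint16_t widen_directly(signed char c)
{
    return (uint16_t)c;
}

/* Going through unsigned char first adds 256 instead (with 8-bit
   chars), so the result stays in the range 0 to 255. */
uint16_t widen_as_byte(signed char c)
{
    return (uint16_t)(unsigned char)c;
}
```

For the value -35 the first function gives 65501, the number seen in the question, and the second gives 221, which is 0xdd.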
unsigned short has a range of (at least) 0 .. 65535, and the %u format specifier prints an unsigned int with a range of (commonly) 0 .. 4294967295. Thus, depending on the value of *(cra+3), the output appears to be completely sensible.
cra is just a pointer.
If it hasn't been pointed at any storage (for example by way of malloc or calloc), its contents are undefined. *(cra + 3) will evaluate to the contents of the location 3 bytes ahead of the location cra (assuming char occupies 1 byte). I believe that its contents are then also undefined.
unsigned short takes up 2 bytes, at least on my system. Hence it can hold values from 0 to 65535. So your output is within its defined range.
Hi, I am interested in those chars that are representable in the ASCII table. For that reason I am doing the following:
int t(char c) { return (int) c; }
...
if(!(t(d)>255)) { dostuff(); }
So I am interested only in chars representable in the ASCII table, which I assume should be less than 256 after conversion to int. Am I right? Thanks!
Usually (not always) a char is 8-bits so all chars would typically have a value of less than 256. So your test would always succeed.
Also, ASCII only goes up to 127, not 255. The characters after that are not standard ASCII, and can vary depending on code pages.
If you are dealing with international characters you should probably use wide characters instead of char.
Use the library:
#include <ctype.h>
...
if (isascii(d)) { dostuff(); }
Two caveats:
The C standard does not decide if char is by default signed or unsigned. If your compiler treated char as signed by default the cast to int could result in negative values instead of the values from 128 to 255 (and this is assuming that your chars are 8-bit, too). Perhaps it's better to use unsigned char if you want to be sure this range will be converted the way you expect.
Technically ASCII is from 0 to 127, everything above is some kind of extension.
char is an integral type in C. You can do the check directly:
char c;
/* assign to c */
if (c >= 0 && c <= 127) {
/* in ASCII range */
}
I am assuming you don't want to use isascii() (it's not in the C standard, although it is POSIX).
Also, you can check if CHAR_MAX is equal to 127. If it is, you don't need the comparison with 127, since c will not exceed it by definition. Similarly, if CHAR_MIN is 0, then you don't need the comparison with 0. Both CHAR_MIN and CHAR_MAX are defined in limits.h.
I think you're thinking about an integer value overflowing a char, and therefore convert it to an int. But, that doesn't help with overflow since the damage has already been done.
Size of char is always 1 byte (as per standard). For all practical matters this means that a char var cannot have a value bigger than 255. (though there are systems, where a byte has more than 8 bits and thus a char value can be bigger, but these are rare nowadays)
An additional caveat is that plain char may be signed or unsigned, so it can be in the -128 to 127 range or the 0 to 255 range (assuming 8 bits per byte, of course :-)).
Meanwhile, the ASCII table is 7-bit, which means it covers the range of 0 to 127. So if you are interested in only ASCII symbols, you can just check if the value of your char var is in that range. No need to cast for the comparison.
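A range check that gives the same answer whether plain char is signed or unsigned can be sketched as follows. The function name is_ascii_char is made up for illustration, and the code assumes that the execution character set is ASCII-compatible.

```c
#include <stdbool.h>

/* Hypothetical helper: viewing the value as unsigned char turns every
   negative plain char into a value above 127, so one comparison covers
   both the signed and the unsigned case. */
bool is_ascii_char(char c)
{
    return (unsigned char)c <= 127;
}
```

This avoids the non-standard isascii() and needs no separate test for negative values.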