This question already has answers here:
What is an unsigned char?
(16 answers)
Closed 5 years ago.
I have seen in my legacy embedded code that people use signed char as a return type. What is the need to put signed there? Isn't it implicit that char is nothing but signed char?
char, signed char, and unsigned char are all distinct types.
Your implementation can set char to be either signed or unsigned.
unsigned char has a distinct range set by the standard. The problem with using simply char is that your code can behave differently on different platforms. It could even be a 1's complement or signed magnitude type with a range -127 to +127.
Because no-one in the candidate duplicate answer cited the right paragraph from the spec:
6.2.5 [Types], paragraph 15
The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char
So char could be either. (FWIW, I believe I ran into char = unsigned char in JNI.)
From CppReference:
Character types
signed char - type for signed character representation.
unsigned char - type for unsigned character representation. Also used to inspect object representations (raw memory).
char - type for character representation. Equivalent to either signed char or unsigned char (which one is implementation-defined and may be controlled by a compiler commandline switch), but char is a distinct type, different from both signed char and unsigned char.
So if you want to ensure that you're using either an unsigned char or a signed char, specify it explicitly. Otherwise there'd be no guarantee whether it'll be signed or not.
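To make the "three distinct types" point concrete, here is a small sketch using C11 _Generic. The macro CHAR_KIND and the three helper functions are names made up for this illustration. The selection only compiles because char, signed char and unsigned char really are different types; listing the same type twice in a _Generic association list is not allowed.

```c
/* Picks a label based on the exact type of the argument. Having char,
   signed char and unsigned char side by side is only legal because the
   three are distinct types. CHAR_KIND is a hypothetical name. */
#define CHAR_KIND(x) _Generic((x),          \
    char:          "char",                  \
    signed char:   "signed char",           \
    unsigned char: "unsigned char",         \
    default:       "something else")

/* Hypothetical helpers: report which association each variable selects. */
const char *kind_of_plain(void)    { char c = 'a';          return CHAR_KIND(c); }
const char *kind_of_signed(void)   { signed char c = 'a';   return CHAR_KIND(c); }
const char *kind_of_unsigned(void) { unsigned char c = 'a'; return CHAR_KIND(c); }
```

Whatever signedness the compiler gives plain char, kind_of_plain() should still select the char entry, never one of the other two.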
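#include <stdio.h>
#include <sys/types.h> /* ssize_t is POSIX, not standard C */
int main(void){
    ssize_t a = -1;
    size_t b = (unsigned)a; /* converted to unsigned int first, then widened to size_t */
    printf("%zd %zu\n", a, b);
    return 0;
}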
a is 8 bytes with all bits set to 1, but b ends up as a 4-byte value when cast to unsigned rather than to a proper (unsigned size_t). Why is that? Why doesn't it turn into an 8-byte unsigned variable?
unsigned is short for writing unsigned int, so unsigned and unsigned int are the same type.
unsigned size_t does not exist; size_t is already unsigned.
why doesn't it turn into an 8 byte unsigned variable?
Because ints are commonly 32 bit or 4 bytes.
If you want fixed width integers you can use stdint.h which defines uint32_t, int32_t etc.
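A minimal sketch of the difference, assuming nothing beyond standard C. The two function names are made up for illustration, and long long stands in for the question's ssize_t so that no POSIX header is needed. Converting a negative value to an unsigned type reduces it modulo one more than that type's maximum, so the width of the target type decides what you get.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the question: the value goes through unsigned int first,
   so for -1 only UINT_MAX survives before it is widened to size_t. */
size_t via_unsigned_int(long long value)
{
    return (size_t)(unsigned)value;
}

/* Converts straight to size_t, so -1 becomes SIZE_MAX. */
size_t via_size_t(long long value)
{
    return (size_t)value;
}
```

Calling via_unsigned_int(-1) gives UINT_MAX, while via_size_t(-1) gives SIZE_MAX. On a typical 64-bit platform those are the 4-byte and 8-byte all-ones patterns the question describes.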
The C standard sheds some light on this:
In § 6.7.2p2, Type specifiers, "- unsigned, or unsigned int" are in the same list entry, because they are equivalent type specifiers.
As @marco-a said, unsigned is shorthand for unsigned int, and the two are equivalent. sizeof(unsigned int) is commonly 4 bytes, but the exact size is implementation-defined. size_t, on the other hand, is an unsigned integer type that is guaranteed to be big enough to hold the size of the biggest object the implementation supports. Which underlying type it aliases depends on the compiler and target: on 32-bit targets it is typically a typedef (i.e., alias) for unsigned int, and on 64-bit targets it is typically unsigned long or unsigned long long. Being unsigned, size_t is never negative.
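Casting a negative value to unsigned (whatever the unsigned type is) is not Undefined Behaviour. The standard defines the result: the value is reduced modulo one more than the maximum value of the unsigned type (C11 6.3.1.3p2), so (unsigned)-1 is UINT_MAX.
What depends on the architecture and the compiler is the width of the unsigned type, and therefore which value you end up with. It is the opposite direction, converting an out-of-range value to a signed type, that is implementation-defined.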
Anyway, when you write unsigned without naming the integer type you want, remember that short, long, long long, signed and unsigned are all modifiers of the basic int type, which is the type assumed when none is given. So plain unsigned means unsigned int.
This question already has answers here:
Whats wrong with this C code?
(4 answers)
Closed 6 years ago.
I was looking at the data types at the link data type
It says that the char type is 1 byte with a range of -128 to 127 or 0 to 255.
How is this possible? By default char means signed, right?
Edit: There is another question Whats wrong with this C code?. But it is not same question. Title says what is wrong with this code and search will not list this answer easily. One has to analyse the question fully to understand the issue.
Edit: After looking at several answers and comments, I have another doubt. Strings within double quotes are treated as char arrays. I get warnings if I pass double-quoted strings to a function whose parameter is of type signed char *. Also, itoa and many other library functions take char parameters, not signed char. Of course typecasting will avoid this problem. So what is the best parameter type for functions manipulating null-terminated strings (for example LCD display related functions)? Should I use signed char or unsigned char (since char is implementation-defined, it may not be portable, I guess)?
char has implementation-defined signedness, meaning that one compiler can choose to implement it as signed and another as unsigned.
This is the reason why you should never use the char type for storing numbers. A better type to use for such is uint8_t.
char "has the same representation and alignment as either signed char or unsigned char, but is always a distinct type".
No, it doesn't mean signed char by default. According to the C standard, a char is a distinct type from both signed and unsigned chars, that merely behaves like one of the other two.
n1570/6.2.5p15
The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
And in a note to the above paragraph:
CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
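The footnote's suggestion can be turned into a one-line check. This is only a sketch, and plain_char_is_signed is a name made up for the example.

```c
#include <limits.h>
#include <stdbool.h>

/* CHAR_MIN is 0 when plain char is unsigned and SCHAR_MIN (a negative
   value) when it is signed, so its sign tells the two cases apart. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Since CHAR_MIN is a constant expression, the same test also works in a preprocessor conditional (#if CHAR_MIN < 0) if the decision has to be made at compile time. Some compilers let you change the answer with a switch, for example -fsigned-char and -funsigned-char in GCC.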
I know that a char is allowed to be signed or unsigned depending on the implementation. This doesn't really bother me if all I want to do is manipulate bytes. (In fact, I don't think of the char datatype as a character, but a byte).
But, if I understand, string literals are signed chars (actually they're not, but see the update below), and the function fgetc() returns unsigned chars converted to int. So if I want to manipulate characters, is it preferred style to use signed, unsigned, or ambiguous characters? Why does reading characters from a file have a different convention than literals?
I ask because I have some code in c that does string comparison between string literals and the contents of files, but having a signed char * vs unsigned char * might really make my code error prone.
Update 1
Ok as a few people pointed out (in answers and comments) string literals are in fact char arrays, not signed char arrays. That means I really should use char * for string literals, and not think about whether they are signed or unsigned. This makes me perfectly happy (until I have to start making conversion/comparisons with unsigned chars).
However the important question remains, how do I read characters from a file, and compare them to a string literal. The crux of which is the conversion from the int read using fgetc(), which explicitly reads an unsigned char from the file, to the char type, which is allowed to be either signed or unsigned.
Allow me to provide a more detailed example.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *someFile = fopen("ThePathToSomeRealFile.html", "r");
    assert(someFile);
    char substringFromFile[25];
    memset(substringFromFile, 0, sizeof(substringFromFile));
    //Alright, the real example is to read the first few characters from the file
    //And then compare them to the string I expect
    const char *expectedString = "<!DOCTYPE";
    //strlen, not sizeof: expectedString is a pointer, so sizeof would give the pointer size
    const size_t expectedLength = strlen(expectedString);
    for( size_t counter = 0; counter < expectedLength; ++counter )
    {
        //Read it as an integer, because the function returns an `int`
        const int oneCharacter = fgetc(someFile);
        if( ferror(someFile) )
            return EXIT_FAILURE;
        if( oneCharacter == EOF || feof(someFile) )
            break;
        assert(counter < sizeof(substringFromFile)/sizeof(*substringFromFile));
        //HERE IS THE PROBLEM:
        //I know the data contained in oneCharacter must be an unsigned char
        //Therefore, this is valid
        const unsigned char uChar = (unsigned char)oneCharacter;
        (void)uChar; //only here to illustrate the conversion
        //But then how do I assign it to the char?
        substringFromFile[counter] = (char)oneCharacter;
    }
    fclose(someFile);
    //and ultimately here's my goal
    //strncmp returns 0 when the two strings match
    int comparison = strncmp(substringFromFile, expectedString, expectedLength);
    if(comparison == 0)
        return EXIT_SUCCESS;
    //else
    return EXIT_FAILURE;
}
Essentially, I know my fgetc() function is returning something that (after some error checking) is representable as an unsigned char. I know that char may or may not be an unsigned char. That means, depending on the implementation, a cast to char may involve no reinterpretation at all. However, if the implementation makes char signed, I have to worry about values that an unsigned char can represent but a char cannot (i.e. values in the range (INT8_MAX, UINT8_MAX]).
tl;dr
The question is this, should I (1) copy their underlying data read by fgetc() (by casting pointers - don't worry, I know how to do that), or (2) cast down from unsigned char to char (which is only safe if I know that the values can't exceed INT8_MAX, or those values can be ignored for whatever reason)?
The historical reasons are (as I've been told, I don't have a reference) that the char type was poorly specified from the beginning.
Some implementations used "consistent integer types" where char, short, int and so on were all signed by default. This makes sense because it makes the types consistent with each other.
Other implementations used unsigned for character, since there never existed any symbol tables with negative indices (that would be stupid) and since they saw a need for more than 128 characters (a very valid concern).
By the time C got standardized properly, it was too late to change this, too many different compilers and programs written for them were already out on the market. So the signedness of char was made implementation-defined, for backwards compatibility reasons.
The signedness of char does not matter if you only use it to store characters/strings. It only matters when you decide to involve the char type in arithmetic expressions or use it to store integer values - this is a very bad idea.
For characters/string, always use char (or wchar_t).
For any other form of 1 byte large data, always use uint8_t or int8_t.
But, if I understand, string literals are signed char
No, string literals are char arrays.
the function fgetc() returns unsigned chars casted into int
Yes: it returns the character as an unsigned char converted to an int. The return type is int because it must also be able to hold EOF, which is a negative integer constant and not a character value.
having a signed char * vs unsigned char * might really make my code error prone.
No, not really. Formally, this rule from the standard applies:
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
There exists no case where casting from pointer to signed char to pointer to unsigned char or vice versa, would cause any alignment issues or other issues.
I know that a char is allowed to be signed or unsigned depending on the implementation. This doesn't really bother me if all I want to do is manipulate bytes.
If you're going to do comparison or assign char to other integer types, it should bother you.
But, if I understand, string literals are signed chars
They are of type char[], so if char is unsigned on your implementation, string literals behave like unsigned char[] (the type itself is still char[]).
the function fgetc() returns unsigned chars casted into int.
That's correct, and it is required to avoid undesired sign extension.
So if I want to manipulate characters, is it preferred style to use signed, unsigned, or ambiguous characters?
For portability I'd advise following the practice adopted by various libc implementations: use char, but cast to unsigned char before processing (char* to unsigned char*). This way implicit integer promotions won't turn characters in the range 0x80 -- 0xff into negative numbers of wider types.
In short: (signed char)a < (signed char)b is NOT always equivalent to (unsigned char)a < (unsigned char)b. Here is an example.
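A small sketch of the point about comparisons. The function names are made up, and the behaviour for bytes above 0x7f assumes the usual two's-complement conversion that mainstream compilers use when an out-of-range value is converted to a signed type.

```c
/* Compares two bytes as signed values, which is what plain char
   comparison amounts to on platforms where char is signed. */
int less_as_signed(char a, char b)
{
    return (signed char)a < (signed char)b;
}

/* Compares the same two bytes as unsigned values. strcmp and memcmp
   are specified to interpret characters as unsigned char. */
int less_as_unsigned(char a, char b)
{
    return (unsigned char)a < (unsigned char)b;
}
```

For a = 'a' (0x61 in ASCII) and b = (char)0xE9, less_as_unsigned reports that a sorts first, while less_as_signed reports the opposite because 0xE9 becomes a negative value. Sorting code that compares plain chars can therefore disagree with strcmp.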
Why does reading characters from a file have a different convention than literals?
getc() needs a way to return EOF such that it couldn't be confused with any real char.
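Here is one possible way to write the read-and-compare loop from the question, as a sketch rather than the definitive answer. read_prefix and starts_with are hypothetical helper names, and the 64-byte scratch buffer is an arbitrary choice. The value from fgetc() stays in an int until it has been tested against EOF, and the byte is then stored through an unsigned char pointer so that no out-of-range conversion to a signed char is needed.

```c
#include <stdio.h>
#include <string.h>

/* Reads up to `count` characters from `stream` into `buffer` and
   null-terminates it. `buffer` must hold at least count + 1 chars.
   Returns the number of characters actually stored. */
size_t read_prefix(FILE *stream, char *buffer, size_t count)
{
    unsigned char *bytes = (unsigned char *)buffer;
    size_t stored = 0;
    while (stored < count) {
        /* Keep the result in an int: EOF is negative, while every real
           character arrives as an unsigned char value (0..UCHAR_MAX). */
        int c = fgetc(stream);
        if (c == EOF)
            break;
        bytes[stored++] = (unsigned char)c;
    }
    buffer[stored] = '\0';
    return stored;
}

/* Returns 1 when the next characters of the stream equal `expected`,
   0 otherwise (including when `expected` is too long for the buffer). */
int starts_with(FILE *stream, const char *expected)
{
    char buffer[64];
    size_t wanted = strlen(expected);
    if (wanted >= sizeof buffer)
        return 0;
    return read_prefix(stream, buffer, wanted) == wanted
        && memcmp(buffer, expected, wanted) == 0;
}
```

With a file opened for reading, starts_with(file, "<!DOCTYPE") performs the header check. A byte such as 0xFF in the file is read back as 255, which is why it cannot be mistaken for EOF.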
(Edited change C/C++ to C)
Please help me to find out a clean clarification on char and unsigned char in C. Specially when we transfer data between embedded devices and general PCs (The difference between buffer of unsigned char and plain char).
You're asking about two different languages but, in this respect, the answer is (more or less) the same for both. You really should decide which language you're using though.
Differences:
they are distinct types
it's implementation-defined whether char is signed or unsigned
Similarities:
they are both integer types
they are the same size (one byte, at least 8 bits)
If you're simply using them to transfer raw byte values, with no arithmetic, then there's no practical difference.
The type char is special. It is not an unsigned char or a signed char. These are three distinct types (while int and signed int are the same types). A char might have a signed or unsigned representation.
From 3.9.1 Fundamental types
Plain char, signed char, and unsigned char are three distinct types. A
char, a signed char, and an unsigned char occupy the same amount of
storage and have the same alignment requirements (3.11); that is, they
have the same object representation.
I use XLookupString, which maps a key event to an ASCII string, keysym, and ComposeStatus.
int XLookupString(event_structure, buffer_return, bytes_buffer, keysym_return, status_in_out)
XKeyEvent *event_structure;
char *buffer_return; /* Returns the resulting string (not NULL-terminated). Returned value of the function is the length of the string. */
int bytes_buffer;
KeySym *keysym_return;
XComposeStatus *status_in_out;
Here is my code:
char mykey_string;
int arg = 0;
------------------------------------------------------------
case KeyPress:
XLookupString( &event.xkey, &mykey_string, 1, 0, 0 );
arg |= mykey_string;
But when 'char' variables are used in bit operations, sign extension can generate unexpected results.
Is it possible to prevent this?
Thanks
char can be either signed or unsigned, so if you need unsigned char you should specify it explicitly. That makes your intention clear to those reading your code, as opposed to relying on compiler settings.
The relevant portion of the c99 draft standard is from 6.2.5 Types paragraph 15:
The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char
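For the sign-extension problem in the question, one option is to convert the byte to unsigned char at the point where it enters the bit operation. A minimal sketch, with or_in_byte being a made-up helper name:

```c
/* ORs one byte into an int accumulator without sign extension.
   Converting to unsigned char first yields a value in 0..UCHAR_MAX,
   which then promotes to a non-negative int. */
int or_in_byte(int accumulator, char byte)
{
    return accumulator | (unsigned char)byte;
}
```

In the question's code this would correspond to writing arg |= (unsigned char)mykey_string;. The buffer passed to XLookupString can stay a plain char, as its prototype expects.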