I have a buffer structure with a field
char inc_factor;
which is the number of bytes by which to increment a character array. The problem is that it must be able to hold a value up to 255. Obviously the easiest solution is to change it to unsigned char, but I'm not able to change the supplied structure definition. The function:
Buffer * b_create(short init_capacity, char inc_factor, char o_mode)
Takes in those parameters and returns a pointer to a buffer. I was wondering how I would be able to fit the number 255 in a signed char.
You can convert the type:
unsigned char n = inc_factor;
Signed-to-unsigned conversion is well defined and does what you want, since all three char types are required to have the same width.
You may need to be careful on the calling end (or when you store the value back into the char field): for values above CHAR_MAX, pass something like f(n - (UCHAR_MAX + 1)), i.e. -1 instead of 255, which converts back to the intended value (and if char happens to be unsigned on your platform, the negative argument converts back correctly too, so all is well either way).
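For instance, a minimal sketch of the round trip (the field is simulated with a local char here; storing 255 into a signed char is implementation-defined, but on common two's complement systems it stores the bit pattern for -1):
#include <stdio.h>
int main(void)
{
    char inc_factor = (char)255;                  /* implementation-defined where char is signed; typically stores -1 */
    unsigned char n = (unsigned char)inc_factor;  /* well-defined conversion, yields 255 */
    printf("%u\n", (unsigned)n);                  /* prints 255 */
    return 0;
}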
Let's use the term "byte" to represent 8 bits of storage in memory.
A byte with the value of "0xff" can be accessed either as an unsigned character or as a signed character.
typedef unsigned char BYTE;   /* one 8-bit "byte" of storage */
BYTE byte = 0xff;
unsigned char* uc = (unsigned char*)&byte;
signed char* sc = (signed char*)&byte;   // note: plain "char" is a distinct type whose signedness is implementation-defined
printf("uc = %u, sc = %d\n", *uc, *sc);
(I chose to use pointers because I want to demonstrate that the underlying value stored in memory is the same).
Will output
uc = 255, sc = -1
"signed" numbers use the same storage space (number of bits) as unsigned, but they use the upper-most bit as a flag to tell the cpu whether to treat them as negative or not.
The bit pattern that represents 255 (11111111) when read as unsigned is the same bit pattern that represents -1 when read as signed (in two's complement). The bit pattern 10000000 is either 128 or -128.
So you can store the number 255 in a signed char by storing -1 and then reading it back as an unsigned char.
EDIT:
In case you're wondering: negative numbers start "at the top" (i.e. at 0xff/255) for computational convenience. Remember that the underlying storage is a byte, so if you take 0xff and add 1, using normal unsigned CPU math, it produces the value 0x00, which is the correct value for "i + 1" when "i = -1". Of course, it would be equally odd if negative numbers started with -1 having the value 0x80/128.
You COULD access it through a cast of its address (a cast expression itself isn't an lvalue, so you can't assign to (unsigned char)inc_factor directly):
*(unsigned char *)&inc_factor = 250;
And then you could read it back the same way:
if( *(unsigned char *)&inc_factor == 250 ) {...}
However, that's really not best practice; it'll confuse anyone who has to maintain the code.
In addition, it's not going to help you if you're passing inc_factor into a function that expects a signed char.
There's no way to read that value as a signed char and get a value above 127.
I have an unsigned int that actually stores a signed int, and the signed int ranges from -128 to 127.
I would like to store this value back in the unsigned int so that I can simply
apply a mask 0xFF and get the signed char.
How do I do the conversion?
i.e.
unsigned int foo = -100;
foo = (char)foo;
char bar = foo & 0xFF;
assert(bar == -100);
The & 0xFF operation will produce a value in the range 0 to 255. It's not possible to get a negative number this way. So, even if you use & 0xFF somewhere, you will still need to apply a conversion later to get to the range -128 to 127.
In your code:
char bar = foo & 0xFF;
there is an implicit conversion to char. This relies on implementation-defined behaviour, but it will work on all but the most esoteric of systems. The most common implementation definition is the inverse of the conversion that applies when converting char to unsigned char.
(Your previous line foo = (char)foo; should be removed).
However,
char bar = foo;
would produce exactly the same effect (again, except for on those esoteric systems).
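Putting the above together as a quick test (a sketch; the assert relies on the implementation-defined conversion and on char being signed, as discussed):
#include <assert.h>
int main(void)
{
    unsigned int foo = -100;  /* well-defined: stores UINT_MAX + 1 - 100 */
    char bar = foo;           /* implementation-defined conversion */
    assert(bar == -100);      /* holds on common two's complement systems where char is signed */
    return 0;
}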
Since the value stored in unsigned int foo corresponds to a signed value within the range -128 to 127, the implicit conversion works in this case. But if unsigned int foo held a larger value, you would lose the higher-order bytes when storing it in a char variable and get unexpected results in your program.
Answering for C,
If you have an unsigned int whose value was set by assignment of a value of type char (where char happens to be a signed type) or of type signed char, where the assigned value was negative, then the stored value is the arithmetic sum of the assigned negative value and one more than UINT_MAX. This will be far beyond the range of values representable by (signed) char on any C system I have ever encountered. If you convert that value back to (signed) char, whether implicitly or via a cast, "either the result is implementation-defined, or an implementation-defined signal is raised" (C2011, 6.3.1.3/3).
Converting back to the original char value in a way that avoids implementation-defined behavior is a bit tricky (but relying on implementation-defined behavior may afford much easier approaches). Certainly, masking off all but the 8 lowest-order value bits does not do the trick, as it always gives you a positive value. Also, it assumes that char is 8 bits wide, which, in principle, is not guaranteed. It does not necessarily even give you the correct bit pattern, as C permits negative integers to be represented in any of three different ways.
Here's an approach that will work on any conforming C system:
#include <limits.h>   /* for SCHAR_MAX and INT_MAX */

unsigned int foo = SOME_SIGNED_CHAR_VALUE;
signed char bar;
/* ... */
if (foo <= SCHAR_MAX) {
    /* foo's value is representable as a signed char */
    bar = foo;
} else {
    /* mask off the highest-order value bits to yield a value that fits in an int */
    int foo2 = foo & INT_MAX;
    /* reverse the conversion to unsigned int, as if unsigned int had the same
       number of value bits as int; the other bits are already accounted for */
    bar = (foo2 - INT_MAX) - 1;
}
That relies only on characteristics of integer representation and conversion that C itself defines.
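For example, the same logic wrapped in a small helper and exercised with -100 (the function name uint_to_schar is just for illustration):
#include <assert.h>
#include <limits.h>
/* Hypothetical wrapper around the logic above, just to show it in use. */
static signed char uint_to_schar(unsigned int foo)
{
    if (foo <= SCHAR_MAX) {
        return (signed char)foo;
    } else {
        int foo2 = foo & INT_MAX;
        return (signed char)((foo2 - INT_MAX) - 1);
    }
}
int main(void)
{
    unsigned int foo = (unsigned int)-100;  /* as if a char holding -100 had been assigned */
    assert(uint_to_schar(foo) == -100);
    return 0;
}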
Don't do it.
Casting to a smaller size may truncate the value. Casting from signed to unsigned (or the other way around) may produce a wrong value (e.g. 255 -> -1).
If you have to make calculations with different data types, pick one common type, preferably signed and long int (32-bit), and check boundaries before casting down to a smaller size (see the sketch below).
Signed types help you detect underflow (e.g. when a result goes below 0), and long int (or simply int, the natural word length) suits 32-bit and 64-bit machines and is big enough for most purposes.
Also try to avoid mixed types in formulas, especially when they contain division (/).
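As a sketch of that boundary check before casting down (the helper name is made up for illustration):
#include <limits.h>
#include <stdio.h>
/* Returns 1 and stores the narrowed value if it fits, 0 otherwise. */
static int narrow_to_char(long value, signed char *out)
{
    if (value < SCHAR_MIN || value > SCHAR_MAX)
        return 0;                 /* would not fit; refuse to cast down */
    *out = (signed char)value;
    return 1;
}
int main(void)
{
    signed char c;
    if (narrow_to_char(300L, &c))
        printf("fits: %d\n", c);
    else
        printf("300 does not fit in a signed char\n");
    return 0;
}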
I'm trying to prepend a 2-byte message length after getting the length in a 4-byte int. I use memcpy to copy 2 bytes of the int. The second byte I copied looks as expected, but printing the first byte actually shows 4 bytes.
I would expect that dest[0] and dest[1] each contain 1 byte of the int; whether or not it's the significant byte, or the order is switched, I can throw in an offset on the memcpy or reverse 0 and 1. It does not have to be portable, I would just like it to work.
The same error happens on Windows with LoadRunner and on Ubuntu with GCC, so I have at least tried to rule out portability as a cause.
I'm not sure where I'm going wrong. I suspect it's related to not having used pointers recently. Is there a better approach to cast an int to a short and then put it in the first 2 bytes of a buffer?
char* src;
char* dest;
int len = 2753; // Hex - AC1
src=(char*)malloc(len);
dest=(char*)malloc(len+2);
memcpy(dest, &len, 2);
memcpy(dest+2, src, len);
printf("dest[0]: %02x", dest[0]);
// expected result: c1
// actual result: ffffffc1
printf("dest[1]: %02x", dest[1]);
// expected result: 0a
// actual result: 0a
You cannot just take a random two bytes out of a four byte object and call it a cast to short.
You will need to copy your int into a two byte int before doing your memcpy.
But actually, that isn't the best way to do it either, because you have no control over the byte order of an integer.
Your code should look like this:
dest[0] = ((unsigned)len >> 8) & 0xFF;
dest[1] = ((unsigned)len) & 0xFF;
That should write it out in network byte order aka big endian. All of the standard network protocols use this byte order.
And I'd add something like:
assert( ((unsigned)len & 0xFFFF0000) == 0 ); // should be nothing in the high bytes
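For example, a self-contained round trip of that encoding plus the matching decode (using an unsigned char buffer so nothing is sign-extended):
#include <assert.h>
int main(void)
{
    int len = 2753;  /* 0x0AC1 */
    unsigned char dest[2];
    dest[0] = ((unsigned)len >> 8) & 0xFF;  /* high byte first: network byte order */
    dest[1] = ((unsigned)len) & 0xFF;
    int decoded = ((int)dest[0] << 8) | dest[1];  /* the receiving side reverses the shifts */
    assert(decoded == 2753);
    return 0;
}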
Firstly, you are using printf incorrectly. This
printf("dest[0]: %02x", dest[0]);
uses x format specifier in printf. x format specifier requires an argument of type unsigned int. Not char, but unsigned int and only unsigned int (or alternatively an int with non-negative value).
The immediate argument you supplied has type char, which is probably signed on your platform. This means that your dest[0] contains -63. A variadic argument of type char is automatically promoted to type int, which turns 0xc1 into 0xffffffc1 (as a signed representation of -63 in type int). Since printf expects an unsigned int value and you are passing a negative int value instead, the behavior is undefined. The printout that you see is nothing more than a manifestation of that undefined behavior. It is meaningless.
One proper way to print dest[0] in this case would be
printf("dest[0]: %02x", (unsigned) dest[0]);
I'm pretty sure the output will still be ffffffc1, but in this case 0xffffffc1 is the perfectly expected result of converting the negative value -63 to unsigned int type. Nothing unusual here.
Alternatively you can do
printf("dest[0]: %02x", (unsigned char) dest[0]);
which should give you your desired c1 output. Note that the conversion to int takes place in this case as well, but since the original value is positive (193), the result of the conversion to int is positive too and printf works properly.
Finally, if you want to work with raw memory directly, the proper type to use would be unsigned char from the very beginning. Not char, but unsigned char.
Secondly, an object of type int may easily occupy more than two 8-bit bytes. Depending on the platform, the 0x0A and 0xC1 values might end up in completely different portions of the memory region occupied by that int object. You should not expect that copying the first two bytes of an int object will copy the 0x0AC1 portion specifically.
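To see where the bytes of 0x0AC1 actually land on your machine, you can dump the whole int (the order shown depends on endianness, which is exactly why copying "the first two bytes" is not portable):
#include <stdio.h>
#include <string.h>
int main(void)
{
    int len = 2753;  /* 0x00000AC1 with a 32-bit int */
    unsigned char bytes[sizeof len];
    memcpy(bytes, &len, sizeof len);
    for (size_t i = 0; i < sizeof len; i++)
        printf("%02x ", bytes[i]);  /* e.g. "c1 0a 00 00" on a little-endian machine */
    printf("\n");
    return 0;
}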
You make the assumption that an "int" is two bytes. What justification do you have for that? Your code is highly unportable.
You make another assumption that "char" is unsigned. What justification do you have for that? Again, your code is highly unportable.
You make another assumption about the ordering of bytes in an int. What justification do you have for that? Again, your code is highly unportable.
Instead of the literal 2, use sizeof(int). Never hard-code the size of a type.
If this code should be portable, you should not use int, but a fixed size datatype.
If you need 16 bit, you could use int16_t.
Also, printing the chars needs a cast to unsigned. As the code stands, each char is promoted to an int and the sign is extended, which produces the leading FF's.
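A sketch of the fixed-width version (note that memcpy still writes the bytes in host byte order, so a portable protocol would still shift the bytes out explicitly):
#include <stdint.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    int len = 2753;
    int16_t len16 = (int16_t)len;        /* the caller must guarantee len fits in 16 bits */
    unsigned char dest[2];
    memcpy(dest, &len16, sizeof len16);  /* copies exactly 2 bytes, in host byte order */
    printf("%02x %02x\n", dest[0], dest[1]);
    return 0;
}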
So, where can unsigned char be useful?
If I understood right, unsigned char can represent numbers from -128 to 127. But every encoding table uses positive numbers. So, unsigned char can't be used for representing characters. Am I right?
No, unsigned char is 0 to 255.
It can be useful in representing binary data (a single byte), although, like any primitive data type, the possibilities are endless.
First of all, what you are describing is signed char; unsigned char ranges from 0 to 255.
To answer your question about negative-valued characters: you are right that character encoding is done using non-negative values.
From a different point of view, you can just think of signed and unsigned char as different integer representations.
Unsigned char is used to represent bytes. If you need just one byte of memory in a variable, you use unsigned char and assign an integer to it.
For example, uint8_t is often used to represent bytes, but it is usually nothing more than a typedef for unsigned char.
A signed char can represent numbers from -128 to +127,
and an unsigned char from 0 to 255.
Although unsigned is more convenient in many use cases, everything binary-related can be done with signed too:
0=0, 1=1 ... 127=127, -128=128, -127=129, -126=130 ... -1=255
Such conversions happen automatically (or, better said, it's just a different interpretation).
("Binary-related" means that a mathematical -2 * 2 would be possible with unsigned too, but would make even less sense.)
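A two-line demonstration of that reinterpretation (the value conversion is well-defined; the "same bit pattern" view assumes two's complement, which is what practically every current machine uses):
#include <stdio.h>
int main(void)
{
    signed char s = -1;
    unsigned char u = (unsigned char)s;  /* the same bit pattern, read as 255 */
    printf("%d %u\n", s, (unsigned)u);   /* prints: -1 255 */
    return 0;
}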
Regarding So, where can unsigned char be useful?
Here perhaps?: (a very simple example to test for ASCII digit)
typedef int BOOL;      /* assuming BOOL/TRUE/FALSE aren't already defined elsewhere */
#define TRUE  1
#define FALSE 0

BOOL isDigit(unsigned char c)
{
    if((c >= '0') && (c <= '9')) return TRUE;
    return FALSE;
}
The unsigned char argument type guarantees that the input is a single byte-sized character code with no negative values (there are 128 ASCII codes; with extended ASCII there are 256 possibilities). So, in this function, all that remains is to test the input value against the specific criteria (in this case, is it a digit); there is no need for the function to test for negative numbers. A regular char (where it is signed) cannot hold the entire 0-255 range of extended character codes as non-negative values. The size of unsigned char is also significant in that it is only 1 byte, as opposed to (typically, but not always) 4 bytes for, say, an int.
So my code has in it the following:
unsigned short num=0;
num=*(cra+3);
printf("> char %u\n",num);
cra is a char*
The problem is that it is getting odd output, sometimes outputting numbers such as 65501 (clearly not within the range of a char). Any ideas?
Thanks in advance!
Apparently *(cra+3) is a char with the value 0xdd. Since char is signed on your platform, that actually means -35 (0xdd in two's complement), which sign-extends to 0x...ffffffdd. Restricting this to 16 bits gives 0xffdd, i.e. 65501.
You need to make it an unsigned char so it gives a number in the range 0–255:
num = (unsigned char)cra[3];
Note:
1. the signedness of char is implementation defined, but usually (e.g. in OP's case) it is signed.
2. the ranges of signed char, unsigned char and unsigned short are implementation defined, but again commonly they are -128–127, 0–255 and 0–65535 respectively.
3. the conversion from signed char to unsigned short is actually -35 + 65536 = 65501.
char is allowed to be either signed or unsigned - apparently, on your platform, it is signed.
This means that it can hold values like -35. Such a value is not within the range representable by unsigned short. When a number out of range is converted to an unsigned type, it is brought into range by repeatedly adding or subtracting one more than the maximum value representable in that type.
In this case, your unsigned short can represent values up to 65535, so -35 is brought into range by adding 65536, which gives 65501.
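The whole effect in a few lines (assuming char is signed on the platform, as in the question):
#include <stdio.h>
int main(void)
{
    char c = (char)0xdd;                     /* implementation-defined; typically -35 where char is signed */
    unsigned short num = c;                  /* -35 is out of range, so 65536 is added: 65501 */
    unsigned char fixed = (unsigned char)c;  /* 221, the value that was probably wanted */
    printf("%u %u\n", (unsigned)num, (unsigned)fixed);
    return 0;
}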
unsigned short has a range of (at least) 0 .. 65535 (link), the %u format specifier prints an unsigned int with a range of (commonly) 0 .. 4294967295. Thus, depending on the value of cra, the output appears to be completely sensible.
cra is just a pointer.
It hasn't been allocated any space by way of malloc or calloc, so its contents are undefined. *(cra + 3) will evaluate to the contents of the location 3 bytes past the location cra points to (assuming char occupies 1 byte); I believe those contents are also undefined.
unsigned short takes up 2 bytes, at least on my system, so it can hold values from 0 to 65535. Your output is therefore within its defined range.
Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
It won't make a difference for strings. But in C you can use a char to do math, when it will make a difference.
In fact, when working in constrained memory environments, like embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
In terms of the values they represent:
unsigned char:
spans the value range 0..255 (00000000..11111111)
values overflow around low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values overflow around high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)
signed char:
spans the value range -128..127 (10000000..01111111)
values overflow around low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values overflow around high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
bitwise right shift operator (>>) commonly does an arithmetic shift (strictly speaking, right-shifting a negative signed value is implementation-defined):
10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (except for right shifts).
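A short check of that right-shift difference (the arithmetic shift shown for the signed case is simply what common compilers do):
#include <stdio.h>
int main(void)
{
    unsigned char u = 0x80;  /* 128, bit pattern 10000000 */
    signed char s = -128;    /* same bit pattern on two's complement machines */
    /* both operands are promoted to int before the shift */
    printf("%d\n", u >> 1);  /* 64: a zero is shifted in at the top */
    printf("%d\n", s >> 1);  /* -64 on typical compilers (implementation-defined for negative values) */
    return 0;
}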
Update
Some implementation-specific behaviour mentioned in the comments:
char != signed char. The type "char" without "signed" or "unsigned" is implementation-defined, which means that it can act like either a signed or an unsigned type.
Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.
#include <stdio.h>
int main(int argc, char** argv)
{
char a = 'A';
char b = 0xFF;
signed char sa = 'A';
signed char sb = 0xFF;
unsigned char ua = 'A';
unsigned char ub = 0xFF;
printf("a > b: %s\n", a > b ? "true" : "false");
printf("sa > sb: %s\n", sa > sb ? "true" : "false");
printf("ua > ub: %s\n", ua > ub ? "true" : "false");
return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.
There are a couple of differences. Most importantly, if you overflow the valid range of a char by assigning it a too-big or too-small integer, and char is signed, the resulting value is implementation-defined or (in C) a signal could even be raised, as for all signed types. Contrast that with the case where you assign something too big or too small to an unsigned char: the value wraps around and you get precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte, in the sense of a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c would be too big for char to represent, and the machine uses two's complement. Many implementations behave such that when you assign a too-big value to the char, the bit pattern doesn't change. If an int can represent all values of char (which it can on most implementations), then the char is promoted to int before being passed to printf. So the value of what is passed would be negative, and promoting to int retains that sign, so you will get a negative result. However, if char is unsigned, then the value is unsigned, and promoting to an int yields a positive int. You can use unsigned char; then you will get precisely defined behavior both for the assignment to the variable and for passing to printf, which will then print something positive.
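A compact way to see that promotion difference (the 0xFF assignment to signed char is implementation-defined, typically giving -1):
#include <stdio.h>
int main(void)
{
    signed char sc = (signed char)0xFF;  /* typically -1 */
    unsigned char uc = 0xFF;             /* always 255 */
    printf("%d %d\n", sc, uc);           /* both promoted to int: typically prints "-1 255" */
    return 0;
}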
Note that char, unsigned char and signed char are all at least 8 bits wide. There is no requirement that char is exactly 8 bits wide; that is true for most systems, but on some you will find 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is also not always exactly 8 bits.
Another difference is that in C, an unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 to 2^CHAR_BIT - 1. The same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values; even if you know how your compiler implements the sign (two's complement or one of the other options), there may be unused padding bits in it. In C++, there are no padding bits for any of the three character types.
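You can inspect what your own implementation does via <limits.h>:
#include <limits.h>
#include <stdio.h>
int main(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("CHAR_MIN  = %d, CHAR_MAX  = %d\n", CHAR_MIN, CHAR_MAX);
    printf("SCHAR_MIN = %d, SCHAR_MAX = %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("UCHAR_MAX = %d\n", UCHAR_MAX);
    return 0;
}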
"What does it mean for a char to be signed?"
Traditionally, the ASCII character set consists of 7-bit character encodings (as opposed to the 8-bit EBCDIC).
When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.
A "signed character" happens to be perfect for this representation.
Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.
Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
converting to a larger int
comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).
As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to: sometimes you want that char to be a one-byte number, and in those cases (particularly given how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.
I used to do that, anyway. Newer versions of C and C++ now have (u)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for the signed and unsigned char types anyway).
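For example, a trivial sketch of that distinction using the standard <stdint.h> type:
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    uint_least8_t level = 200;  /* clearly a small number, not a character */
    char initial = 'G';         /* clearly text */
    printf("%u %c\n", (unsigned)level, initial);
    return 0;
}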
The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of char, c could be one of two values. If chars are unsigned, then c will be (char)162. If they are signed, then it's an overflow case, since the maximum value for a signed char is 127; the result of converting 162 back to a signed char is implementation-defined, and most implementations would just give you (char)-94.
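A quick experiment to confirm (the addition itself happens in int after promotion; it's the conversion of 162 back to signed char that is implementation-defined):
#include <stdio.h>
int main(void)
{
    unsigned char uc = (unsigned char)42 + (unsigned char)120;  /* 162: fits, well defined */
    signed char sc = (signed char)42 + (signed char)120;        /* 162 doesn't fit: implementation-defined, typically -94 */
    printf("%u %d\n", (unsigned)uc, sc);
    return 0;
}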
One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ASCII char. Of course, it's not portable, so not very useful.