Converting int to char in C - formula

int num = 65537;
char p = (char)num; // p == 1
What's going on here?
Is it p = num % (127+128) - 1,
or p = num % 256, or something else?
I need to know why p is equal to 1.
Thanks!

65537 is 00000000 00000001 00000000 00000001 in binary, but the char type is only 1 byte, so only the last byte is kept for the char value, which is 00000001 = 1.

Short answer: In practice on standard processors, it is 1 because 65537 % 256 == 1. The reason is the one ksmonkey123 explained.
Note: If you wrote 127 + 128 because the bounds of a signed char (which is equivalent to char on typical compilers nowadays) are -128 to +127, remember that the number of values between -128 and +127 is (127 - (-128) + 1), which also yields 256. So it does not matter whether you use the bounds of signed char (-128 to 127) or unsigned char (0 to 255).
Nitpick: Strictly speaking, assigning a value that cannot be represented in a signed destination variable gives an implementation-defined result (or raises an implementation-defined signal), so the C standard does not guarantee a particular answer.
Assigning a positive value that does not fit into an unsigned variable yields "mod range" behaviour, like "% 256" for unsigned chars if char has 8 bits. Assigning a negative value to an unsigned variable is also defined by the standard: a multiple of 2^N, where N is the number of bits of the target type, is added to the value until it is in range. So -510 becomes +2 by adding 2*256 to it, and then this +2 is stored in the variable.
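For illustration, here is a minimal sketch (assuming 8-bit chars and a typical two's-complement machine; the signed case is implementation-defined, so the comments only describe the usual outcome):

#include <stdio.h>

int main(void)
{
    int num = 65537;
    unsigned char u = (unsigned char)num;   /* well-defined: 65537 mod 256 == 1 */
    signed char   s = (signed char)num;     /* implementation-defined; usually also 1 */
    unsigned char n = (unsigned char)-510;  /* well-defined: -510 + 2*256 == 2 */

    printf("%d %d %d\n", u, s, n);          /* typically prints: 1 1 2 */
    return 0;
}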

Related

C and ASCII codes: how could 128 represent a char if its range is from -128 to 127?

I'm referring to this question because I can't understand how character codes from 0 to 255 can be represented with a signed char if its range is from -128 to 127.
Since sizeof(char) is 1 byte, it seems reasonable to think that it could easily represent values up to a maximum of 255.
So why is there nothing wrong with the assignment char a = 128, and why shouldn't I use unsigned char for it?
Thank you in advance!
char c = 128; by itself is correct in C. The standard says that a char contains CHAR_BIT bits, which can be greater than 8. Also, a char can be signed or unsigned, implementation defined, and an unsigned char has to contain at least the range [0, 255].
So on an implementation where char is wider than 8 bits, or where char is unsigned by default, this line is perfectly valid and meaningful.
Even on a common 8-bit signed-char implementation, the conversion of 128 to char is implementation-defined rather than an error, so there is no problem with the line itself.
In practice, the compiler will often issue a warning for this; clang, for example:
warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion].
Signed or unsigned, it takes 8 bits, and 8 bits can hold 256 distinct values. It is just a question of how we interpret them.
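As a small sketch of what typically happens (assuming an 8-bit char that is signed by default, as on most desktop compilers; the stored value is implementation-defined):

#include <stdio.h>

int main(void)
{
    char a = 128;            /* implementation-defined if char is signed; gcc/clang warn and store -128 */
    unsigned char b = 128;   /* always fine: unsigned char holds at least 0..255 */

    printf("%d %d\n", a, b); /* typically prints: -128 128 */
    return 0;
}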

Char multiplication in C

I have code like this:
#include <stdio.h>
int main()
{
    char a = 20, b = 30;
    char c = a * b;
    printf("%c\n", c);
    return 0;
}
The output of this program is X.
How is this output possible if a*b = 600, which overflows, since char values lie between -128 and 127?
Whether char is signed or unsigned is implementation defined. Either way, it is an integer type.
Anyway, the multiplication is done as int due to integer promotions and the result is converted to char.
If the value does not fit into the "smaller" type, it is implementation-defined for a signed char how this is done. By far most (if not all) implementations simply cut off the upper bits.
For an unsigned char, the standard actually requires (in effect) cutting off the upper bits.
So:
(int)20 * (int)30 -> (int)600 -> (char)(600 % 256) -> 88 == 'X'
(Assuming 8 bit char).
See the link and its surrounding paragraphs for more details.
Note: If you enable compiler warnings (as always recommended), you should get a truncation warning for the assignment. This can be avoided by an explicit cast (only if you are really sure about all implications). The gcc option is -Wconversion.
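For example, a small sketch of the explicit cast (hypothetical; whether it is a good idea depends on your code base). Compiled with gcc -Wall -Wconversion, the implicit form char c = a * b; typically draws a warning, while the cast form does not:

#include <stdio.h>

int main(void)
{
    char a = 20, b = 30;
    char c = (char)(a * b);   /* explicit cast: the truncation is now clearly intentional */

    printf("%d\n", c);        /* typically prints 88 on 8-bit-char platforms */
    return 0;
}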
First off, the behavior is implementation-defined here. A char may be either unsigned char or signed char, so it may be able to hold 0 to 255 or -128 to 127, assuming CHAR_BIT == 8.
600 in decimal is 0x258. What happens is that the least significant eight bits are stored; the value is 0x58, a.k.a. 'X' in ASCII.
This code will cause undefined behavior if char is signed.
I thought overflow of signed integer is undefined behavior, but conversion to smaller type is implementation-defined.
quote from N1256 6.3.1.3 Signed and unsigned integers:
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
If the value is simply truncated to 8 bits, (20 * 30) & 0xff == 0x58, and 0x58 is the ASCII code for X. So, if your system does this and uses ASCII, the output will be X.
First, it looks like you have an unsigned char with a range from 0 to 255.
You're right about the overflow.
600 - 256 - 256 = 88
This is just an ASCII code of 'X'.
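A short sketch (assuming 8-bit chars and ASCII) that makes the intermediate int visible:

#include <stdio.h>

int main(void)
{
    char a = 20, b = 30;
    int  product = a * b;          /* both operands promoted to int: 600 */
    char c = (char)product;        /* low 8 bits kept: 600 % 256 == 88 */

    printf("%d %d %c\n", product, c, c);   /* typically prints: 600 88 X */
    return 0;
}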

C: Use 8 bits of char for a full int

I'm trying to convert the 8 bits of a char to the least significant bits of an int. I know that the conversion from char to int is easily done via
int var = (int) var2;
where var2 is of type char (it even works without the (int) cast).
But I wonder, if I write the code above, are the remaining highest significant (32-8=) 24 bits of the int just random or are they set to 0?
Example:
Let var2 be 00001001, if I write the code above, is var then 00000000 00000000 00000000 00001001?
C11 (n1570), § 6.5.4 Cast operators
Preceding an expression by a parenthesized type name converts the value of the
expression to the named type.
Therefore, the remaining bits of var are set to 0.
By the way, the explicit cast is unnecessary. The conversion from char to int is implicit.
C11 (n1570), § 6.3.1.1 Boolean, characters, and integers
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
A C/C++ compiler can choose for the char type to be either signed or unsigned.
If your compiler defines char to be signed, the upper bits will be sign-extended when it is cast to an int. That is, they will either be all zeros or all ones depending on the value of the sign bit of var2. For example, if var2 has the value -1 (in hex, that's 0xff), var will also be -1 after the assignment (represented as 0xffffffff on a 32-bit machine).
If your compiler defines char to be unsigned, the upper bits will be all zero. For example, if var2 has the value 255 (again 0xff), the value of var will be 255 (0x000000ff).
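A sketch of both cases (assuming 8-bit chars and 32-bit int; the hex output shows the extension):

#include <stdio.h>

int main(void)
{
    signed char   sc = -1;    /* bit pattern 0xff */
    unsigned char uc = 255;   /* same bit pattern 0xff */

    int from_signed   = sc;   /* sign-extended: 0xffffffff, i.e. -1 */
    int from_unsigned = uc;   /* zero-extended: 0x000000ff, i.e. 255 */

    printf("%08x %08x\n", (unsigned)from_signed, (unsigned)from_unsigned);
    return 0;
}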
I'm trying to convert the 8 bits of a char to the least significant bits of an int
A char isn't guaranteed to be 8 bits. It might be more. Furthermore, as others have mentioned, it could be a signed integer type. Negative char values will convert to negative int values, in the following code.
int var = (int) var2;
The sign bit is considered to be the most significant, so this code doesn't do what you want it to. Perhaps you mean to convert from char to unsigned char (to make it positive), and then to int (by implicit conversion):
int var = (unsigned char) var2;
If you foresee CHAR_BIT exceeding 8 in your use case scenarios, you might want to consider using the modulo operator to reduce it:
int var = (unsigned char) var2 % 256;
But I wonder, if I write the code above, are the remaining highest significant (32-8=) 24 bits of the int just random or are they set to 0?
Of course an assignment will assign the entire value, not just part of it.
Example: Let var2 be 00001001, if I write the code above, is var then 00000000 00000000 00000000 00001001?
Semantically, yes. The C standard requires that an int be able to store values in the range -32767 to 32767. Your implementation may choose to represent a larger range, but that's not required. Technically, an int is at least 16 bits. Just keep that in mind.
For values of var2 that are negative, however (eg. 10000001 in binary notation), the sign bit will be extended. var will end up being 10000000 00000001 (in binary notation).
For casting a char variable named c to an int:
If c is positive, such as 24, then c = 00011000 and after the cast it will be filled up with zeros:
00000000 00000000 00000000 00011000
If c is negative, such as -24, then c = 11101000 and after the cast it will be filled up with ones:
11111111 11111111 11111111 11101000
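Putting the above together, a runnable sketch (hypothetical value 0x89; assumes 8-bit chars) comparing plain assignment with the (unsigned char) cast from the earlier answer:

#include <stdio.h>

int main(void)
{
    char var2 = (char)0x89;             /* may be negative if char is signed */

    int direct = var2;                  /* sign-extends when char is signed */
    int low8   = (unsigned char)var2;   /* always the plain 8-bit value: 0x89 == 137 */

    printf("%d %d\n", direct, low8);    /* typically prints: -119 137 */
    return 0;
}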

How do we get the following output?

#include <stdio.h>
int main(void)
{
    int i = 258;
    char ch = i;
    printf("%d", ch);
}
The output is 2!
How does the range of a variable work? What are the ranges of the different data types in C?
When assigning to a smaller type the value is:
truncated, i.e. 258 % 256, if the new type is unsigned;
modified in an implementation-defined fashion if the new type is signed.
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
So all that fancy "adding or subtracting" means it is assigned as if you said:
ch = i % 256;
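A tiny check (assuming an 8-bit, unsigned-behaving char) that the assignment really acts like i % 256:

#include <stdio.h>

int main(void)
{
    int i = 258;
    unsigned char ch = i;            /* well-defined: reduced modulo 256 */

    printf("%d %d\n", ch, i % 256);  /* prints: 2 2 */
    return 0;
}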
char is 8 bits long (on typical implementations), while 258 requires nine bits to represent. Converting to char chops off the most significant bit of 258, which is 100000010 in binary, leaving 00000010, which is 2.
When you pass char to printf, it gets promoted to int, which is then picked up by the %d format specifier, and printed as 2.
#include <stdio.h>
int main(void)
{
    int i = 258;
    char ch = i;
    printf("%d", ch);
}
At the machine level, i is 00000001 00000010 (the low bits of 258). ch takes 1 byte, so it keeps the last 8 bits: 00000010, which is 2.
In order to find out how long the various types are in C you should refer to limits.h (or climits in C++). char is not guaranteed to be 8 bits long. It is just:
the smallest addressable unit of the machine that can contain the basic character set. It is an integer type. The actual type can be either signed or unsigned depending on the implementation.
The same sort of loose definitions are given for the other types.
Alternatively, you can use the sizeof operator to find out the size of a type in bytes.
You may not assume exact ranges for the native C data types. The standard places only minimum requirements, so you can say that an unsigned short can hold at least 65536 different values; the upper limit can differ.
Refer to Wikipedia for more reading.
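For example, a quick way to inspect the actual sizes and ranges on your implementation using limits.h and sizeof:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("char  : %zu byte(s), range %d..%d\n", sizeof(char), CHAR_MIN, CHAR_MAX);
    printf("short : %zu byte(s), max %d\n", sizeof(short), SHRT_MAX);
    printf("int   : %zu byte(s), max %d\n", sizeof(int), INT_MAX);
    return 0;
}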
char is 8 bits, so when you assign an integer to a char on a 32-bit machine (where int is 32 bits), the variable i is:
00000000 00000000 00000001 00000010 = 258 (in binary)
When you want a char from this int, only the last 8 bits are kept (char is 8 bits), so you get:
00000010, which means 2 in decimal; this is why you see this output.
Regards.
This is an overflow; the result is not portable because char may be signed (implementation-defined conversion) or unsigned (well-defined "wrap-around" behaviour).
The binary representation of 258 is
00000000 00000000 00000001 00000010
and when the integer is assigned to a char, only 8 bits of data are kept, i.e. the least significant byte.
Here only 00000010, i.e. 0x02, ends up in the char.
(Note that the conversion is defined in terms of the value, not the memory layout, so the result is 2 whether the machine is little- or big-endian.)

What does it mean for a char to be signed?

Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
It won't make a difference for strings. But in C you can use a char to do math, and then it makes a difference.
In fact, when working in constrained memory environments, like embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
In terms of the values they represent:
unsigned char:
spans the value range 0..255 (00000000..11111111)
values overflow around low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values overflow around high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)
signed char:
spans the value range -128..127 (10000000..01111111)
values overflow around low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values overflow around high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
bitwise right shift operator (>>) does an arithmetic shift:
10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (except for right shifts).
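A sketch demonstrating the wrap-around and the two shift behaviours (assuming 8-bit chars; the results for the signed cases are the usual implementation-defined outcomes, not guarantees):

#include <stdio.h>

int main(void)
{
    unsigned char u = 255;
    signed char   s = 127;

    u = u + 1;                                    /* wraps to 0: well-defined for unsigned */
    s = s + 1;                                    /* 128 does not fit: implementation-defined, usually -128 */

    unsigned char ur = (unsigned char)0x80 >> 1;  /* logical shift: 64 */
    signed char   sr = (signed char)-128 >> 1;    /* usually an arithmetic shift: -64 */

    printf("%d %d %d %d\n", u, s, ur, sr);        /* typically prints: 0 -128 64 -64 */
    return 0;
}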
Update
Some implementation-specific behaviour mentioned in the comments:
char != signed char. The type "char" without "signed" or "unsigned" is implementation-defined, which means that it can act like either a signed or an unsigned type.
Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.
#include <stdio.h>
int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;
    signed char sa = 'A';
    signed char sb = 0xFF;
    unsigned char ua = 'A';
    unsigned char ub = 0xFF;

    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.
There are a couple of differences. Most importantly, if you overflow the valid range of a char by assigning it a too big or too small integer, and char is signed, the resulting value is implementation-defined or even some signal (in C) could be raised, as for all signed types. Contrast that with the case when you assign something too big or too small to an unsigned char: the value wraps around, and you get precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte, as in a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.
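For instance, a minimal sketch of the unsigned case:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned char b = -1;             /* well-defined: wraps around to UCHAR_MAX */

    printf("%d %d\n", b, UCHAR_MAX);  /* prints 255 255 on 8-bit-char systems */
    return 0;
}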
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c would be too big for char to represent, and the machine uses two's complement. Many implementations behave, when you assign a too-big value to a char, such that the bit pattern won't change. If an int is able to represent all values of char (which it can on most implementations), then the char is promoted to int before being passed to printf. So the value of what is passed would be negative; promoting to int retains that sign, and you get a negative result. However, if char is unsigned, then the value is unsigned, and promoting to int yields a positive int. If you use unsigned char, you get precisely defined behaviour both for the assignment to the variable and for passing it to printf, which will then print something positive.
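A sketch of that printf difference (hypothetical value 0xFF; assumes the usual two's-complement, 8-bit char):

#include <stdio.h>

int main(void)
{
    signed char   sc = 0xFF;   /* implementation-defined; usually stores -1 */
    unsigned char uc = 0xFF;   /* always 255 */

    printf("%d\n", sc);        /* promoted to int: typically prints -1 */
    printf("%d\n", uc);        /* promoted to int: prints 255 */
    return 0;
}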
Note that char, unsigned char and signed char are all at least 8 bits wide; there is no requirement that char is exactly 8 bits wide. That is true for most systems, but some use 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is also not always exactly 8 bits.
Another difference is that in C an unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 to 2^CHAR_BIT - 1. The same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values; even if you know how your compiler implements the sign (two's complement or one of the other options), there may be unused padding bits in it. In C++, there are no padding bits for any of the three character types.
"What does it mean for a char to be signed?"
Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8-bit EBCDIC.)
When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.
A "signed character" happens to be perfect for this representation.
Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.
Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
converting to a larger int
comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).
As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that char to be a one-byte number, and in those cases (particularly how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.
At least, I used to do that. Now the newer versions of C and C++ have (u?)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for signed and unsigned char types anyway).
The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of char, c could be one of two values. If chars are unsigned then c will be (char)162. If they are signed then it is an overflow case, as the maximum value for a signed char is 127; the conversion of 162 back to char is implementation-defined, and on a typical two's-complement implementation you would get (char)-94.
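A sketch that makes both outcomes visible (assuming 8-bit chars; the signed result is the usual two's-complement wrap, which the standard leaves implementation-defined):

#include <stdio.h>

int main(void)
{
    signed char   sa = 42, sb = 120;
    unsigned char ua = 42, ub = 120;

    signed char   sc = sa + sb;   /* 162 does not fit: implementation-defined, usually -94 */
    unsigned char uc = ua + ub;   /* well-defined: 162 */

    printf("%d %d\n", sc, uc);    /* typically prints: -94 162 */
    return 0;
}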
One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ASCII char. Of course, it's not portable, so not very useful.
