Char multiplication in C

I have code like this:
#include <stdio.h>
int main()
{
    char a = 20, b = 30;
    char c = a * b;
    printf("%c\n", c);
    return 0;
}
The output of this program is X.
How is this output possible if a*b = 600, which overflows, since char values lie between -128 and 127?

Whether char is signed or unsigned is implementation defined. Either way, it is an integer type.
Anyway, the multiplication is done as int due to integer promotions and the result is converted to char.
If the value does not fit into the "smaller" type, the result is implementation-defined for a signed char. By far most (if not all) implementations simply cut off the upper bits.
For an unsigned char, the standard actually requires (put briefly) cutting off the upper bits.
So:
(int)20 * (int)30 -> (int)600 -> (char)(600 % 256) -> 88 == 'X'
(Assuming an 8-bit char.)
Note: If you enable compiler warnings (as always recommended), you should get a truncation warning for the assignment. This can be avoided by an explicit cast (only if you are really sure about all implications). The gcc option is -Wconversion.
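As a sketch of what the answer above describes (assuming an 8-bit char and an ASCII execution character set), the intermediate int result and the truncated value can be made visible like this:

#include <stdio.h>

int main(void)
{
    char a = 20, b = 30;
    int wide = a * b;              /* operands are promoted to int: 600 */
    char narrow = (char)(a * b);   /* the cast documents the intentional truncation */

    printf("as int:  %d\n", wide);                   /* 600 */
    printf("as char: %d ('%c')\n", narrow, narrow);  /* 88 and 'X' on typical systems */
    return 0;
}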

First off, the behavior is implementation-defined here. A char may be either unsigned char or signed char, so it may be able to hold 0 to 255 or -128 to 127, assuming CHAR_BIT == 8.
600 in decimal is 0x258. What happens is that only the least significant eight bits are stored; that value is 0x58, a.k.a. X in ASCII.

This code does not cause undefined behavior even if char is signed: overflow of signed integer arithmetic is undefined behavior, but conversion of an out-of-range value to a smaller signed type is implementation-defined.
quote from N1256 6.3.1.3 Signed and unsigned integers:
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
If the value is simply truncated to 8 bits, (20 * 30) & 0xff == 0x58, and 0x58 is the ASCII code for X. So, if your system does this and uses ASCII, the output will be X.

First, it looks like you have an unsigned char with a range from 0 to 255.
You're right about the overflow.
600 - 256 - 256 = 88
This is just an ASCII code of 'X'.

Related

Int to char conversion rule in C when int is outside the range of char

Please see the following C code.
#include <stdio.h>
int main(void)
{
    char c1 = 3000;
    char c2 = 250;
    printf("%d\n", c1);
    printf("%d\n", c2);
}
The output of the above code is
-72
-6
Please explain the integer to char conversion rule applied here as both 3000 and 250 are outside of the range of char (-128 to 127).
Note first that C does not specify whether char is signed or unsigned. That is left to implementations to decide, and they are not consistent on that. On implementations where char is unsigned, 250 is within its range.
Supposing, however, that your chars are signed, which indeed seems consistent with your results, the C rule for the conversions implicit in the assignment statements will not satisfy you:
the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
(C2011 6.3.1.3/3)
Evidently no signal was raised, so the result is implementation-defined. Among the possibilities is that the least-significant CHAR_BIT bits of each assigned value are stored in the target variable.
There is then an additional conversion when you call printf(). The arguments are promoted from char to int, and since int can represent all values of type char, that one is value-preserving. That allows us to conclude that it is indeed plausible that your implementation converts int to char by keeping only the least-significant bits, and interpreting them as 8-bit two's complement.
An int typically uses 4 bytes and a char uses 1 byte. Signed numbers in C are commonly represented in two's complement: the leftmost bit is the sign bit and the remaining bits hold the value. The number 3000 is represented in 32-bit binary as 00000000000000000000101110111000, and that is how it is stored in an int. Because a char is only 1 byte, only the last 8 bits, 10111000, are kept in the char variable. Interpreted as an 8-bit two's complement value, that bit pattern is -72.
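A small sketch, assuming CHAR_BIT == 8, a signed plain char, and two's complement, that makes the truncation visible by masking the low byte explicitly:

#include <stdio.h>

int main(void)
{
    int i1 = 3000, i2 = 250;

    /* keep only the low 8 bits, as many implementations do on out-of-range
       conversion to char (the conversion itself is implementation-defined) */
    signed char c1 = (signed char)(i1 & 0xFF);  /* 0xB8 -> -72 on two's complement */
    signed char c2 = (signed char)(i2 & 0xFF);  /* 0xFA -> -6  on two's complement */

    printf("%d %d\n", c1, c2);   /* typically prints: -72 -6 */
    return 0;
}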

Output of C code in which character is assigned an octal number

#include <stdio.h>
int main(void)
{
    char a = 01212;
    printf("%d", a);
    return 0;
}
On compiling I get a warning, and the output is -118. How? I know any number starting with 0 in C is treated as octal. The octal equivalent of 01212 is 650, so why is the output -118?
The assignment char a = 01212; is out of range on most systems, and the result is implementation-defined. A system with an 8-bit char that implements two's complement will print -118.
For detail, please read below explanation.
Unlike int, a plain char is not necessarily signed; there are three distinct character types in C: char, signed char, and unsigned char.
A char has a range from CHAR_MIN to CHAR_MAX. For a particular compiler, char will use either an underlying signed or unsigned representation. You can check these values in your system's limits.h.
Here is the text from the C99 standard, 6.2.5 paragraph 15:
6.2.5 Types
The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char.35)
And footnote 35 says:
35) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be
used to distinguish the two options. Irrespective of the choice made, char is a separate type from the
other two and is not compatible with either.
Having said that, the value in char a = 01212; (650 decimal) does not fit in 8 bits. The C standard allows char to be wider than 8 bits, but almost all machines today implement an 8-bit char.
So if char is implemented as unsigned char and the value is more than CHAR_MAX, the value is converted modulo CHAR_MAX + 1.
On an 8-bit system, the converted value is 650 modulo 256, which is 650 - 512 = 138.
If char is implemented as signed char, the conversion is implementation-defined. If it is an 8-bit char system that implements two's complement, the value will be -118, as you have seen in your result. Note that on such a system the range of char is -128 to +127.
The value 650 is most likely out of range for your char type. In C, the result of such an out-of-range integer conversion is implementation-defined. That is, it is clear that you will not get 650 in your char, but what exactly you will get depends on your compiler. Consult your compiler documentation to figure out why you got -118.
A char occupies one byte, typically 8 bits, so the maximum value an unsigned char can hold is 2^8 - 1, which is 255, and a signed char has a maximum of 127. When a signed char is assigned a number greater than that, the result is implementation-defined, and a negative number may appear.
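A minimal sketch (assuming an 8-bit, two's-complement char) that walks through the arithmetic described above:

#include <stdio.h>

int main(void)
{
    int octal = 01212;                  /* octal literal: 650 decimal */
    int low_byte = octal % 256;         /* 650 % 256 = 138 */
    signed char as_signed = (signed char)low_byte;  /* implementation-defined; -118 on two's complement */

    printf("%d %d %d\n", octal, low_byte, as_signed);  /* typically: 650 138 -118 */
    return 0;
}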

Execution output of program

Why does the program below print -128?
#include <stdio.h>
int main(void)
{
    char i = 0;
    for (; i >= 0; i++)
        ;
    printf("%d", i);
}
Also, can I assign an int value to a char without type-casting it? And if I put a print statement in the for loop, it prints up to 127, which is correct, but this program prints -128. Why?
If char is a signed type on your platform then the behaviour of the program is undefined: overflowing a signed type is undefined behaviour in C.
A two's complement 8-bit number with a value of -128 has the same bit pattern as an unsigned 8-bit number with value +128. It seems that this is what is happening in your case, and -128 is, of course, a termination condition for your loop. (You could even call it "wraparound to the smallest negative".) But don't rely on this.
According to N1570, whether a char is signed is implementation-defined (emphasis mine):
6.2.5 Types
15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char.
If it's unsigned, it will never overflow:
9 The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same. A
computation involving unsigned operands can never overflow, because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
For example, suppose UCHAR_MAX == 127 (usually it will be 255, though); then 127 + 1 = (127 + 1) % (UCHAR_MAX + 1) = (127 + 1) % (127 + 1) = 0.
But if it's signed, the behavior is undefined, which means anything can happen: CHAR_MAX + 1 can be equal to CHAR_MIN, 0, or whatever. What's more, "undefined behavior" means the program could even crash, although that's not very likely in practice.
In your case, it seems that char is signed and CHAR_MAX + 1 == CHAR_MIN. Why? Just because your implementation defines it so, and you are lucky enough to avoid a crash this time. But this is not portable or reliable at all.
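One hedged way to check what your implementation does, without relying on undefined signed overflow, is to inspect the limits from <limits.h> and demonstrate the well-defined unsigned wraparound instead; a minimal sketch:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_MIN is 0 if plain char is unsigned, SCHAR_MIN if it is signed */
    printf("char is %s, range %d..%d\n",
           CHAR_MIN == 0 ? "unsigned" : "signed", CHAR_MIN, CHAR_MAX);

    /* unsigned wraparound is well defined: UCHAR_MAX + 1 reduces to 0 */
    unsigned char u = UCHAR_MAX;
    u++;
    printf("UCHAR_MAX + 1 wraps to %d\n", u);   /* prints 0 */
    return 0;
}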

Printing declared char value in C

I understand that a char variable holds values from -128 to 127 (signed) or from 0 to 255 (unsigned).
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function, only providing an exact type for the first argument.
That means the default argument promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and all floating-point values of rank smaller than double are promoted to double.
If your implementation has CHAR_BIT of 8, plain char is signed, and you have an obliging two's-complement implementation, you thus get
128 (literal) to -128 (char / signed char) to -128 (int), printed as int => -128
If all the listed conditions except the obliging two's-complement implementation are fulfilled, you get a signal or some other implementation-defined value.
Otherwise you get output of 128, because 128 fits in char / unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, x = 128; is an out-of-range assignment. The behaviour of this is implementation-defined, meaning that the compiler may choose an action but must document what it does (and therefore do it reliably). This action is allowed to include raising a signal.
The usual behaviour that modern compilers do for out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary, 128 is 0...010000000 (a single 1 in bit 7).
Truncating this into a signed char gives the signed char with binary representation 10000000. In two's complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiosity: in one's complement this is -127, and in sign-magnitude it is -0, which may be a trap representation and thus raise a signal.)
Finally, printf accurately prints out this char's value of -128. The %d specifier works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX; this behaviour is guaranteed except on systems that have plain char unsigned and sizeof(int) == 1 (such systems exist, but you'd know about it if you were on one).
Lets look at the binary representation of 128 when stored into 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
Plain char with your current setup appears to be a signed char (note that this is not required by the C standard), so when you assign the value 128 to x you are storing the bit pattern 1000 0000, and when you compile and print it out you get the signed interpretation of that bit pattern, namely -128.
My environment likewise treats plain char as signed char. As expected, if I cast x to unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>

int main() {
    char x;
    x = 128;
    printf("%d %d\n", x, (unsigned char)x);
    return 0;
}
gives me the output of -128 128
Hope this helps!

What does it mean for a char to be signed?

Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
It won't make a difference for strings. But in C you can use a char to do math, when it will make a difference.
In fact, when working in constrained memory environments, like embedded 8 bit applications a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
In terms of the values they represent:
unsigned char:
spans the value range 0..255 (00000000..11111111)
values overflow around low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values overflow around high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)
signed char:
spans the value range -128..127 (10000000..01111111)
values overflow around low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values overflow around high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
bitwise right shift operator (>>) does an arithmetic shift:
10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value-wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed or unsigned (except for right shifts).
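A small sketch of the shift difference (note that right-shifting a negative value is implementation-defined in C, though most compilers do an arithmetic shift, and the char operands are promoted to int before the shift):

#include <stdio.h>

int main(void)
{
    unsigned char u = 0x80;   /* 128, bit pattern 10000000 */
    signed char   s = -128;   /* same bit pattern on two's complement */

    /* both operands are promoted to int before shifting */
    printf("unsigned: %d\n", u >> 1);   /* 64: logical shift of a non-negative value */
    printf("signed:   %d\n", s >> 1);   /* typically -64: arithmetic shift (implementation-defined) */
    return 0;
}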
Update
Some implementation-specific behaviour mentioned in the comments:
char != signed char. The type char without signed or unsigned is implementation-defined, which means it can act like either a signed or an unsigned type.
Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.
#include <stdio.h>
int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;
    signed char sa = 'A';
    signed char sb = 0xFF;
    unsigned char ua = 'A';
    unsigned char ub = 0xFF;
    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.
There are a couple of differences. Most importantly, if you overflow the valid range of a char by assigning it a too big or too small integer, and char is signed, the resulting value is implementation-defined or (in C) a signal could even be raised, as for all signed types. Contrast that with the case when you assign something too big or too small to an unsigned char: the value wraps around and you get precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte in the sense of a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c is too big for char to represent, and the machine uses two's complement. Many implementations behave, in the case that you assign a too-big value to a char, such that the bit pattern does not change. If an int can represent all values of char (which it can on most implementations), then the char is promoted to int before being passed to printf. So the value passed would be negative; promoting to int retains that sign, and you get a negative result. However, if char is unsigned, then the value is non-negative, and promoting it to int yields a positive int. Use unsigned char and you get precisely defined behavior both for the assignment to the variable and for passing to printf, which will then print something positive.
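A runnable sketch of that vararg point, assuming an 8-bit, two's-complement implementation where plain char is signed (the value 0xFF stands in for the "too big" value the paragraph mentions):

#include <stdio.h>

int main(void)
{
    char c = 0xFF;            /* implementation-defined if char is signed; commonly becomes -1 */
    unsigned char u = 0xFF;   /* well defined: 255 */

    /* both are promoted to int by the default argument promotions */
    printf("%d\n", c);   /* typically -1 when char is signed */
    printf("%d\n", u);   /* 255 */
    return 0;
}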
Note that char, unsigned char, and signed char are all at least 8 bits wide. There is no requirement that char be exactly 8 bits wide; that is true on most systems, but on some you will find 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is also not always exactly 8 bits.
Another difference is that in C, an unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 to 2^CHAR_BIT - 1. The same is true for char if it is unsigned. For signed char, you can't assume anything about the range of values: even if you know how your compiler implements the sign (two's complement or one of the other options), there may be unused padding bits in it. In C++, none of the three character types has padding bits.
"What does it mean for a char to be signed?"
Traditionally, the ASCII character set is a 7-bit character encoding. (As opposed to the 8-bit EBCDIC.)
When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.
A "signed character" happens to be perfect for this representation.
Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.
Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
converting to a larger int
comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).
As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that char to be a one-byte number, and in those cases (particularly how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.
Or rather, I used to do that; newer versions of C and C++ have int_least8_t and uint_least8_t (currently typedef'd in <stdint.h> or C++'s <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for signed char and unsigned char anyway).
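A small illustrative sketch of that "numeric byte" usage with the <stdint.h> types:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* explicitly numeric one-byte types, independent of plain char's signedness */
    uint_least8_t raw = 200;     /* a "byte" holding 0..255 (at least) */
    int_least8_t  delta = -20;   /* a signed one-byte quantity */

    raw = (uint_least8_t)(raw + delta);   /* arithmetic, not text */
    printf("%u\n", (unsigned)raw);        /* prints 180 */
    return 0;
}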
The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of char, c could be one of two values. If chars are unsigned then c will be (char)162. If they are signed, 162 does not fit, since the maximum value for a signed char is 127; the conversion back to char is then implementation-defined, and most implementations would just give (char)-94.
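For completeness, a runnable version of that snippet; the exact value in the signed case is implementation-defined, and -94 is what a two's-complement truncation gives:

#include <stdio.h>

int main(void)
{
    unsigned char uc = (unsigned char)42 + (unsigned char)120;             /* well defined: 162 */
    signed char   sc = (signed char)((signed char)42 + (signed char)120);  /* implementation-defined; typically -94 */

    printf("%d %d\n", uc, sc);   /* typically prints: 162 -94 */
    return 0;
}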
One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ascii char. Of course, it's not portable, so not very useful.
