Specify a number literal as 8 bit? - c

unsigned char ascii;
int a = 0;
char string[4] = "foo"; /* or "A1" */
ascii = (string[a] - 'A' + 10) * 16;
warning: conversion to ‘unsigned char’
from ‘int’ may alter its value
It seems that gcc considers chars and number literals as int by default. I know I could just cast the expression to (unsigned char), but how can I specify char literals and number literals as 8 bit without casts?
A similar issue:
Literal fractions are considered double by default but they can be specified to float by:
3.1f
Therefore, 3.1f would be considered a float rather than a double.
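The suffix behaviour described here can be checked by comparing operand sizes. This is only a small sketch; the function names are made up for illustration and are not part of any library.

```c
#include <stddef.h>

/* Hypothetical helpers: report the size of each kind of floating constant. */
size_t size_of_suffixed_constant(void)   { return sizeof(3.1f); } /* float  */
size_t size_of_unsuffixed_constant(void) { return sizeof(3.1);  } /* double */
```

Comparing the two results against sizeof(float) and sizeof(double) shows which type each constant has.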

In C, you cannot do calculations in anything shorter than int
char a = '8' - '0'; /* '8' is int */
char a = (char)'8' - '0'; /* (char)'8' is converted to `int` before the subtraction */
char a = (char)'8' - (char)'0'; /* both (char)'8' and (char)'0' are converted */

The C language doesn't provide any way of specifying a literal with type char or unsigned char. Use the cast.
By the way, the result of your calculation is outside the range of unsigned char, so the warning is quite correct - conversion will alter its value. C doesn't provide arithmetic in any type smaller than an int. In this case I suppose that what you want is modulo-256 arithmetic, and I think that gcc will recognise that, and will not emit the warning with the casts in place. But as far as the C language is concerned, that calculation is done in the larger type and then converted down to unsigned char for storage in ascii.
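As a rough sketch of what "use the cast" looks like for the expression in the question: the helper name scaled_digit is hypothetical, and the values mentioned afterwards assume ASCII and 8-bit chars.

```c
/* Hypothetical helper: the expression from the question, with the
   narrowing step made explicit. The arithmetic itself is done in int. */
unsigned char scaled_digit(char c)
{
    int wide = (c - 'A' + 10) * 16;   /* operands are promoted to int */
    return (unsigned char)wide;       /* reduced modulo UCHAR_MAX + 1 */
}
```

Under those assumptions, scaled_digit('A') gives 160, which fits, while scaled_digit('f') computes 752 in int and stores 240 after the conversion.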

You can specify character literals, of type int (char in C++), with specific numbers using octal or hexadecimal notation. For example, '\012' is octal 12, or decimal 10. Alternatively, you could write '\x0a' to mean the same thing.
However, even if you did this (and the calculation didn't overflow), it might not get rid of the warning, as the C language specifies that all operands are promoted to at least int (or unsigned int, depending on the operand types) before the calculation is done.
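The following sketch illustrates both points, assuming a C11 compiler for _Generic. The macro IS_INT and the function names are invented for this example.

```c
/* Hypothetical macro: yields 1 if the expression has type int, else 0. */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* A character constant has type int in C. */
int char_constant_is_int(void) { return IS_INT('\x0a'); }

/* Octal and hexadecimal escapes can name the same value. */
int octal_and_hex_agree(void)  { return '\012' == '\x0a' && '\012' == 10; }

/* A char object has type char, but arithmetic on it yields int. */
int char_object_is_int(void)   { char c = 1; return IS_INT(c); }
int char_sum_is_int(void)      { char c = 1; return IS_INT(c + c); }
```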

Related

How many 'char' types are there in C?

I have been reading "The C Programming Language" by K&R, and I've come across this statement:
"plain chars are signed or unsigned"
So my question is, what is a plain char and how is it any different from
signed char and unsigned char?
In the below code how is 'myPlainChar' - 'A' different from
'mySignChar' - 'A' and 'myUnsignChar' - 'A'?
Can someone please explain the statement "Printable char's are always positive"?
Note: Please write examples and explain. Thank you.
{
char myChar = 'A';
signed char mySignChar = 'A';
unsigned char myUnsignChar = 'A';
}
There are signed char and unsigned char. Whether char is signed or unsigned by default depends on compiler and its settings. Usually it is signed.
There is only one char type, just like there is only one int type.
But like with int you can add a modifier to tell the compiler if it's an unsigned or a signed char (or int):
signed char x1; // x1 can hold values from -128 to +127 (typically)
unsigned char x2; // x2 can hold values from 0 to +255 (typically)
signed int y1; // y1 can hold values from -2147483648 to +2147483647 (typically)
unsigned int y2; // y2 can hold values from 0 to +4294967295 (typically)
The big difference between plain unmodified char and int is that int without a modifier will always be signed, but it's implementation defined (i.e. it's up to the compiler) if char without a modifier is signed or unsigned:
char x3; // Could be signed, could be unsigned
int y3; // Will always be signed
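To find out what a particular compiler does, limits.h can be queried. This is a minimal sketch; plain_char_is_signed is a hypothetical name, while the macros are the standard ones from limits.h.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   CHAR_MIN is 0 for an unsigned plain char and equals SCHAR_MIN otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

The "typically" in the comments above corresponds to SCHAR_MIN, SCHAR_MAX and UCHAR_MAX; the standard only guarantees that they are at least -127, 127 and 255 in magnitude.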
Plain char is the type spelled char without signed or unsigned prefix.
Plain char, signed char and unsigned char are three distinct integral types (yes, character values are (small) integers), even though plain char is represented identically to one of the other two. Which one is implementation defined. This is distinct from say int : plain int is always the same as signed int.
There's a subtle point here: if plain char is for example signed, then it is a signed type, and we say "plain char is signed on this system", but it's still not the same type as signed char.
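One way to see that the three types are distinct is a C11 _Generic selection, which does not allow two associations with compatible types. The macro CHAR_KIND and the function names below are hypothetical, chosen for this sketch.

```c
/* All three character types appear as separate associations. This is only
   accepted because they are three distinct, mutually incompatible types. */
#define CHAR_KIND(x) _Generic((x), \
    char: 0,                       \
    signed char: 1,                \
    unsigned char: 2,              \
    default: -1)

int kind_of_plain(void)    { char c = 'A';          return CHAR_KIND(c); }
int kind_of_signed(void)   { signed char c = 'A';   return CHAR_KIND(c); }
int kind_of_unsigned(void) { unsigned char c = 'A'; return CHAR_KIND(c); }
```

Listing both int and signed int in the same selection would be rejected, since those two name the same type.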
The difference between these two lines
signed char mySignChar = 'A';
unsigned char myUnsignChar = 'A';
is exactly the same as the difference between these two lines:
signed int mySignInt = 42;
unsigned int myUnsignInt = 42;
The statement "Printable char's are always positive" means exactly what it says. On some systems some plain char values are negative. On all systems some signed char values are negative. On all systems there is a character of each kind that is exactly zero. But none of those are printable. Unfortunately the statement is not necessarily correct (it is correct about all characters in the basic execution character set, but not about the extended execution character set).
How many char types are there in C?
There is one char type. There are 3 small character types: char, signed char, unsigned char. They are collectively called character types in C.
char has the same range/size/ranking/encoding as signed char or unsigned char, yet is a distinct type.
what is a plain char and how is it any different from signed char and unsigned char?
They are 3 different types in C. A plain char will match the same range/size/ranking/encoding as either signed char or unsigned char. In all cases the size is 1.
How is myPlainChar - 'A' different from mySignChar - 'A' and myUnsignChar - 'A'?
myPlainChar - 'A' will match one of the other two.
Typically mySignChar has a value in the range [-128...127] and myUnsignChar in the range of [0...255]. So a subtraction of 'A' (typically a value of 65) will result in a different range of potential answers.
Can someone please explain me the statement "Printable char's are always positive".
Portable C source code characters (the basic execution character set) are positive, so printing a source code file only prints characters of non-negative values.
When printing data with printf("%c", some_character_type) or putc(some_character_type) the value, either positive or negative is converted to an unsigned char before printing. Thus it is a character associated with a non-negative value that is printed.
C has isprint(int c) which "tests for any printing character including space". That function is only valid for values in the unsigned char range and the negative EOF. isprint(EOF) reports 0. So only non-negative values pass the isprint(int c) test.
C really has no way to print negative values as characters without undergoing a conversion to unsigned char.
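In practice this means a plain char should be converted to unsigned char before it is handed to isprint. A small sketch follows, with hypothetical helper names and assuming the default "C" locale.

```c
#include <ctype.h>

/* Hypothetical helper: safe wrapper around isprint for plain char.
   The cast keeps the argument inside the unsigned char range even
   when plain char is signed and c is negative. */
int is_printable(char c)
{
    return isprint((unsigned char)c) != 0;
}

/* Counts the printing characters (including space) in a string. */
int count_printable(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        n += is_printable(*s);
    return n;
}
```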
I think it means char without 'unsigned' in front of it ie:
unsigned char a;
as opposed to
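char a; // plain char: signed or unsigned, depending on the implementation
So basically an integer variable is always signed unless you use the keyword 'unsigned'. Plain char is the exception: whether it is signed is up to the implementation.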
That should answer the second question as well.
The third question: Characters in the ASCII set all have non-negative codes, i.e. the number -60 doesn't represent a character, but 65 does, i.e. 'A'.

Why can we assign integers to a char variable

I randomly surfed on StackOverflow. As I saw a question I became clueless. Why can we assign Integer values to a char variable?
Code snippet:
#include <stdio.h>
int main()
{
char c = 130;
unsigned char f = 130;
printf("c = %d\nf = %d\n",c,f);
return 0;
}
Output:
c = -126
f = 130
I always thought values have to be assigned to the right type identifier, why can we do that?
That's because char is an integer type (the smallest one) and values of different integer types can be implicitly converted. But beware that your example code has implementation-defined behavior on a typical machine with signed 8-bit char *): 130 does not fit (the maximum value would be 127), and the result of converting an out-of-range value to a signed integer type is implementation-defined.
You might have asked this question because you thought char is for storing characters. This is actually true, but characters are numbers. See Character Encoding for more details.
*) whether char (without explicit signed or unsigned) is signed is implementation-defined, as is the number of bits, but there must be at least 8.
Quoting C11, chapter §6.5.16.1p2
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
This implies that the RHS in an assignment operator is implicitly converted to the type of the variable on the LHS. In your case, the integer constant is converted to char type.
Also, there is no char constant in C. The character constants like 'a', 'B' are all of int type.
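The conversions discussed in this thread can be sketched as follows. Both function names are hypothetical; the range check uses the standard CHAR_MIN and CHAR_MAX macros from limits.h.

```c
#include <limits.h>

/* Conversion to unsigned char is fully defined: the value is reduced
   modulo UCHAR_MAX + 1, so 130 is always stored as 130. */
unsigned char store_in_unsigned_char(int v)
{
    return (unsigned char)v;
}

/* Returns 1 when v fits in plain char, so that an assignment keeps the
   value. Outside this range the result is implementation-defined when
   plain char is signed. */
int fits_in_plain_char(int v)
{
    return v >= CHAR_MIN && v <= CHAR_MAX;
}
```

On an implementation with a signed 8-bit char, fits_in_plain_char(130) is 0, which is the case where the question's program printed -126.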

Range of unsigned char in C language

As per my knowledge range of unsigned char in C is 0-255. but when I executed the below code its printing the 256 as output. How this is possible? I have got this code from "test your C skill" book which say char size is one byte.
#include <stdio.h>

int main(void)
{
    unsigned char i = 0x80;
    printf("\n %d", i << 1);
    return 0;
}
Because the operands to <<* undergo integer promotion. It's effectively equivalent to (int)i << 1.
* This is true for most operators in C.
Several things are happening.
First, the expression i << 1 has type int, not char. Each operand of << undergoes the integer promotions, so i is promoted to int regardless of the type of the literal 1, and 0x100 is well within the range of int.
Secondly, the %d conversion specifier expects its corresponding argument to have type int. So the argument is being interpreted as an integer.
If you want to print the numeric value of a signed char, use the conversion specifier %hhd. If you want to print the numeric value of an unsigned char, use %hhu.
For arithmetical operations, char is promoted to int before the operation is performed. See the standard for details. Simplified: the "smaller" type is first brought to the "larger" type before the operation is performed. For the shift operators, the resulting type is that of the promoted left operand, while for e.g. + and other "combining" operators it is the larger of both, but at least int. The latter means that char and short (and their unsigned counterparts) are always promoted to int, with the result being int, too. (Simplified; for details please read the standard.)
Note also that %d takes an int argument, not a char.
Additional notes:
unsigned char does not necessarily have the range 0..255. Check limits.h, you will find UCHAR_MAX there.
char and "byte" are synonymously used in the standard, but neither are necessarily 8 bits wide (just very likely for modern general purpose CPUs).
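A short sketch of the promotion at work. The function names are hypothetical, and the value noted for the narrowed result assumes 8-bit chars (UCHAR_MAX == 255).

```c
/* The shift is carried out on the promoted operand, so the result is an
   int and can exceed UCHAR_MAX. */
int shift_promoted(unsigned char i)
{
    return i << 1;
}

/* Converting the result back to unsigned char reduces it modulo
   UCHAR_MAX + 1, which discards the bit that was shifted out. */
unsigned char shift_narrowed(unsigned char i)
{
    return (unsigned char)(i << 1);
}
```

With i equal to 0x80, the first function returns 256 and the second returns 0 under the 8-bit assumption.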
As others have already explained, the statement printf("\n %d",i << 1); does integer promotion. So left-shifting the integer value 128 by one results in 256. You could try the following code to print the maximum value of unsigned char. The maximum value of unsigned char has all bits set, so a bitwise NOT operation using ~ on 0, converted to unsigned char, gives you that maximum, which is 255 when chars are 8 bits wide.
#include <stdio.h>

int main(void)
{
    unsigned char ch = ~0;   /* ~0 is an int; the conversion yields UCHAR_MAX */
    printf("ch = %d\n", ch);
    return 0;
}
Output:-
M-40UT:Desktop$ ./a.out
ch = 255

Is it safe to cast a character type to an integer type

#include <stdio.h>

int main(void) {
    char ch = 'a';
    int x;
    x = ch;
    printf("x=%c", x);
    return 0;
}
Is this code safe to use (considering the endianness of the machine)?
Yes, it is safe to cast a character (like char) type to an integer type (like int).
In this answer and others, endian-ness is not a factor.
There are 4 conversions going on here and no casting:
a is character of the C encoding. 'a' converts to an int at compile time.
'a'
The int is converted to a char.
char ch = 'a';
The char ch is converted to an int x. In theory there could be a loss of data going from char to int **, but given the overwhelming implementations, there is none. Typical examples: If char is signed in the range -128 to 127, this maps well into int. If char is unsigned in the range 0 to 255, this also maps well into int.
int x;
x = ch;
printf("%c", x) uses the int x value passed to it, converts it to unsigned char and then prints that character. (C11dr §7.21.6.1 8 @haccks) Note there is no conversion of x due to the usual conversion of variadic parameters, as x is already an int.
printf("x=%c", x);
** char and int could be the same size and char is unsigned with a positive range more than int. This is the one potential problem with casting char to int, although typically there is no loss of data. This could be further complicated should char have a range like 0 to 2³²-1 and int a range of -(2³¹-1) to +(2³¹-1). I know of no such machine.
Yes, casting integer types to bigger integer types is always safe.
Standard library's *getc (fgetc, getchar, ...) functions do just that--they read unsigned chars internally and cast them to int because int provides additional room for encoding EOF (end of file, usually EOF==-1).
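The usual reading loop shows why the return type matters. This is a minimal sketch; count_bytes is a hypothetical helper, while fgetc and EOF are the standard stdio.h facilities.

```c
#include <stdio.h>

/* Counts the bytes remaining in a stream. The result of fgetc is kept in
   an int so that EOF stays distinguishable from every valid unsigned char
   value, including 0xFF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;                          /* int, not char */
    while ((ch = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

Had ch been declared as char, a 0xFF byte could compare equal to EOF on systems where plain char is signed, and the loop would stop early.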
Yes it is, because int is bigger than char, but using char instead of int would not be safe for the same reason.
What you are doing is, first:
int x = ch => Assigning the ASCII value of the char to an int
And finally:
printf("x=%c", x); => Printing the ASCII value as a char, which will print the actual char that corresponds to that value. So yeah, it's safe to do that; it's totally predictable behaviour.
But safe does not mean useful as integer is bigger than char, usually we do the inverse to save some memory.
it is safe here because char is converted to int anyway when calling printf.
see C++ variadic arguments

When does conversion between unsigned and signed character pointer becomes unsafe in C?

If I do this in both clang and Visual Studio:
unsigned char *a = 0;
char * b = 0;
char x = '3';
a = & x;
b = (unsigned char*) a;
I get the warning that I am trying to convert between signed and unsigned character pointer but the code sure works. Though compiler is saying it for a reason. Can you point out a situation where this can turn into a problem?
To make it very simple because char represents:
A single character (char, it doesn't matter if signed or not). When you assign a character like 'A' what you're doing is to write A ASCII code (65) in that memory location.
A string (when used as array or pointer to a char buffer).
An eight bit number (with or without sign).
Then when you convert a signed byte like -1 to an unsigned byte you'll lose information (at least the sign, but probably the number too); that's why you get a warning:
signed char a = -1;
unsigned char b = (unsigned char)a;
if ((int)b == -1)
; // No! Now b is 255!
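With the explicit cast above, b is always UCHAR_MAX (255 with 8-bit chars), because conversion to an unsigned type is defined on values, modulo UCHAR_MAX + 1. When the same byte is instead read through a pointer of the other type, the result depends on how the system represents negative numbers and may differ from 255 on a system that doesn't use 2's complement (I never worked with any system like that, but they exist). In this example it doesn't really matter, because the concept is that a signed/unsigned conversion may discard information. It doesn't matter if this happens because of an explicit cast or a cast through pointers: bits will represent something else (and the result will change according to implementation, environment and actual value).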
Note that in the C standard char, signed char and unsigned char are formally distinct types. Most of the time you won't care (and VS will default char to signed or unsigned according to a compiler option, but this isn't portable), but you may need casting.
Your code is correct (any type can be aliased by unsigned char). Also, on 2's complement systems, this alias is the same as the result of a value conversion.
The reverse operation; aliasing unsigned char by char is only a problem on esoteric systems that have trap representations for plain char.
I don't know of any such systems ever existing, although the C standard provides for their existence. Unfortunately a cast is required because of this possibility, which is more annoying than useful IMHO.
The aliasing of unsigned char by char is the same as the value conversion on every modern system that I know of (technically implementation-defined, but everyone implements it that the value conversion retains the same representation).
NB. definition of terms, taking for example unsigned char x = 250;:
alias char y = *(char *)&x;
conversion char y = x;
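The two terms can be put side by side in a short sketch. The function names are hypothetical. When plain char is signed, the conversion of an out-of-range value is implementation-defined, so the results are only expected to agree on the mainstream 2's complement systems described above.

```c
/* Alias: reinterpret the stored byte through a char lvalue. */
char via_alias(unsigned char x)
{
    return *(char *)&x;
}

/* Conversion: convert the value itself to char. */
char via_conversion(unsigned char x)
{
    return (char)x;
}
```

For values that fit in both types, such as 65, the two functions return the same result on any system.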
The char type can either be signed or unsigned depending on the platform. The code that you write with casting a char type to either unsigned or signed char might work fine within one platform, but not if the data is transferred across operating systems, etc. See this URL:
http://www.trilithium.com/johan/2005/01/char-types/
Because you can lose some values - look at this:
unsigned char *a = 0;
char b = -3;
a = (unsigned char *)&b;
printf("%d", *a);
Result: 253
Let me explain this. Just look at ranges:
unsigned char: from 0 to 255
signed char: from -128 to 127
