Relationship between char and ASCII Code? - c

My computer science teacher taught us that which data type to declare depends on the size of the value you need a variable to hold. He then demonstrated adding a number to and subtracting a number from a char to output a different char. I remember he said this has something to do with ASCII code. Can anyone explain this more specifically and clearly? So, is char considered a number (since we can do math with it), a character, or both? Can we print out the number behind a char? How?

So, is char considered a number or a character or both?
Both. It is an integer, but that integer value represents a character, as described by the character encoding of your system. The character encoding of the system that your computer science teacher uses happens to be ASCII.
Can we print out the number behind a char? How?
C++ (as the question used to be tagged):
The behaviour of the character output stream (such as std::cout) is to print the represented character when you insert an integer of type char. But the behaviour for all other integer types is to print the integer value. So, you can print the integer value of a char by converting it to another integer type:
std::cout << (unsigned)'c';
C:
There are no templated output streams, so you don't need to do explicit conversion to another integer (except for the signedness). What you need is the correct format specifier for printf:
printf("%hhu", (unsigned char)'c');
hh is for an integer of size char; u is for unsigned, as you are probably interested in the unsigned representation.

A char can hold a number. It is the smallest integer type available on your machine and must have at least 8 bits. In C, it is synonymous with a byte.
Its typical use is to store the codes of characters. Computers can only deal with numbers, so, to represent characters, numbers are used. Of course you must agree on which number means which character.
C doesn't require a specific character encoding, but most systems nowadays use a superset of ASCII (this is a very old encoding using only 7 bits) like e.g. UTF-8.
So, if you have a char that holds a character and you add or subtract some value, the result will be another number that happens to be the code for a different character.
In ASCII, the characters 0-9, a-z and A-Z have adjacent code points, therefore by adding e.g. 2 to A, the result will be C.
Can we print out the number behind a char?
Of course. It just depends whether you interpret the value in the char as just a number or as the code of a character. E.g. with printf:
printf("%c\n", 'A'); // prints the character
printf("%hhu\n", (unsigned char)'A'); // prints the number of the code
The cast to (unsigned char) is only needed because char is allowed to be either signed or unsigned, we want to treat it as unsigned here.

A char takes up a single byte. On systems with an 8 bit byte this gives it a range (assuming char is signed) of -128 to 127. You can print this value as follows:
char a = 65;
printf("a=%d\n", a);
Output:
a=65
The %d format specifier prints its argument as a decimal integer. If on the other hand you used the %c format specifier, this prints the character associated with the value. On systems that use ASCII, that means it prints the ASCII character associated with that number:
char a = 65;
printf("a=%c\n", a);
Output:
a=A
Here, the character A is printed because 65 is the ASCII code for A.
You can perform arithmetic on these numbers and print the character for the resulting code:
char a = 65;
printf("a=%c\n", a);
a = a + 1;
printf("a=%c\n", a);
Output:
a=A
a=B
In this example we first print A which is the ASCII character with code 65. We then add 1 giving us 66. Then we print the ASCII character for 66 which is B.

Every variable is stored in binary (i.e. as a number); chars are just numbers of a specific size.
They represent a character when interpreted using some character encoding. A table of the ASCII standard is available at www.asciitable.com.
As in @Igor's comment, if you run the following code, you see the character, decimal, and hexadecimal representations of your char.
char c = 'A';
printf("%c %d 0x%x", c, c, c);
Output:
A 65 0x41
As an exercise to understand it better, you could make a program to generate the ASCII Table yourself.

My computer science teacher taught us that which data type to declare depends on the size of the value for a variable you need.
This is correct. Different types can represent different ranges of values. For reference, here are the various integral types and the minimum ranges they must be able to represent:
Type                  Minimum Range
----                  -------------
signed char           -127...127
unsigned char         0...255
char                  same as signed or unsigned char, depending on implementation
short                 -32767...32767
unsigned short        0...65535
int                   -32767...32767
unsigned int          0...65535
long                  -2147483647...2147483647
unsigned long         0...4294967295
long long             -9223372036854775807...9223372036854775807
unsigned long long    0...18446744073709551615
An implementation may represent a larger range in a given type; for example, on many implementations (32-bit systems and 64-bit Windows, for instance), the range of an int is the same as the range of a long.
C doesn't mandate a fixed size (bit width) for the basic integral types (although unsigned types are the same size as their signed equivalent); at the time C was first developed, byte and word sizes could vary between architectures, so it was easier to specify a minimum range of values that the type had to represent and leave it to the implementor to figure out how to map that onto the hardware.
C99 introduced the stdint.h header, which defines fixed-width types like int8_t (8-bit), int32_t (32-bit), etc., so you can define objects with specific sizes if necessary.
So, is char considered a number (since we can do math with it), a character, or both?
char is an integral data type that can represent values in at least the range [0...127] (see note 1), which is the range of encodings for the basic execution character set (upper- and lowercase Latin alphabet, decimal digits 0 through 9, and common punctuation characters). It can be used for storing and doing regular arithmetic on small integer values, but that's not the typical use case.
You can print char objects out as characters or numeric values:
#include <ctype.h>  // for isprint
#include <limits.h> // for CHAR_MAX
#include <stdio.h>  // for printf
...
printf( "%5s%5s\n", "dec", "char" );
printf( "%5s%5s\n", "---", "----" );
for ( char i = 0; i < CHAR_MAX; i++ )
{
    printf( "%5hhd%5c\n", i, isprint(i) ? i : '.' );
}
That code will print out the integral value and the associated character, like so (this is ASCII, which is what my system uses):
...
   65    A
   66    B
   67    C
   68    D
   69    E
   70    F
   71    G
   72    H
   73    I
...
Control characters like SOH and EOT don't have an associated printing character, so for those values the code above just prints out a '.'.
By definition, a char object takes up a single storage unit (byte); the number of bits in a single storage unit must be at least 8, but could be more.
1. Plain char may be either signed or unsigned depending on the implementation, so it can represent additional values outside that range, but it must be able to represent *at least* those values.

Related

If I want to store an integer into a char type variable, which byte of the integer will be stored?

int a = 0x11223344;
char b = (char)a;
I am new to programming and learning C. Why do I get value of b here as D?
If I want to store an integer into a char type variable, which byte of the integer will be stored?
This is not fully defined by the C standard.
In the particular situation you tried, what likely happened is that the low eight bits of 0x11223344 were stored in b, producing 0x44 (68 in decimal) in b, and printing that prints “D” because your system uses ASCII character codes, and 68 is the ASCII code for “D”.
However, you should be wary of something like this working, because it is contingent on several things, and variations are possible.
First, the C standard allows char to be signed or unsigned. It also allows char to be any width that is eight bits or greater. In most C implementations today, it is eight bits.
Second, the conversion from int to char depends on whether char is signed or unsigned and may not be fully defined by the C standard.
If char is unsigned, then the conversion is defined to wrap modulo M+1, where M is the largest value representable in char. Effectively, this is the same as taking the low byte of the value. If the unsigned char has eight bits, its M is 255, so M+1 is 256.
If char is signed and the value is out of range of the char type, the conversion is implementation-defined: It may either trap or produce an implementation-defined value. Your C implementation may wrap conversions to signed integer types similarly to how it wraps conversions to unsigned types, but another reasonable behavior is to “clamp” out-of-range values to the limits of the type, CHAR_MIN and CHAR_MAX. For example, converting −8000 to char could yield the minimum, −128, while converting 0x11223344 to char could yield the maximum, +127.
Third, the C standard does not require implementations to use ASCII. It is very common to use ASCII. (Usually, the character encoding is not just ASCII, because ASCII covers only values from 0 to 127. C implementations often use some extension beyond ASCII for values from 128 to 255.)

The size of a char and an int in C

In the C programming language, if an int is 4 bytes and letters are represented in ASCII as a number (also an int), then why is a char 1 byte?
A char is one byte because the standard says so. But that's not really what you are asking. On a typical system with a signed 8-bit char, it can hold values from -128 to 127. If you look at a table of ASCII character codes, you'll notice that the codes are between 0 and 127, so they fit in the non-negative values of a char. Extended character sets use values from 0 to 255, which fit in an unsigned char.
6.2.5 Types
...
3 An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.
...
5 An object declared as type signed char occupies the same amount of storage as a ‘‘plain’’ char object. A ‘‘plain’’ int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range INT_MIN to INT_MAX as defined in the header <limits.h>).
C 2012 Online Draft
Type sizes are not defined in terms of bits, but in terms of the range of values that must be represented.
The basic execution character set consists of 96 or so characters (26 uppercase Latin characters, 26 lowercase Latin characters, 10 decimal digits, 29 graphical characters, space, vertical tab, horizontal tab, line feed, form feed); 8 bits is more than sufficient to represent those.
int, OTOH, must be able to represent a much wider range of values; the minimum range as specified in the standard is [-32767..32767] (see note 1), although on most modern implementations it’s much wider.
1. The standard doesn’t assume two’s complement representation of signed integers, which is why INT_MIN is -32767 and not -32768.
In the C language, a char usually has a size of 8 bits.
In all the compilers that I have seen (which are, admittedly, not very many), the char is taken to be large enough to hold the ASCII character set (or the so-called “extended ASCII”) and the size of the char data type is 8 bits (this includes compilers on major desktop platforms and some embedded systems).
1 byte was sufficient to represent the whole character set.

Difference between char and int when declaring character

I just started learning C and am rather confused over declaring characters using int and char.
I am well aware that any characters are made up of integers in the sense that the "integers" of characters are the characters' respective ASCII decimals.
That said, I learned that it's perfectly possible to declare a character using int without using the ASCII decimals. Eg. declaring variable test as a character 'X' can be written as:
char test = 'X';
and
int test = 'X';
For both declarations, the conversion specifier is %c (even though test is defined as int).
So my question is: what are the differences between declaring character variables using char and int, and when should int be used to declare a character variable?
The difference is the size in byte of the variable, and from there the different values the variable can hold.
A char is required to accept all values between 0 and 127 (inclusive). It always occupies exactly one byte, which is 8 bits in common environments. It is implementation-defined whether it is signed (-128 to 127) or unsigned (0 to 255).
An int is required to be at least a 16-bit signed word, and to accept all values between -32767 and 32767. That means that an int can accept all values from a char, whether the latter is signed or unsigned.
If you want to store only characters in a variable, you should declare it as char. Using an int would just waste memory, and could mislead a future reader. One common exception to that rule is when you want to process a wider value for special conditions. For example the function fgetc from the standard library is declared as returning int:
int fgetc(FILE *fd);
because the special value EOF (for End Of File) is a negative int value, commonly -1 (all bits set to one on a two's-complement system), which lies outside the range of values fgetc returns for real characters. fgetc returns each character as an unsigned char converted to int, so no character read from a file (0 to 255 on a common system) can be equal to the EOF constant. If the function were declared to return a plain char, nothing could distinguish the EOF value from the (valid) char 0xFF.
That's the reason why the following code is bad and should never be used:
char c; // a terrible memory saving...
...
while ((c = fgetc(stdin)) != EOF) { // NEVER WRITE THAT!!!
...
}
Inside the loop, a char would be enough, but for the comparison with EOF to work correctly when reading the byte 0xFF, the variable needs to be an int.
The char type has multiple roles.
The first is that it is simply part of the chain of integer types, char, short, int, long, etc., so it's just another container for numbers.
The second is that its underlying storage is the smallest unit, and all other objects have a size that is a multiple of the size of char (sizeof returns a number that is in units of char, so sizeof(char) == 1).
The third is that it plays the role of a character in a string, certainly historically. When seen like this, the value of a char maps to a specified character, for instance via the ASCII encoding, but it can also be used with multi-byte encodings (one or more chars together map to one character).
Size of an int is 4 bytes on most architectures, while the size of a char is 1 byte.
Usually you should declare characters as char and use int for integers that need to hold bigger values. On most systems a char occupies a byte, which is 8 bits. Depending on your system, char might be signed or unsigned by default, so it will be able to hold values from -128 to 127 or from 0 to 255.
An int might be 32 bits long, but if you really want exactly 32 bits for your integer you should declare it as int32_t or uint32_t instead.
I think there's no difference, but you're allocating extra memory you're not going to use. You can also do const long a = 1;, but it will be more suitable to use const char a = 1; instead.

Dealing with char values over 127 in C

I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data and I don't face any problem printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d. But if I try to print a character from its numerical value like this:
printf("%c\n", 128);
it just prints FFFD (the replacement character). Here is another example:
char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
printf("%c", abc[1]); // Should print "€", I think, but I get "�"
I'm a bit confused since I can print every character below 128 in these ways. The reason I'm asking is that I need to generate a (pseudo)random byte sequence using the rand() function. Here is an example:
char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 00:FF ...
printf("%c", abc[0]); // ... but I get "�"
If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.
In C, char is a distinct type from unsigned char and signed char. It has the range CHAR_MIN to CHAR_MAX, which is the same range as one of unsigned char or signed char. Typically these are 8-bit types, but they could be wider; see CHAR_BIT. So the typical range is [0 to 255] or [-128 to 127].
If char is unsigned, abc[1] = 128 is fine. If char is signed, abc[1] = 128 is implementation-defined (see below). The typical implementation-defined result is that abc[1] will have the value -128.
printf("%c\n", 128); will send the int value 128 to printf(). The "%c" will cast that value to an unsigned char. So far no problems. What appears on the output depends on how the output device handles code 128. Perhaps Ç, perhaps something else.
printf("%c", abc[1]); will send 128, or an implementation-defined value if char is signed. If the value is implementation-defined and -128 was sent, then converting -128 to unsigned char gives 128, and again the code for 128 is printed.
If the output device is expecting UTF-8 sequences, a UTF-8 sequence beginning with code 128 is invalid (it is an unexpected continuation byte), and many such systems will print the replacement character, which is Unicode U+FFFD.
Converting a value outside the range of a signed char to char invokes:
the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3
First of all, let me tell you, signed-ness of a char is implementation defined.
If you have to deal with char values over 127, you can use unsigned char. It can handle 0-255.
Also, you should be using %hhu format specifier to print the value of an unsigned char.
If you're dealing with bytes, use unsigned char instead of char for your datatypes.
With regard to printing, you can print the bytes in hex instead of decimal or as characters:
printf("%02X", (unsigned char)abc[0]);
You probably don't want to print these bytes as characters, as you'll most likely be dealing with UTF-8 character encoding which doesn't seem to be what you're looking for.

Understanding printf better - What does it print with "%c" when the value provided is negative?

In Kernighan & Ritchie, it says that "all printable characters are positive even though char datatype being signed or unsigned is machine-dependent."
Can somebody explain to me the meaning of this line? My system has signed chars, but even with a negative value, say -90, printf does print a character (even though it's not a very familiar character).
ASCII character set defines codepoints from 0x00 to 0x7F. It doesn't matter if they are represented with unsigned or signed byte values since this range is common for both.
Printable characters are between 0x20 and 0x7E, which are all part of the ASCII. The term printable character does not define every possible character in the world that is printable. Rather it is defined inside the realm of ASCII.
Byte values from 0x80 to 0xFF are not defined in ASCII and different systems assign different characters to values in this range resulting in many different types of codepages which are identical in their ASCII range but differ in this range. This is also the range where values for signed and unsigned bytes differ.
The implementation of printf looks for a single byte value when it encounters a %c conversion specifier in its format string. This byte value may be signed or unsigned from your point of view as the caller of printf, but printf does not know this. It just passes these 8 bits to the output stream it's connected to, and that stream emits characters between 0x00 and 0xFF.
The concept of sign has no meaning inside the output pipeline where characters are emitted. Thus, whether you send a 255 or a -1, the character mapped to 0xFF in the specific codepage is emitted.
-90 as a signed char is being re-interpreted as an unsigned char, in which case its value is 166. (Both -90 and 166 are 0xA6 in hex.)
That's right. All binary numbers are positive; whether you treat one as negative is your own interpretation, commonly using two's complement.
The 8-bit number 10100110 is positive 166, which is greater than 127 (the maximum positive signed 8-bit number).
Using signed arithmetic, the number 166 is -90.
You are seeing the character whose code is 166.
Using this as an example:
signed char x = -90;
printf("%c", x);
The integer promotion rules convert x into an int before passing it as an argument to printf. (Note, none of the other answers mention this detail, and several imply the argument to printf is still a signed char).
Section 7.21.6.1 (paragraph 8) of the standard (I'm using the C11 standard) says of the %c conversion specifier:
If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written.
So the integer -90 gets converted into an unsigned char. That means (6.3.1.3.2):
...the value is converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type until the value is
in the range of the new type.
If an unsigned char on your system takes the values 0 to 255 (which it almost certainly does), then the result will be -90 + 256 = 166. (Note: other answers refer to the "lowest byte" or "hex representation" assuming two's complement representation. Although this is overwhelmingly common, the C standard does not guarantee it).
The character 166 is then written to stdout, and interpreted by your terminal.

Resources