I am new to programming and I need your help here.
I am studying someone's code and I came across these expressions. My doubt is: how is the operation done here, given that a character and an integer are two different data types?
How will the integer type hold the character value?
Thanks
int line, col;
char ch;
scanf("%d%c", &line, &ch);
//line--;
col = ch - 'A';
my doubt is: how is the operation done here, given that a character and an integer are two different data types?
I'm unsure how well this question will be received here, given it being about some fairly basic behavior of the language, but I commend you for thinking about type and type matching. Keep doing that!
The first thing to understand is that in C, char and its signed and unsigned variants are among the integer data types, so there is no mismatch of type category, just a question of possibly-different range and signedness. Characters are represented by integer codes (as, indeed, is pretty much everything in the computer's memory).
The second thing to understand is that C supports all manner of arithmetic operations on operands of mixed types. It defines a set of "usual arithmetic conversions" that are used to choose a common type for the operands and the result of each arithmetic operation. The operands are automatically converted to that type. I won't cover all the details here, but basically, floating-point types win over integer types, and wider types win over narrower types.
The third thing to understand is that C does not in any case directly define arithmetic on integer types narrower than (technically, having integer conversion rank less than that of) int. When a narrower value appears in an arithmetic expression, it is automatically converted to int (if int can represent all values of the original type) or to unsigned int. These automatic conversions are called the "integer promotions", and they are a subset of the usual arithmetic conversions.
A fourth thing that is sometimes important to know is that in C "integer character constants" such as 'A' have type int, not type char (C++ differs here).
So, to evaluate this ...
col = ch - 'A';
... the usual arithmetic conversions are first applied to ch and 'A'. This involves performing the integer promotions on the value of ch, resulting in the same numeric value, but as an int. The constant 'A' already has type int, so these now match, and their difference can be computed without any further conversions. The result is an int, which is the same type as col, so no conversion is required to assign the result, either.
How will the integer type hold the character value?
Character values are integer values. Type int can accommodate all values that type char can accommodate.* Nothing special is happening in that regard.
*Technically, int can accommodate all values that can be represented by signed char, unsigned int can accommodate all values that can be represented by type unsigned char, and at least one of the two can accommodate all values that can be represented by (default) char. You are fairly unlikely to run across a C implementation where there are char values that int cannot accommodate, and the above assumes that you are not working with such an implementation, but these are allowed and some may exist.
At the fundamental level, every integer type in C (be it char, int, uint32_t, short, long...) is represented by bytes and is 'numerical' in form. You can subtract them from each other / add them together in whichever combination you like, as long as you store the resulting value in a variable of a type big enough to hold it; otherwise you get an arithmetic overflow (not a buffer overflow): for unsigned types the value wraps around, and for signed types the behavior is undefined.
In your example, a char is represented by a single byte, while an int is typically composed of 4 bytes (the exact size is implementation-defined), so the result of this subtraction fits comfortably in an int. (If you're dealing with an expression that yields a negative value, the representation of the int in memory will be slightly different; look into 2's complement if you're interested.)
When you subtract two characters and put the result in a variable of integer type, what actually gets subtracted is the ASCII codes (more precisely, the character codes) of the two characters.
For example when you have:
int col = 'D' - 'A';
The value of col is equal to 3, because the ASCII code of 'D' is 68 and the ASCII code of 'A' is 65. So col is 3, even though 'D' and 'A' were characters.
6.4.4.4/10 ...If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
I'm having trouble understanding this paragraph. After this paragraph standard gives the example below:
Example 2: Consider implementations that use two's complement representation for integers and eight bits for objects that have type char. In an implementation in which type char has the same range of values as signed char, the integer character constant '\xFF' has the value −1; if type char has the same range of values as unsigned char, the character constant '\xFF' has the value +255.
What I understand from the expression "value of an object with type char" is the value we get when we interpret the object's contents with type char. But when we look at the example, it seems to talk about the object's value in pure binary notation. Is my understanding wrong? Does an object's value always mean the bits in that object?
All "integer character constants" (the stuff between ' and ') have type int, for tradition and compatibility reasons. But they are mostly meant to be used together with char, so 6.4.4.4/10 needs to make a distinction between the types. Basically it patches up an inconsistency in the language: we have cases such as *"\xFF" that results in type char, but '\xFF' results in type int, which is very confusing.
The value '\xFF' = 255 will always fit in an int on any implementation, but not necessarily in a char, which has implementation-defined signedness (another inconsistency in the language). The behavior of the escape sequence should be as if we stored the character constant in a char, as done in my string literal example *"\xFF".
This need for consistency with char type even though the value is stored in an int is what 6.4.4.4/10 describes. That is, printf("%d", '\xFF'); should behave just as char ch = 255; printf("%d", (int)ch);
The example is describing one possible implementation, where char is either signed or unsigned and the system uses 2's complement. Generally, the value of an object with integer type refers to decimal notation. char is an integer type, so it can have a negative decimal value (whether the symbol table has a matching index for the value -1 is another story). But "raw binary" cannot have a negative value: 1111 1111 can only be said to be -1 if you say that the memory cell should be interpreted as 8-bit 2's complement. That is, if you know that a signed char is stored there. If you know that an unsigned char is stored there, then the value is 255.
What's the difference? Both of them give the same output when printed with printf("%ld"):
long x = 1024;
long y = 1024L;
In C source code, 1024 is an int, and 1024L is a long int. During an assignment, the value on the right is converted to the type of the left operand. As long as the rules about which combinations of operands are allowed are obeyed, and the value on the right is in the range of the left operand, there is no difference: the value remains unchanged.
In general, a decimal constant without a suffix is an int, and a decimal constant with an L is a long int. However, if its value is too big to be represented in the usual type, it will automatically be the next larger type. For example, in a C implementation where the maximum int is 2147483647, the constant 3000000000 in source code will be a long int even though it has no suffix. (Note that this rule means the same constant in source code can have different types in different C implementations.) If a long int is not big enough, it will be long long int. If that is not big enough, it can be a signed extended integer type, if the implementation supports one.
The rules above are for decimal constants. There are also hexadecimal constants (which begin with 0x or 0X) and octal constants (which begin with 0; 020 is octal for sixteen, unlike 20, which is decimal for twenty), which may have signed or unsigned types. The different integer types are important because overflow and conversions behave differently depending on type. It is easy to take integer operations as a matter of course and assume they work, but it is important to learn the details to avoid problems.
#include <stdio.h>

int main(void)
{
    int a = 65;
    char d = 'A';

    if (a == d)
        printf("both are same");
    return 0;
}
The output is "both are same". Here a is an int, so 65 is stored in 32 bits, and d is a char, which is stored in 8 bits. How could they be the same, given that in a computer everything is converted to binary for any operation?
The computer is able to compare a char to an int on a binary level because of Implicit type promotion rules.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
This means your char is promoted to an int before your processor compares the two.
C is a very flawed language, so there are many dirty, irrational things going on between the lines here:
char has implementation-defined signedness, so how it stores data depends on compiler. Is char signed or unsigned by default?
'A' is a character literal, and as it happens, character literals are actually of type int in C. This doesn't make any sense, but that's just how it is.
In the line char d='A';, the literal 'A' (type int) gets converted to char. Which may or may not be signed. Signedness shouldn't in practice affect the basic character set A to Z though.
Most likely 'A' will be stored as the value 65, although this is not guaranteed by the standard. For that reason it is better to always write 'A' and never 65 (the former is also more readable).
In the expression a==d, the character operand is a small integer type. Small integer types undergo an implicit promotion to int when used in most expressions. This integer promotion is part of a set of rules for how expressions are balanced, to ensure that both operands of an operator are always of the same type. These rules are called the usual arithmetic conversions. For details see: Implicit type promotion rules
The internal storage is the compiler's decision, and often depends on the target architecture.
However, this has nothing to do with the result your code shows; in the comparison, the char gets promoted to an int before comparing (because you can't compare apples with oranges; read the language rules). Therefore, it compares an int with an int, and they are equal.
When you cast a character to an int in C, what exactly is happening? Since characters are one byte and ints are four, how are you able to get an integer value for a character? Is it the bit pattern that is treated as a number? Take for example the character 'A'. Is the bit pattern 01000001 (i.e. 65 in binary)?
char and int are both integer types.
When you convert a value from any arithmetic (integer or floating-point) type to another arithmetic type, the conversion preserves the value whenever possible. Arithmetic conversions are always defined in terms of values, not representations (though some of the rules are designed to be simply implemented on most hardware).
In your case, you might have:
char c = 'A';
int i = c;
c is an object of type char with the value 65 (assuming an ASCII representation). The conversion from char to int yields an int with the value 65. The compiler generates whatever code is necessary to make that happen; in terms of representation, it could either sign-extend or pad with 0 bits.
This applies when the value of the source expression can be represented as a value of the target type. For a char to int conversion, that's (almost) always going to be the case. For some other conversions, there are various rules for what to do when the value won't fit:
For any conversion to or from floating-point, if the value is out of range the behavior is undefined ((int)1.0e100 may yield some arbitrary value or it can crash your program), and if it's within range but inexact it's approximated by rounding or truncation;
For conversion of a signed or unsigned integer to an unsigned integer, the result is wrapped ((unsigned)-1 == UINT_MAX);
For conversion of a signed or unsigned integer to a signed integer, the result is implementation-defined (wraparound semantics are common) -- or an implementation-defined signal can be raised.
(Floating-point conversions also have to deal with precision.)
Other than converting integers to unsigned types, you should generally avoid out-of-range conversions.
Incidentally, though int may happen to be 4 bytes on your system, it could be any size as long as it's able to represent values from -32767 to +32767. The ranges of the various integer types, and even the number of bits in a byte, are implementation-defined (with some restrictions imposed by the standard). 8-bit bytes are almost universal. 32-bit int is very common, though older systems commonly had 16-bit int (and I've worked on systems with 64-bit int).
Suppose that we write in C the following character constant:
'\xFFFFAA'
Which is its numerical value?
The standard C99 says:
Character constants have type int.
Hexadecimal character constants can be represented as an unsigned char.
The value of a basic character constant is non-negative.
The value of any character constant fits in the range of char.
Besides:
The range of values of signed char is contained in the range of values of int.
The size (in bits) of char, unsigned char and signed char are the same: 1 byte.
The size of a byte is given by CHAR_BIT, whose value is at least 8.
Let's suppose that we have the typical situation with CHAR_BIT == 8.
Also, let's suppose that char is signed char for us.
By following the rules: the constant '\xFFFFAA' has type int, but its value can be represented in an unsigned char, although its real value fits in a char.
From these rules, an example as '\xFF' would give us:
(int)(char)(unsigned char)'\xFF' == -1
The 1st cast unsigned char comes from the "can be represented as unsigned char" requirement.
The 2nd cast char comes from the "the value fits in a char" requirement.
The 3rd cast int comes from the "has type int" requirement.
However, the constant '\xFFFFAA' is too big, and cannot be "represented" as an unsigned char.
What is its value?
I think that the value is the result of (char)(0xFFFFAA % 256), since the standard says, more or less, the following:
For unsigned integer types, if a value is bigger than the maximum M that can be represented by the type, the value obtained is the remainder after taking the value modulo M + 1.
Am I right with this conclusion?
EDIT I have been convinced by @KeithThompson: he says that, according to the standard, a big hexadecimal character constant is a constraint violation.
So, I will accept that answer.
However: for example, with GCC 4.8 on MinGW, the compiler triggers a warning message, and the program compiles following the behaviour I have described. Thus, a constant like '\x100020' was considered valid, and its value was 0x20.
The C standard defines the syntax and semantics in section 6.4.4.4. I'll cite the N1570 draft of the C11 standard.
Paragraph 6:
The hexadecimal digits that follow the backslash and the letter x in a hexadecimal escape sequence are taken to be part of the construction of a single character for an integer character constant or of a single wide character for a wide character constant. The numerical value of the hexadecimal integer so formed specifies the value of the desired character or wide character.
Paragraph 9:
Constraints
The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the corresponding type:
followed by a table saying that with no prefix, the "corresponding type" is unsigned char.
So, assuming that 0xFFFFAA is outside the representable range for type unsigned char, the character constant '\xFFFFAA' is a constraint violation, requiring a compile-time diagnostic. A compiler is free to reject your source file altogether.
If your compiler doesn't at least warn you about this, it's failing to conform to the C standard.
Yes, the standard does say that unsigned types have modular (wraparound) semantics, but that only applies to arithmetic expressions and some conversions, not to the meanings of constants.
(If CHAR_BIT >= 24 on your system, it's perfectly valid, but that's rare; usually CHAR_BIT == 8.)
If a compiler chooses to issue a mere warning and then continue to compile your source, the behavior is undefined (simply because the standard doesn't define the behavior).
On the other hand, if you had actually meant 'xFFFFAA', that's not interpreted as hexadecimal. (I see it was merely a typo, and the question has been edited to correct it, but I'm going to leave this here anyway.) Its value is implementation-defined, as described in paragraph 10:
The value of an integer character constant containing more than one character (e.g., 'ab'), ..., is implementation-defined.
Character constants containing more than one character are a nearly useless language feature, used by accident more often than they're used intentionally.
Yes, the value of \xFFFFAA should be representable by unsigned char.
6.4.4.4 9 Constraints
The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant.
But C99 also says,
6.4.4.4 10 Semantics
The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
So the resulting value should be in the range of unsigned char ([0, 255] if CHAR_BIT == 8). But as to which one, it depends on the compiler, architecture, etc.