How are an integer and a character stored in C?

#include <stdio.h>
int main(void)
{
    int a = 65;
    char d = 'A';
    /* d is promoted to int before the comparison */
    if (a == d)
        printf("both are same");
    return 0;
}
The output is "both are same". Here a is an integer, so 65 is stored in 32 bits, and d is a char, which is stored in 8 bits. How can they be the same, given that in a computer everything is converted to binary for any operation?

The computer is able to compare a char to an int on a binary level because of Implicit type promotion rules.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
This means your char is promoted to an int before your processor compares the two.

C is a very flawed language, so there are many dirty, irrational things going on between the lines here:
char has implementation-defined signedness, so how it stores data depends on the compiler. See: Is char signed or unsigned by default?
'A' is a character literal, and as it happens, character literals are actually of type int in C. This doesn't make any sense, but that's just how it is.
In the line char d='A';, the literal 'A' (type int) gets converted to char. Which may or may not be signed. Signedness shouldn't in practice affect the basic character set A to Z though.
Most likely 'A' will be stored as the value 65, although this is not guaranteed by the standard. For that reason it is better to always write 'A' and never 65 (the former is also more readable).
In the expression a==d, the character operand is a small integer type. Small integer types undergo an implicit promotion to int when used in most expressions. This integer promotion is part of a set of rules for how expressions are balanced, to ensure that both operands of an operator are always of the same type. These rules are called the usual arithmetic conversions. For details see: Implicit type promotion rules

The internal storage is the compiler's decision, and often depends on the target architecture.
However, this has nothing to do with the result your code shows; in the comparison, the char gets promoted to an int before comparing (because you can't compare apples with oranges; read the language rules). Therefore, it compares an int with an int, and they are equal.

Related

Operations with Characters and Integers in C

I am new to programming and I need your help here.
I am studying someone's code and I came across these expressions. My doubt is how the operation is done here, given that a character and an integer are two different data types?
How will the integer type hold the character value?
Thanks
int line, col;
char ch;
scanf("%d%c", &line, &ch);
//line--;
col = ch - 'A';
My doubt is how the operation is done here, given that a character and an integer are two different data types?
I'm unsure how well this question will be received here, given it being about some fairly basic behavior of the language, but I commend you for thinking about type and type matching. Keep doing that!
The first thing to understand is that in C, char and its signed and unsigned variants are among the integer data types, so there is no mismatch of type category, just a question of possibly-different range and signedness. Characters are represented by integer codes (as, indeed, is pretty much everything in the computer's memory).
The second thing to understand is that C supports all manner of arithmetic operations on operands of mixed types. It defines a set of "usual arithmetic conversions" that are used to choose a common type for the operands and the result of each arithmetic operation. The operands are automatically converted to that type. I won't cover all the details here, but basically, floating-point types win over integer types, and wider types win over narrower types.
The third thing to understand is that C does not in any case directly define arithmetic on integer types narrower than (technically, having integer conversion rank less than that of) int. When a narrower value appears in an arithmetic expression, it is automatically converted to int (if int can represent all values of the original type) or to unsigned int. These automatic conversions are called the "integer promotions", and they are a subset of the usual arithmetic conversions.
A fourth thing that is sometimes important to know is that in C "integer character constants" such as 'A' have type int, not type char (C++ differs here).
So, to evaluate this ...
col = ch - 'A';
... the usual arithmetic conversions are first applied to ch and 'A'. This involves performing the integer promotions on the value of ch, resulting in the same numeric value, but as an int. The constant 'A' already has type int, so these now match, and their difference can be computed without any further conversions. The result is an int, which is the same type as col, so no conversion is required to assign the result, either.
How will the integer type hold the character value?
Character values are integer values. Type int can accommodate all values that type char can accommodate.* Nothing special is happening in that regard.
*Technically, int can accommodate all values that can be represented by signed char, unsigned int can accommodate all values that can be represented by type unsigned char, and at least one of the two can accommodate all values that can be represented by (default) char. You are fairly unlikely to run across a C implementation where there are char values that int cannot accommodate, and the above assumes that you are not working with such an implementation, but these are allowed and some may exist.
At the fundamental level, every arithmetic type in C (be it char, int, uint32_t, short, long...) is represented by bytes, and is 'numerical' in form. You can subtract them from each other or add them together in whichever combination you like, as long as you store the resulting value in a variable of a type which is big enough to hold it. Otherwise the value is converted to fit the destination type: it wraps around for unsigned types and the result is implementation-defined for signed types (this is not a buffer overflow, since no memory outside the variable is written).
In your example, a char is a single byte and an int is typically 4 bytes, so a small non-negative result of this subtraction occupies only the least significant byte of the int. If the expression yields a negative value, the representation of the int in memory is different (look into two's complement if you're interested).
When you subtract two characters and store the result in a variable of integer type, it is their character codes (ASCII codes on most systems) that are subtracted.
For example when you have:
int col = 'D' - 'A';
The value of col is equal to 3
because the ASCII code of D is 68 and the ASCII code of A is 65. So col is 3, even though D and A are characters.

Value of character constants in C

6.4.4.4/10 ...If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
I'm having trouble understanding this paragraph. After this paragraph standard gives the example below:
Example 2: Consider implementations that use two’s complement representation for
integers and eight bits for objects that have type char. In an
implementation in which type char has the same range of values as
signed char, the integer character constant '\xFF' has the value −1;
if type char has the same range of values as unsigned char, the
character constant '\xFF' has the value +255.
What I understand from the expression "value of an object with type char" is the value we get when we interpret the object's content with type char. But when we look at the example, it seems to be talking about the object's value in pure binary notation. Is my understanding wrong? Does an object's value always mean the bits in that object?
All "integer character constants" (the stuff between ' and ') have type int out of tradition and compatibility reasons. But they are mostly meant to be used together with char, so 6.4.4.4/10 needs to make a distinction between the types. Basically patch up the broken C language - we have cases such as *"\xFF" that results in type char but '\xFF' results in type int, which is very confusing.
The value '\xFF' = 255 will always fit in an int on any implementation, but not necessarily in a char, which has implementation-defined signedness (another inconsistency in the language). The behavior of the escape sequence should be as if we stored the character constant in a char, as done in my string literal example *"\xFF".
This need for consistency with char type even though the value is stored in an int is what 6.4.4.4/10 describes. That is, printf("%d", '\xFF'); should behave just as char ch = 255; printf("%d", (int)ch);
The example is describing one possible implementation, where char is either signed or unsigned and the system uses 2's complement. Generally the value of an object with integer type refers to decimal notation. char is an integer type, so it can have a negative decimal value (whether the symbol table has a matching index for the value -1 is another story). But "raw binary" cannot have a negative value: 1111 1111 can only be said to be -1 if you say that the memory cell should be interpreted as 8-bit 2's complement. That is, if you know that a signed char is stored there. If you know that an unsigned char is stored there, then the value is 255.

Why can we assign integers to a char variable

I randomly surfed on StackOverflow. As I saw a question I became clueless. Why can we assign Integer values to a char variable?
Code snippet:
#include <stdio.h>
int main()
{
char c = 130;
unsigned char f = 130;
printf("c = %d\nf = %d\n",c,f);
return 0;
}
Output:
c = -126
f = 130
I always thought values have to be assigned to the right type identifier, why can we do that?
That's because char is an integer type (the smallest one) and values of different integer types can be implicitly converted. But beware that your example code has implementation-defined behavior on a typical machine with signed 8-bit char *): 130 does not fit (the maximum value would be 127), and the result of converting an out-of-range value to a signed integer type is implementation-defined.
You might have asked this question because you thought char is for storing characters. This is actually true, but characters are numbers. See Character Encoding for more details.
*) whether char (without explicit signed or unsigned) is signed is implementation-defined, as is the number of bits, but there must be at least 8.
Quoting C11, chapter §6.5.16.1p2
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
This implies that the RHS in an assignment operator is implicitly converted to the type of the variable on the LHS. In your case, the integer constant is converted to char type.
Also, there is no char constant in C. The character constants like 'a', 'B' are all of int type.

printf format for 1 byte signed number

Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?
what is the correct format for a 1 byte signed number?
%hh and the integer conversion specifier of your choice (for example, %02hhX). See the C11 standard, §7.21.6.1p5:
hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
In case that wasn't clear, a simple (but controversial) example:
/* Read the comments thread to this post; I'll remove
this note when I edit the outcome of the discussion into
the answer
*/
#include <stdio.h>
int main(void) {
char* s = "\u00d1"; /* Ñ */
for (char* p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char (or of a plain char that is unsigned) to an unsigned int will only happen on an implementation where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8), emphasis added:
the int argument is converted to an unsigned char, and the resulting character is written.
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
void f(char c) {
printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call" which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments... are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)

Inconsistent behaviour of implicit conversion between unsigned and bigger signed types

Consider following example:
#include <stdio.h>
int main(void)
{
unsigned char a = 15; /* one byte */
unsigned short b = 15; /* two bytes */
unsigned int c = 15; /* four bytes */
long x = -a; /* eight bytes */
printf("%ld\n", x);
x = -b;
printf("%ld\n", x);
x = -c;
printf("%ld\n", x);
return 0;
}
To compile I am using GCC 4.4.7 (and it gave me no warnings):
gcc -g -std=c99 -pedantic-errors -Wall -W check.c
My result is:
-15
-15
4294967281
The question is why both unsigned char and unsigned short values are "propagated" correctly to (signed) long, while unsigned int is not. Is there any reference or rule on this?
Here are results from gdb (words are in little-endian order) accordingly:
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 11111111111111111111111111111111
(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001 00000000000000000000000000000000
This is due to how the integer promotions are applied to the operand and the requirement that the result of unary minus have the same type. This is covered in section 6.5.3.3 Unary arithmetic operators, which says (emphasis mine going forward):
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
and integer promotion which is covered in the draft c99 standard section 6.3 Conversions and says:
if an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
In the first two cases, the promotion will be to int and the result will be int. In the case of unsigned int no promotion is required but the result will require a conversion back to unsigned int.
The -15 is converted to unsigned int using the rules set out in section 6.3.1.3 Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
So we end up with -15 + (UMAX + 1), which results in UMAX - 14, a large unsigned value. This is why you will sometimes see code use -1 converted to an unsigned value to obtain the max unsigned value of a type, since it will always end up being -1 + UMAX + 1, which is UMAX.
int is special. Everything smaller than int gets promoted to int in arithmetic operations.
Thus -a and -b are applications of unary minus to int values of 15, which just work and produce -15. This value is then converted to long.
-c is different. c is not promoted to an int as it is not smaller than int. The result of unary minus applied to an unsigned int value of k is again an unsigned int, computed as 2^N - k (N is the number of bits).
Now this unsigned int value is converted to long normally.
This behavior is correct. Quotes are from C 9899:TC2.
6.5.3.3/3:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
6.2.5/9:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
6.3.1.1/2:
The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
So for long x = -a;, since the operand a, an unsigned char, has conversion rank less than the rank of int and unsigned int, and all unsigned char values can be represented as int (on your platform), we first promote to type int. The negative of that is simple: the int with value -15.
Same logic for unsigned short (on your platform).
The unsigned int c is not changed by promotion. So the value of -c is calculated using modular arithmetic, giving the result UINT_MAX-14.
C's integer promotion rules are what they are because standards-writers wanted to allow a wide variety of existing implementations that did different things, in some cases because they were created before there were "standards", to keep on doing what they were doing, while defining rules for new implementations that were more specific than "do whatever you feel like". Unfortunately, the rules as written make it extremely difficult to write code which doesn't depend upon a compiler's integer size. Even if future processors would be able to perform 64-bit operations faster than 32-bit ones, the rules dictated by the standards would cause a lot of code to break if int ever grew beyond 32 bits.
It would probably in retrospect have been better to have handled "weird" compilers by explicitly recognizing the existence of multiple dialects of C, and recommending that compilers implement a dialect that handles various things in consistent ways, but providing that they may also implement dialects which do them differently. Such an approach may end up ultimately being the only way that int can grow beyond 32 bits, but I've not heard of anyone even considering such a thing.
I think the root of the problem with unsigned integer types stems from the fact that they are sometimes used to represent numerical quantities, and are sometimes used to represent members of a wrapping abstract algebraic ring. Unsigned types behave in a manner consistent with an abstract algebraic ring in circumstances which do not involve type promotion. Applying a unary minus to a member of a ring should (and does) yield a member of that same ring which, when added to the original, will yield zero [i.e. the additive inverse]. There is exactly one way to map integer quantities to ring elements, but multiple ways exist to map ring elements back to integer quantities. Thus, adding a ring element to an integer quantity should yield an element of the same ring regardless of the size of the integer, and conversion from rings to integer quantities should require that code specify how the conversion should be performed. Unfortunately, C implicitly converts rings to integers in cases where either the size of the ring is smaller than the default integer type, or when an operation uses a ring member with an integer of a larger type.
The proper solution to this problem would be to allow code to specify that certain variables, return values, etc. should be regarded as ring types rather than numbers; an expression like -(ring16_t)2 should yield 65534 regardless of the size of int, rather than yielding 65534 on systems where int is 16 bits, and -2 on systems where it's larger. Likewise, (ring32)0xC0000001 * (ring32)0xC0000001 should yield (ring32)0x80000001 even if int happens to be 64 bits [note that if int is 64 bits, the compiler could legally do anything it likes if code tries to multiply two unsigned 32-bit values which equal 0xC0000001, since the result would be too large to represent in a 64-bit signed integer].
Negatives are tricky, especially when it comes to unsigned values. If you look at the C documentation, you'll notice that (contrary to what you'd expect) unsigned chars and shorts are promoted to signed ints for computing, while an unsigned int will be computed as an unsigned int.
When you compute -c, c stays an unsigned int, so the result is also an unsigned int: the value wraps around to UINT_MAX - 14 (4294967281 with a 32-bit int). That value fits in a 64-bit long, so it is stored in x unchanged, as a large positive number.
For clarification: no promotion takes place when negating an unsigned int. On a 2's complement machine the bit pattern is the same as that of -15 in an int, but because the type is unsigned, the MSB is not a sign bit, and the pattern is read as a very large positive number instead of a negative one. The conversion to long then zero-extends the value rather than sign-extending it.

Resources