Why is the output of printf("%d", '0/'); 12335? [duplicate] - c

This is the C code:
#include <stdio.h>
int main(int argc, const char *argv[])
{
    printf("%d", '0/');
    return 0;
}
The output is 12335! Then I tried replacing '0/' with '00' and '000', and the outputs changed to 12336 and 3158064, where 12336 = 48*(1 + 2^8) and 3158064 = 48*(1 + 2^8 + 2^16). However, I still don't know why. What happens when '0/' is converted to an integer for output?
PS: My computer is a MacBook Pro, and the operating system is OS X 10.9.5 (13F34). The compiler is Apple LLVM 6.0.

You have constructed a "multi-character literal". The behaviour is implementation-defined, but in your case the integer value is constructed from the ASCII values (12335 == 48 * 256 + 47).

'0/' is a multi-character constant, which means it has an implementation-defined value. In your case, the ASCII values of the characters are 0x30 and 0x2F; these are combined into 0x302F, which equals 12335.

Because '0/' is a multi-character constant of type int. When it is passed to printf and interpreted as an int, the first byte '0' is multiplied by 256 and then the second byte '/' is added to it. This produces the value that you see:
printf("%d %d %d", '0', '/', '0'*256+'/');
prints
48 47 12335
Note that this behavior is system-dependent. On other systems you could see 12080 instead of 12335.
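A minimal sketch of both packings (assuming 8-bit ASCII characters; which one you get is up to the implementation):
#include <stdio.h>
int main(void)
{
    /* The two byte orders an implementation might pick for '0/' */
    printf("%d\n", '0' * 256 + '/');  /* 12335: '0' in the high byte */
    printf("%d\n", '/' * 256 + '0');  /* 12080: '/' in the high byte */
    return 0;
}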
See this answer for more information on multicharacter constants.


Using int to print character constants [duplicate]

I wrote the following program,
#include <stdio.h>
int main(void)
{
    int i = 'A';
    printf("i=%c", i);
    return 0;
}
and I got the result as,
i=A
So I tried another program,
#include <stdio.h>
int main(void)
{
    int i = 'ABC';
    printf("i=%c", i);
    return 0;
}
As I understand it, since 32 bits are used to store an int value, and each of 'A', 'B', and 'C' has an 8-bit ASCII code totalling 24 bits, those 24 bits should fit in a 32-bit unit. So I expected the output to be
i=ABC
but the output instead was
i=C
and I can't understand why.
'ABC' in this case is an integer character constant, as per section 6.4.4.4, paragraph 10 of the standard.
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
In this case, 'A' == 0x41, 'B' == 0x42, 'C' == 0x43, and your compiler interprets i as 0x414243. As said in the other answer, this value is implementation-defined.
When you access it with '%c', the higher-order bytes are dropped and you are left with only 0x43, which is 'C'.
For more insight, read the answers to this question as well.
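A minimal sketch of that truncation, assuming 'ABC' came out as 0x414243 as it did here:
#include <stdio.h>
int main(void)
{
    int i = 0x414243;   /* the value 'ABC' took on in this case */
    printf("%x\n", i);  /* 414243: all three bytes are present */
    printf("%c\n", i);  /* %c keeps only the low byte: prints C */
    return 0;
}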
The conversion specifier c used in this call
printf("i=%c",i);
in fact extracts one character from the integer argument, so with this specifier you cannot get three characters of output.
From the C Standard (7.21.6.1 The fprintf function)
c If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written
Take into account that the internal representation of a multi-byte character constant is implementation defined. From the C Standard (6.4.4.4 Character constants)
...The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character, is
implementation-defined.
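A small sketch of that rule, assuming an ASCII execution character set (the value of 'ABC' itself remains implementation-defined):
#include <stdio.h>
int main(void)
{
    int i = 'ABC';                    /* implementation-defined value; gcc warns with -Wmultichar */
    printf("%c\n", i);                /* printf converts the int to unsigned char */
    printf("%c\n", (unsigned char)i); /* the same conversion written out */
    return 0;
}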
'ABC' is an integer character constant. Depending on the code set (overwhelmingly ASCII), endianness, and int width (apparently 32 bits in OP's case), it may take a value such as
0x414243
0x41424300
0x434241
or others. It is implementation-defined behavior.
The "%c" directs printf() to take the int value, cast it to unsigned char and print the associated character. This is the main reason for apparent loss of information.
In OP's case, it appears that i took on the value 0x414243, whose low-order byte 0x43 is 'C'. Only that low byte matters to "%c":
int i = 'A';
printf("i=%c", i);        // prints 'A'
// same output as
printf("i=%c", 0x434241); // the low byte 0x41 is also 'A'
If you want i to hold three characters, you need an array of three characters:
char i[3];
i[0] = 'A';
i[1] = 'B';
i[2] = 'C';
The single quotes ' ' can hold only one character; your original code stores a converted multi-character value in a 32-bit integer. If you want to separate the 32 bits into 8-bit containers, a char array like char i[3] is the way to go. You will then see that
int j = i;
draws a compiler diagnostic, because a char array (which decays to a pointer) does not convert to an integer.
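A runnable sketch of that approach, with a fourth slot added for the '\0' terminator so the array can be printed as a string:
#include <stdio.h>
int main(void)
{
    char i[4] = { 'A', 'B', 'C', '\0' };  /* extra slot for the terminator */
    printf("%s\n", i);                    /* prints ABC */
    return 0;
}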
In C, 'A' is an int constant that's guaranteed to fit into a char.
'ABC' is a multicharacter constant. It has type int, but an implementation-defined value. When it is printed with %c, printf converts the int argument to unsigned char, so only the low-order byte comes out.

Unable to get a char two characters long in C [duplicate]

The following statement in C gives no error:
char p = '-1';
but the following gives an error:
char p = '-12';
ERROR: character can be one or two characters long.
I never knew that a char in C could be two characters long. However, printf("%c", p) gives - as output. Where can I use this kind of char in C?
In C, a character constant like 'A' does not have type char, but rather type int. This creates the possibility that, even on a system where char is only 8 bits wide (and so int is wider than char), character constant notations can exist which provide integer values wider than char.
The C standard requires implementations to support multi-character constants, but their values are implementation-defined.
Your compiler likely allows only two characters because its int type is only 16 bits wide. Perhaps a constant like 'AB' is encoded similarly to, say, the expression ('A' << 8 | 'B'). By the obvious extension of this scheme, 'ABC' would then have to be ('A' << 16 | 'B' << 8 | 'C'), which doesn't fit into 16 bits and calls for out-of-range shifts. Hence the two-character limit.
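A quick sketch to test that packing; the result is implementation-defined, so treat the printed 1 or 0 as a property of your compiler rather than of C:
#include <stdio.h>
int main(void)
{
    /* Prints 1 if this implementation packs 'AB' as ('A' << 8 | 'B') */
    printf("%d\n", 'AB' == ('A' << 8 | 'B'));
    return 0;
}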
In the GNU C compiler, four characters can be used:
#include <stdio.h>
int main(void)
{
    printf("%x\n", (unsigned) 'ABCD');
    return 0;
}
int is 32 bits wide, and this program prints 41424344 which, by golly, is hexadecimal for the ASCII characters ABCD. So this feature is useful for int-wide magic constants which are readable. Instead of:
#define MAGIC 0x41424344 /* This spells ABCD; easy to spot in memory dumps */
You can do this, which is nice, but less portable:
#define MAGIC 'ABCD'
What if we use five or more characters, like 'ABCDE'? Then GCC responds similarly to how Turbo C++ responds for three or more:
test.c:5:35: warning: character constant too long for its type [enabled by default]
It so happens that the program still compiles, but following GCC's documented rule the excess leading character is ignored: 'ABCDE' comes out as 42434445, so it is the A, not the E, that gets truncated.
There is an important difference. The old Borland compiler rejects the excessively long constant as an error. Though that is probably a good idea, it is not standard-conforming: when some value is implementation-defined, the implementation's response cannot be failure, such as stopping the translation or execution of the program. Issuing a diagnostic is fine, of course.
char p='-517';
printf("%c\n", p);
Running the above code gave me output 7 and a warning: overflow in implicit constant conversion [-Woverflow]
A char cannot contain more than 1 byte of information.
You want an array of characters, also known as a C string:
// Note, if you initialize a character array with a literal string
// there is no need for a size specifier
char c[] = "-12";
// Note this is a method of copying one character array into another.
#include <string.h>
char c[4];
strcpy(c, "-12");
You'll notice that char c[4] has an indicated size of 4, meaning the array can hold only 4 characters. In C, character arrays have a special property: a null terminator (the char '\0') is a sentinel value that C string functions use to recognize the end of your string. So, in reality, the character string "-12" is of size 4: '-', '1', '2', and '\0'.
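A short check of that size, as a sketch:
#include <stdio.h>
int main(void)
{
    char c[] = "-12";
    printf("%zu\n", sizeof c);  /* 4: '-', '1', '2', and '\0' */
    return 0;
}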
You can also access individual elements of an array by using an index with the [] subscript operator:
printf("%s\n", c);
printf("%c\n", c[0]);
Notice the c[0] expression: this accesses the character '-' of the string "-12".
Hope I helped.

Why does the ASCII value of '\97' equal 55?

Consider this C code:
#include <stdio.h>
int main(void)
{
    char c = '\97';
    printf("%d", c);
    return 0;
}
The result is 55, but I can't understand how to calculate it. I know an octal or hex number can follow the '\'; is the 97 a hex number?
\ begins an octal escape sequence, but 9 is not a valid octal digit, so instead of being interpreted as octal this is treated as a multi-character constant containing \9 and 7, whose value is implementation-defined. Without any warning flags, gcc provides the following warnings by default:
warning: unknown escape sequence: '\9' [enabled by default]
warning: multi-character character constant [-Wmultichar]
warning: overflow in implicit constant conversion [-Woverflow]
The C99 draft standard in section 6.4.4.4 Character constants paragraph 10 says (emphasis mine):
An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g.,
'ab'), or containing a character or escape sequence that does not map to a single-byte
execution character, is implementation-defined.
For example, gcc's implementation is documented here as follows:
The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as ‘(int) ((unsigned char) 'a' * 256 + (unsigned char) 'b')’, and '\234a' as ‘(int) ((unsigned char) '\234' * 256 + (unsigned char) 'a')’.
As far as I can tell this is being interpreted as:
char c = ((unsigned char)'\71') * 256 + '7';
which results in 55, which is consistent with the multi-character constant implementation above although the translation of \9 to \71 is not obvious.
Edit
I realized later on that what is really happening is that the \ is being dropped, so \9 -> 9, and what we really have is:
c = ((unsigned char)'9') * 256 + '7';
which seems more reasonable, but is still arbitrary, and it is not clear to me why this is not an outright error.
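A minimal sketch of that interpretation (the packing itself is implementation-defined; the point is that only the low byte survives the conversion to char):
#include <stdio.h>
int main(void)
{
    char c = (char)('9' * 256 + '7');  /* only the low byte fits: 0x37 */
    printf("%d\n", c);                 /* 55 */
    return 0;
}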
Update
From reading The Annotated C++ Reference Manual, we find out that in Classic C and older versions of C++, when a backslash followed by a character was not defined as an escape sequence, the sequence was equal to the numeric value of the character. ARM section 2.5.2:
This differs from the interpretation by Classic C and early versions of C++, where the value of a sequence of a backslash followed by a character in the source character set, if not defined as an escape sequence, was equal to the numeric value of the character. For example '\q' would be equal to 'q'.
\9 is not a valid escape, so the compiler ignores it, and ASCII '7' is 55.
I would not depend on this behavior, it's probably undefined. But that's where the 55 came from.
edit: Shafik points out it's not undefined, it's implementation defined. See his answer for the references.
First of all, I'm going to assume your code should read this, because it matches your title.
#include <stdio.h>
int main(void)
{
    char c = '\97';
    printf("%d", c);
    return 0;
}
\9 isn't valid, so let's just assume the character is actually 7. '7' is ASCII 55, which is the answer that was printed out.
I'm not sure what you wanted, but \97 isn't it...
\9 isn't a valid escape sequence, so it's likely falling back to a plain 9 character.
This means that it's the same thing as '97', which is implementation-defined behavior (see Shafik Yaghmour's answer; 2 characters can't fit into 1 character...).
To avoid things like this in the future, consider cranking up the warnings on your compiler. For example, a minimum for gcc should be -Wall -Wextra -pedantic.

Strange C behaviour [duplicate]

What is happening here?
#include <stdio.h>
int main(void)
{
    int x = 'HELL';
    printf("%d\n", x);
    return 0;
}
Prints 1212501068
I expected a compilation error.
Explanations are welcome =)
1212501068 in hex is 0x48454c4c.
0x48 is the ASCII code for H.
0x45 is the ASCII code for E.
0x4c is the ASCII code for L.
0x4c is the ASCII code for L.
Note that this behaviour is implementation-defined and therefore not portable. A good compiler would issue a warning:
$ gcc test.c
test.c: In function 'main':
test.c:4:11: warning: multi-character character constant [-Wmultichar]
In C, single quotes are used to denote characters, which are represented in memory by numbers. When you place multiple characters in single quotes, the compiler combines them into a single value however it wants, as long as it documents the process.
Looking at your number: 1212501068 is 0x48454C4C. If you decompose this number into bytes, you get 0x48 or 'H', 0x45 or 'E', and twice 0x4C or 'L'.
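A small sketch of that decomposition, assuming 8-bit bytes with 'H' in the high byte:
#include <stdio.h>
int main(void)
{
    unsigned x = 1212501068;                    /* 0x48454C4C */
    for (int i = 3; i >= 0; i--)
        putchar((int)((x >> (8 * i)) & 0xFF));  /* H, E, L, L */
    putchar('\n');
    return 0;
}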
Others have explained what happens here. As for why it is allowed, I quote from the C99 draft standard (N1256):
6.4.4.4 Character constants
[...]
An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
The emphasis on the relevant sentence is mine.
The output of 1212501068 in hex is: 0x48 0x45 0x4C 0x4C
Look them up in an ASCII table, and you'll see those are the codes for HELL.
BTW: the value of single quotes around a multi-char constant is implementation-defined, not standardized.
The exact interpretation of single-quotes around multiple characters is Implementation-Defined. But it is very common that it either comes out as a Big-Endian or Little-Endian integer. (Technically, the implementation could interpret it any way it chooses, including a random value).
In other words, depending on the platform, I would not be surprised to see it come out as 0x4C 0x4C 0x45 0x48, or 1280066888.
And over on this question, and also on this site you can see practical uses of this behavior.
This line:
int x = 'HELL';
saves to memory the hex values of 'HELL', which together are 0x48454C4C == 1212501068.
The value is just 'HELL' interpreted as an int (usually 4 bytes).
If you try this:
#include <stdio.h>
int main(void)
{
    union {
        int x;
        char c[4];
    } u;
    int i;
    u.x = 'HELL';
    printf("%d\n", u.x);
    for (i = 0; i < 4; i++) {
        printf("'%c' %x\n", u.c[i], u.c[i]);
    }
    return 0;
}
You'll get:
1212501068
'L' 4c
'L' 4c
'E' 45
'H' 48

Why is sizeof('c') returning 4 instead of 1? [duplicate]

http://ideone.com/lHYY8
#include <stdio.h>
int main(void)
{
    printf("%zu %zu\n", sizeof('c'), sizeof(char));
    return 0;
}
Why does sizeof('c') return 4 instead of 1?
Because in C character constants have the type int, not char. So sizeof('c') == sizeof(int). Refer to this C FAQ
Perhaps surprisingly, character constants in C are of type int, so
sizeof('a') is sizeof(int) (though this is another area where C++
differs).
One (possibly even more extreme) oddity that also somehow justifies this is the fact that character literals are not limited to a single character.
Try this:
printf("%d\n", 'xy');
This is sometimes useful when dealing with e.g. binary file formats that use 32-bit "chunk" identifiers, such as PNG. You can do things like this:
const int chunk = read_chunk_from_file(...);
if (chunk == 'IHDR')
    process_image_header(...);
There might be portability issues with code like this, though; the above snippet assumes that read_chunk_from_file() magically does the right thing to transform the big-endian 32-bit value found in the PNG file into something that matches the value of the corresponding multi-character character literal.
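read_chunk_from_file() is only named above, not shown; a minimal sketch of such a helper (hypothetical name read_chunk_id, assuming the stream still holds four bytes) might be:
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: builds the identifier big-endian, matching the
   byte order 'IHDR' has on implementations that pack the first
   character into the high byte. */
static uint32_t read_chunk_id(FILE *f)
{
    uint32_t id = 0;
    for (int i = 0; i < 4; i++)
        id = (id << 8) | (uint32_t)(unsigned char)fgetc(f);
    return id;
}

int main(void)
{
    if (read_chunk_id(stdin) == 'IHDR')
        puts("image header chunk");
    return 0;
}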
The following is the famous line from the famous C book - The C programming Language by Kernighan & Ritchie with respect to a character written between single quotes.
A character written between single quotes represents an integer value equal to the numerical value of the character in the machine's character set.
So sizeof('a') is equivalent to sizeof(int).
And this question is a duplicate of why sizeof('a') is 4 in C?
cnicutar is completely right, of course. I just wanted to add the reason for this. If you look at functions like fgetc, you'll notice that they also return an int. It's because a char can represent any character from 0x00 to 0xFF, but an additional value is needed in order to represent EOF. So functions that return a character from input or a file often return an int, which can be compared with EOF, which is usually defined to be -1, but can be anything that isn't a valid character.
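The classic idiom built on that design, keeping the result in an int so EOF stays distinguishable from every valid character:
#include <stdio.h>
int main(void)
{
    int c;                          /* int, not char, so EOF stays distinct */
    while ((c = getchar()) != EOF)  /* getchar/fgetc return int for this reason */
        putchar(c);
    return 0;
}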
