Please explain this result: printf("%c", 'abcd') - c

#include <stdio.h>
int main()
{
printf("%c\n", 'abcd');
printf("%p\n", 'abcd');
printf("%c\n", 0x61626364);
printf("%c\n", 0x61626363);
printf("%c\n", 0x61626365);
return 0;
}
My question is about this line: printf("%c\n", 'abcd');
The result of this line is 'd', but I can't understand why 'd' comes out.
I tried looking at other memory locations and found that they contain all of the letters.
Please explain why the result is 'd' and why the other memory locations contain all of the letters.
Thank you.

'abcd' is a multi-character constant; its value is implementation-defined.
C11 §6.4.4.4 Character constants, paragraph 10:
An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g.,
'ab'), or containing a character or escape sequence that does not map to a single-byte
execution character, is implementation-defined. If an integer character constant contains
a single character or escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape sequence is converted to
type int.
A common implementation gives 'abcd' the value 'a' * 256 * 256 * 256 + 'b' * 256 * 256 + 'c' * 256 + 'd' (1633837924 with ASCII). You can check its value in your implementation by printing it using "%d". Although multi-character constants are legal C, they are rarely used in practice.
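As a rough sketch of the arithmetic above, the formula can be written out with ordinary single-character constants. The function name common_abcd_value is made up for illustration, and the code assumes ASCII and an int of at least 32 bits; it does not tell you what your own compiler does with 'abcd'.

```c
/* Hypothetical helper: the value that a common implementation assigns to
   'abcd', built from single-character constants only.
   Assumes ASCII and an int of at least 32 bits (no overflow). */
int common_abcd_value(void)
{
    return 'a' * 256 * 256 * 256 + 'b' * 256 * 256 + 'c' * 256 + 'd';
}
```

Comparing this value with what printf("%d\n", 'abcd') prints shows whether your implementation uses the same packing.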

Your code is wrong. When you compile it with a recent GCC compiler enabling warnings with
gcc -Wall -Wextra u.c
you get
u.c: In function 'main':
u.c:5:20: warning: multi-character character constant [-Wmultichar]
printf("%c\n", 'abcd');
^
u.c:6:20: warning: multi-character character constant [-Wmultichar]
printf("%p\n", 'abcd');
^
u.c:6:5: warning: format '%p' expects argument of type 'void *', but argument 2 has type 'int' [-Wformat=]
printf("%p\n", 'abcd');
^
Technically, you are in the awful undefined behavior case (passing an int where %p expects a void *), on top of the implementation-defined value of the multi-character constants, and anything could happen with a standard-compliant implementation.
I never saw any useful case for multi-character constants like 'abcd'. I believe they are useless and mostly a historical artefact.
To explain what really happens, it is implementation-specific (it depends upon the compiler, the processor, the optimization flags, the ABI, the runtime environment, ...) and you need to dive into gory details (first look at the generated assembler code with gcc -fverbose-asm -S) and into your libc's particular printf implementation.
As a rule of thumb, you should improve your code to get rid of every warning your compiler is able to give you (your compiler is helpful in warning you). There are a few subtle exceptions (but then you should comment your code about them).

printf("%c\n", 'abcd');
As noted already, the value of 'abcd' is implementation-defined. On your implementation, its value is 0x61626364, so it behaves the same as your third printf call. See below.
printf("%p\n", 'abcd');
As noted already, %p is used to print pointers. 'abcd' is not a pointer, so this call is simply invalid.
printf("%c\n", 0x61626364);
printf("%c\n", 0x61626363);
printf("%c\n", 0x61626365);
The specification for %c reads:
If no l length modifier is present, the int argument is converted to an unsigned char, and the resulting character is written.
Conversions of int to unsigned char are well-defined and reduce the value modulo UCHAR_MAX+1. On most implementations, this means it takes the lowest 8 bits of the number.
The lowest 8 bits of 0x61626364, 0x61626363 and 0x61626365 are 0x64, 0x63 and 0x65, which in ASCII correspond to 'd', 'c' and 'e', so ASCII implementations will print those characters.
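The conversion that %c performs can be sketched without printf at all. The helper name below is hypothetical, and the expected values assume an 8-bit unsigned char and ASCII.

```c
/* Hypothetical helper: the character that %c writes for a given int
   argument, i.e. the argument converted to unsigned char.
   The conversion reduces the value modulo UCHAR_MAX + 1. */
unsigned char char_written_by_percent_c(int arg)
{
    return (unsigned char)arg;
}
```

With 8-bit bytes, char_written_by_percent_c(0x61626364) is 0x64, which is 'd' in ASCII.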

Your code
printf("%c\n", 'abcd');
results in output
d
because "%c" prints a single character. 'abcd' is not a string but a multi-character constant of type int; printf converts that int to unsigned char, which keeps only the low-order byte. On your implementation that byte holds the last character, 'd'.
The value of a multi-character constant is implementation-defined. This means different compilers can handle it differently. See stackoverflow.com/multiple-characters-in-a-character-constant.

Related

Why char in C is performing different behaviour?

I am a beginner and I am curious about char in C. I know that char takes only one character. If I inline initialize char like the below code:
#include <stdio.h>
int main() {
char c2 = 'ab';
printf("c2: %c\n", c2);
return 0;
}
It will output b. My first question is why b and why not a?
Again, if I take input from the console, as in the code below:
#include <stdio.h>
int main() {
char c3;
scanf("%c", &c3);
printf("c3: %c\n", c3);
return 0;
}
Now if I input ab, it will output a. Why does it not output b, as the first one (inline initialization) does?
Could someone please explain why the behaviour differs?
You can't fit 'ab' in a char, and trying to assign 'ab' to a char truncates it to the lower byte, 'b'.
Please, please turn your compiler warnings on and heed them; you can see these errors on Godbolt:
<source>: In function 'main':
<source>:4:15: warning: multi-character character constant [-Wmultichar]
4 | char c2 = 'ab';
| ^~~~
<source>:4:15: warning: overflow in conversion from 'int' to 'char' changes value from '24930' to '98' [-Woverflow]
98 is the ASCII value for b:
>>> chr(98)
'b'
For your second program, if you input 'ab', scanf with %c consumes 1 character, 'a'. (Other input remains in the input buffer and would be consumed by subsequent similar scanf calls.)
The first example may compile with warnings, e.g. in gcc:
main.c: In function ‘main’:
main.c:4:15: warning: multi-character character constant [-Wmultichar]
4 | char c2 = 'ab';
| ^~~~
main.c:4:15: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘24930’ to ‘98’ [-Woverflow]
Since:
The value of an integer character constant containing more than one character (e.g., 'ab'), [...] is implementation-defined.
The warnings also suggest the implementation's behaviour: 24930 = 0x6162, where 0x62 is ASCII 'b'. So the implementation's behaviour is a matter of the order in which the characters are packed into the int, followed by conversion of the int to a char, which keeps the least significant byte.
The behaviour of the second example is no surprise: input is not assignment or initialisation, and it is not a C language behaviour but a system I/O behaviour. Input is buffered, and requesting a single character will take the first character from the first-in-first-out (FIFO) input queue. The first character when you enter 'a' followed by 'b' is of course 'a'.
A second input request would retrieve the 'b', whereas in the initialisation example the 'a' is simply discarded by the conversion to char. I/O behaviour and C language assignment behaviour are not at all comparable. The input is a FIFO queue of single characters, whereas 'ab' is a single int (whose value is implementation-defined).
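The FIFO behaviour can be tried without a console. This sketch swaps in sscanf on a fixed string for scanf on stdin, so it only approximates the interactive case; the function name read_two_chars is invented for the example.

```c
#include <stdio.h>

/* Hypothetical helper: performs two successive %c conversions on the
   given text, much as two scanf("%c", ...) calls would on stdin.
   Returns the number of characters successfully read. */
int read_two_chars(const char *input, char *first, char *second)
{
    return sscanf(input, "%c%c", first, second);
}
```

For the input "ab", the first conversion yields 'a' and the second yields 'b'; nothing is discarded, it is simply read later.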
This is compiler-specific, so-called implementation-defined behavior.
From the C standard 6.4.4.4/2:
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
Then 6.4.4.4/10, which happens to use the very same example as you:
An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g.,
'ab'), or containing a character or escape sequence that does not map to a single-byte
execution character, is implementation-defined. If an integer character constant contains
a single character or escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape sequence is converted to
type int.
In the second example, the first character you place in stdin is the one read and the other is ignored. This has nothing to do with character constants but is simply how scanf works.

Using octal character gives warning: multi-character character constant

Following the logic from this question, the following code should work:
#include <stdio.h>
int main(){
printf("%c", '\0101');
return 0;
}
However, it gives the following error:
main.c: In function 'main':
main.c:5:18: warning: multi-character character constant [-Wmultichar]
printf("%c", '\0101');
^~~~~~~
exit status -1
I am not sure why it is a multi-character constant. I believe there should only be a single character inside those single quotes (octal 101 = decimal 65 = 'A'). Why is there more than one character? And why isn't octal notation working?
The octal char notation ought to be of the form \abc, where a, b, and c are octal digits (i.e. in the inclusive range of 0 to 7).
Yours has four digits, so the compiler will interpret it as \010 (maximal munch) followed by 1.
That's a multi-character constant, rather like '12'. Like '\abc', it has type int, but unlike '\abc' its value is implementation-defined, so the character that %c in printf ends up printing is implementation-defined too. Your helpful compiler is alerting you to that and, not surprisingly, is using correct terminology.
Did you mean to write '\101'? If you did, and what you really wanted was the upper case letter A, then write 'A' for portable C.
It should be '\101', not '\0101'. You can use '\x41' (hexadecimal) or '\101' (octal) instead of 'A'. But both reduce the portability and readability of your code. You should only consider using escape sequences when there isn't a better way to represent the character.
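A small sketch of the two escape forms mentioned in these answers. The function names are hypothetical; the numeric value 65 follows from the escapes themselves, while the match with 'A' assumes an ASCII execution character set.

```c
/* Hypothetical helpers returning the values of the two escape forms. */
int octal_escape_value(void)
{
    return '\101';   /* up to three octal digits: 1*64 + 0*8 + 1 */
}

int hex_escape_value(void)
{
    return '\x41';   /* hexadecimal 41: 4*16 + 1 */
}
```

Both return 65, which equals 'A' only where the character set is ASCII-compatible; that is why plain 'A' is the portable spelling.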

unable to get char of 2 character long in C [duplicate]

This question already has answers here:
How to determine the result of assigning multi-character char constant to a char variable?
(5 answers)
Closed 8 years ago.
The following statement in C gives no error:
char p='-1';
but the following gives an error:
char p='-12';
ERROR: character can be one or two characters long.
I never knew that a char in C could be two characters long. However, printf("%c",p) gives - as output. Where can I use such a char in C?
In C, a character constant like 'A' does not have type char, but rather type int. This creates the possibility that, even on a system where char is only 8 bits wide (and so int is wider than char), character constant notations can exist which provide integer values wider than char.
The C standard requires implementations to support multi-character constants, but their values are implementation-defined.
Your compiler likely allows only two characters because its int type is only 16 bits wide. Perhaps a constant like 'AB' is encoded similarly to, say, the expression ('A' << 8 | 'B'). According to the obvious extension of this scheme, 'ABC' would then have to be ('A' << 16 | 'B' << 8 | 'C'), which doesn't fit into 16 bits and calls for out-of-range shifts. Hence the two-character limit.
In the GNU C compiler, four characters can be used:
#include <stdio.h>
int main(void)
{
printf("%x\n", (unsigned) 'ABCD');
return 0;
}
int is 32 bits wide, and this program prints 41424344 which, by golly, is hexadecimal for the ASCII characters ABCD. So this feature is useful for int-wide magic constants which are readable. Instead of:
#define MAGIC 0x41424344 /* This spells ABCD; easy to spot in memory dumps */
You can do this, which is nice, but less portable:
#define MAGIC 'ABCD'
What if we use five or more characters, like 'ABCDE'? Then GCC responds similarly to how Turbo C++ responds to three or more:
test.c:5:35: warning: character constant too long for its type [enabled by default]
It so happens that the program still compiles, and its output is unchanged: the E was truncated.
There is an important difference. The old Borland compiler is rejecting the excessively-long constant as an error. Though that is probably a good idea, it is not standard-conforming; when some value is implementation-defined, the implementation's response cannot be failure, such as stopping the translation or execution of the program. Issuing a diagnostic is fine, of course.
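If the readable magic number is the goal, one possible middle ground is to build it from single-character constants. MAGIC4 below is a made-up macro, not something from GCC or the standard, and the sketch assumes 8-bit bytes; the resulting bytes spell the letters only on ASCII systems.

```c
#include <stdint.h>

/* Hypothetical macro: packs four single-character constants into one
   32-bit value, first character in the most significant byte. The result
   does not depend on how a compiler treats multi-character constants. */
#define MAGIC4(a, b, c, d)                      \
    (((uint32_t)(unsigned char)(a) << 24) |     \
     ((uint32_t)(unsigned char)(b) << 16) |     \
     ((uint32_t)(unsigned char)(c) << 8)  |     \
      (uint32_t)(unsigned char)(d))

#define MAGIC MAGIC4('A', 'B', 'C', 'D')   /* 0x41424344 with ASCII */
```

This keeps the letters visible in the source while avoiding the -Wmultichar warning.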
char p='-517';
printf("%c\n", p);
Running the above code gave me output 7 and a warning: overflow in implicit constant conversion [-Woverflow]
A char cannot contain more than 1 byte of information.
You want an array of characters, also known as a C-string.
#include <string.h>
// Note: if you initialize a character array with a string literal,
// there is no need for a size specifier.
char c[] = "-12";
// Alternatively (instead of the declaration above), declare the array
// with an explicit size and copy the string into it.
char c[4];
strcpy(c, "-12");
You'll notice that char c[4] has an indicated size of 4, meaning the array can hold only 4 characters. In C, character arrays used as strings have a special property: a null terminator (the char '\0') is a sentinel value that C-string functions use to recognize the end of your string. So, in reality, the character string "-12" has size 4: '-', '1', '2', and '\0'.
You can also access individual elements of an array by passing an index to the [] subscript operator.
printf("%s\n", c);
printf("%c\n", c[0]);
Notice the c[0] expression: it accesses the character '-' of the string "-12".
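Putting the snippets above together, a minimal sketch might look like this. The helper name copy_and_first is invented for the example, and the caller is assumed to pass a buffer of at least 4 bytes.

```c
#include <string.h>

/* Hypothetical helper: copies "-12" into a caller-supplied buffer of at
   least 4 bytes and returns the first character of the result. */
char copy_and_first(char dest[4])
{
    strcpy(dest, "-12");   /* copies '-', '1', '2' and the terminating '\0' */
    return dest[0];        /* index 0 is the '-' */
}
```

After the call, strlen(dest) is 3 while the array still needs 4 bytes, the extra one being the null terminator.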
Hope I helped.

Why are 4 characters allowed in a char variable? [duplicate]

This question already has answers here:
How to determine the result of assigning multi-character char constant to a char variable?
(5 answers)
Closed 9 years ago.
I have the following code in my program:
char ch='abcd';
printf("%c",ch);
The output is d.
I fail to understand why is a char variable allowed to take in 4 characters in its declaration without giving a compile time error.
Note: More than 4 characters is giving an error.
'abcd' is called a multi-character constant and has an implementation-defined value; here, after conversion to char, your compiler gives you 'd'.
If you use gcc and compile your code with -Wmultichar or -Wall, gcc will warn you about this.
I fail to understand why is a char variable allowed to take in 4
characters in its declaration without giving a compile time error.
It's not packing 4 characters into one char. The multi-character constant 'abcd' is of type int, and the compiler then does a constant conversion to convert it to char (which overflows in this case).
I'll assume you know that you are using a multi-character constant, and what it is.
I don't use VS these days, but my take on it is that the 4-character constant is packed into an int, then narrowed to a char. That is why it is allowed. Since the packing order of a multi-character constant into an integer type is compiler-defined, it can behave as you observe.
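Because multi-character constants are meant to be used to fill integer types, you could try an 8-character constant. I am not sure whether the VS compiler supports it, but there is a chance it does, because that would fit into a 64-bit integer type. (Bear in mind that the constant itself has type int, so this can only work where int is that wide or the compiler offers an extension.)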
It probably should give a warning about trying to fit a literal value too big for the type. It's kind of like unsigned char leet = 1337;. I am not sure, however, how this works in VS (whether it fires a warning or an error).
4 characters are not being put into a char variable, but into an int character constant which is then assigned to a char.
3 parts of the C standard (C11dr §6.4.4.4) may help:
"An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'."
"An integer character constant has type int."
"The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
OP's code of char ch='abcd'; is the assignment of an int to a char, as 'abcd' is an int. Just like with char ch='Z';, ch is assigned the int value of 'Z'. In that case there is no surprise, as the value of 'Z' fits nicely in a char. In the 'abcd' case, the value does not fit in a char, so some information is lost. Various outcomes are possible. Depending on how the implementation packs the characters, ch may end up with the value of 'a' on one platform and the value of 'd' on another.
The 'abcd' is an int value, much like 12345 in int x = 12345;.
When sizeof(int) == 4, an int may be assigned a character constant such as 'abcd'.
When sizeof(int) != 4, the limit changes. So with an 8-byte int, int x = 'abcdefgh'; is possible, etc.
Given that an int is only guaranteed to have a minimum range of -32767 to 32767, anything beyond 2 characters is non-portable.
Even with int x = 'ab';, the order in which the characters land in the int presents concerns.
Character constants like 'abcd' are typically used incorrectly, and thus many compilers have a warning, which is good to enable, to flag this uncommon C construct.
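The "character constant has type int" point can be checked directly. This is a small sketch with a made-up function name; note that it holds for C only, since in C++ a single-character constant has type char.

```c
#include <stddef.h>

/* Hypothetical helper: the size of a character constant. In C the
   constant 'Z' has type int, so this is sizeof(int), not 1. */
size_t size_of_char_constant(void)
{
    return sizeof 'Z';
}
```

On a typical platform with a 4-byte int this returns 4, which is why there is room for up to four packed characters before anything is narrowed to char.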

why '\97' ascii value equals 55

Consider this C code:
#include<stdio.h>
int main(void) {
char c = '\97';
printf("%d",c);
return 0;
}
The result is 55, but I can't understand how it is calculated.
I know that an octal or hex number follows the '\'. Is the 97 a hex number?
\ followed by digits is an octal escape sequence, but 9 is not a valid octal digit, so instead of being interpreted as octal it is interpreted as a multi-character constant, a \9 and a 7, whose value is implementation-defined. Even without any warning flags, gcc provides the following warnings by default:
warning: unknown escape sequence: '\9' [enabled by default]
warning: multi-character character constant [-Wmultichar]
warning: overflow in implicit constant conversion [-Woverflow]
The C99 draft standard in section 6.4.4.4 Character constants paragraph 10 says (emphasis mine):
An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g.,
'ab'), or containing a character or escape sequence that does not map to a single-byte
execution character, is implementation-defined.
For example, the gcc implementation is documented here and is as follows:
The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as ‘(int) ((unsigned char) 'a' * 256 + (unsigned char) 'b')’, and '\234a' as ‘(int) ((unsigned char) '\234' * 256 + (unsigned char) 'a')’.
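The quoted rule can be sketched as a loop. This is only an illustration of the documented behaviour, not GCC's actual source; the name multichar_value is invented, and the expected values assume ASCII, 8-bit char, and 32-bit int.

```c
#include <limits.h>
#include <stddef.h>

/* Sketch of the quoted rule: take the characters one at a time, shift the
   previous value left by the width of a char, then OR in the new
   character. The accumulator is as wide as int, so excess leading
   characters are shifted out. */
int multichar_value(const char *chars, size_t count)
{
    unsigned int value = 0;
    for (size_t i = 0; i < count; i++) {
        value = (value << CHAR_BIT) | (unsigned char)chars[i];
    }
    /* Converting a bit pattern above INT_MAX to int is
       implementation-defined before C23; the examples here stay in range. */
    return (int)value;
}
```

With ASCII, multichar_value("97", 2) is 14647 (0x3937), and converting that to an 8-bit char keeps the low byte, 55, which is the value printed in the question.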
As far as I can tell this is being interpreted as:
char c = ((unsigned char)'\71')*256 + '7' ;
which results in 55, which is consistent with the multi-character constant implementation above although the translation of \9 to \71 is not obvious.
Edit
I realized later on what is really happening is the \ is being dropped and so \9 -> 9, so what we really have is:
c = ((unsigned char)'9')*256 + '7' ;
which seems more reasonable but still arbitrary and not clear to me why this is not a straight out error.
Update
From reading The Annotated C++ Reference Manual, we find out that in Classic C and older versions of C++, when a backslash was followed by a character that was not defined as an escape sequence, the result was equal to the numeric value of the character. ARM section 2.5.2:
This differs from the interpretation by Classic C and early versions of C++, where the value of a sequence of a backslash followed by a character in the source character set, if not defined as an escape sequence, was equal to the numeric value of the character. For example '\q' would be equal to 'q'.
\9 is not a valid escape, so the compiler ignores it and ascii '7' is 55.
I would not depend on this behavior, it's probably undefined. But that's where the 55 came from.
edit: Shafik points out it's not undefined, it's implementation defined. See his answer for the references.
First of all, I'm going to assume your code should read this, because it matches your title.
#include<stdio.h>
int main(void) {
char c = '\97';
printf("%d",c);
return 0;
}
\9 isn't valid, thus let's just assume the character is actually 7. 7 is ascii 55, which is the answer that was printed out.
I'm not sure what you wanted, but \97 isn't it...
\9 isn't a valid escape sequence, so it's likely falling back to a plain 9 character.
This means that it's the same thing as '97', which is implementation-defined (not undefined; see Shafik Yaghmour's answer) behavior (2 characters can't fit into 1 character...).
To avoid things like this in the future, consider cranking up the warnings on your compiler. For example, a minimum for gcc should be -Wall -Wextra -pedantic.

Resources