assigning more than one character in char - c

Why does this program give the output 'y'?
#include <stdio.h>
int main(void) {
char ch='abcdefghijklmnopqrstuvwxy';
printf("%c",ch);
return 0;
}

It's a multi-character literal. From the C++ standard:
An ordinary character literal that contains more than one c-char is a
multicharacter literal. A multicharacter literal has type int and
implementation-defined value.
And from 6.4.4.4/10 of the C11 standard:
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
So on your system (assuming a 4-byte int), the line char ch = 'abcdefghijklmnopqrstuvwxy'; possibly compiles to:
char ch = 0x76777879; // SOME int value (may differ, but documented by the compiler)
In ASCII, 'abcdef...y' would correspond to 0x616263646566...79, which overflows an int. This is why gcc generates the following warnings:
multicharlit.c: In function ‘main’:
multicharlit.c:4:13: warning: character constant too long for its type [enabled by default]
multicharlit.c:4:5: warning: overflow in implicit constant conversion [-Woverflow]
It appears that on your system the least significant 8 bits are assigned to ch. Because your character literal is a constant, this most likely happens at compile time. For example, this is what happens when I compile with gcc:
$ cat multicharlit.c
#include <stdio.h>
int main(void) {
char ch='abcdefghijklmnopqrstuvwxy';
printf("%c",ch);
return 0;
}
$ gcc -O2 -fdump-tree-optimized multicharlit.c
$ cat multicharlit.c.143t.optimized
;; Function main (main) (executed once)
main ()
{
<bb 2>:
__builtin_putchar (121);
return 0;
}
Also, borrowing from unwind's comment:
Remember that the type of a single-quoted character constant is int,
but you're assigning it to a char, so it has to be truncated to a
single character.
For example, the type of 'a' is int in C. (Not to be confused with 'a' in C++, which is a char. On the other hand, the type of 'ab' is int in both C and C++.)
When you assign this int to a char and the value is larger than a char can represent, the value has to be squeezed into the narrower char type. If char is signed, the result is implementation-defined; if char is unsigned, the value is reduced modulo 256 (for 8-bit chars).

If you intended to print out abcdefghijklmnopqrstuvwxy, you should have stored it in a string (a char array) instead of a single char, e.g. char ch[50] = "abcdefghijklmnopqrstuvwxy"; and printed it with printf("%s", ch);.
A char array can hold more than one character, whereas a char variable holds exactly one.

Related

Why char in C is performing different behaviour?

I am a beginner and I am curious about char in C. I know that a char holds only one character. If I initialize a char inline like in the code below:
#include <stdio.h>
int main() {
char c2 = 'ab';
printf("c2: %c\n", c2);
return 0;
}
It will output b. My first question is why b and why not a?
Again, if I take input from the console like in the code below:
#include <stdio.h>
int main() {
char c3;
scanf("%c", &c3);
printf("c3: %c\n", c3);
return 0;
}
Now if I input ab, it outputs a. Why doesn't it output b, as in the first case (inline initialization)?
Can someone please explain why the behaviour differs?
You can't fit 'ab' in a char, and trying to assign 'ab' to a char truncates it to the lower byte, 'b'.
Please, please turn your compiler warnings on and heed them; you can see these warnings on Godbolt:
<source>: In function 'main':
<source>:4:15: warning: multi-character character constant [-Wmultichar]
4 | char c2 = 'ab';
| ^~~~
<source>:4:15: warning: overflow in conversion from 'int' to 'char' changes value from '24930' to '98' [-Woverflow]
98 is the ASCII value for b:
>>> chr(98)
'b'
For your second program, if you input 'ab', scanf with %c consumes 1 character, 'a'. (Other input remains in the input buffer and would be consumed by subsequent similar scanf calls.)
The first example compiles with warnings, e.g. in gcc:
main.c: In function ‘main’:
main.c:4:15: warning: multi-character character constant [-Wmultichar]
4 | char c2 = 'ab';
| ^~~~
main.c:4:15: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘24930’ to ‘98’ [-Woverflow]
Since:
The value of an integer character constant containing more than one character (e.g., 'ab'), [...] is implementation-defined.
The warnings also suggest the implementation's behaviour: 24930 = 0x6162, where 0x61 is ASCII 'a' and 0x62 is ASCII 'b'. So this implementation packs the first character into the higher byte and the last character into the least significant byte, and the conversion from int to char then keeps only the least significant byte.
The behaviour of the second example is no surprise - input is not assignment or initialisation, and it is not a C language behaviour, but a system I/O behaviour. Input is buffered and requesting a single character input will take the first character from the first-in-first-out (FIFO) input queue. The first character when you enter 'a' followed by 'b' is of course 'a'.
A second input request would retrieve the 'b', whereas in the initialisation example the 'a' is simply discarded by the conversion to char. I/O behaviour and C language assignment behaviour are not at all comparable: the input is a FIFO queue of single characters, whereas 'ab' is a single int whose value is implementation-defined.
This is compiler-specific, so called implementation-defined behavior.
From the C standard 6.4.4.4/2:
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
Then 6.4.4.4/10, which happens to use the very same example as you:
An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g.,
'ab'), or containing a character or escape sequence that does not map to a single-byte
execution character, is implementation-defined. If an integer character constant contains
a single character or escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape sequence is converted to
type int.
In the second example, the first character you place in stdin is the one read; the other stays in the input buffer. This has nothing to do with character constants but is simply how scanf works.

Using int to print character constants [duplicate]

I wrote the following program:
#include<stdio.h>
int main(void)
{
int i='A';
printf("i=%c",i);
return 0;
}
and got this result:
i=A
So I tried another program:
#include<stdio.h>
int main(void)
{
int i='ABC';
printf("i=%c",i);
return 0;
}
As I understand it, an int is stored in 32 bits, and 'A', 'B' and 'C' each have an 8-bit ASCII code, 24 bits in total, so those 24 bits should fit in the 32-bit int. So I expected the output to be:
i=ABC
but instead the output was:
i=C
and I can't understand why.
'ABC' in this case is an integer character constant, as per section 6.4.4.4/10 of the standard:
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g.,'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
In this case, 'A' == 0x41, 'B' == 0x42 and 'C' == 0x43, and your compiler gives 'ABC' the value 0x414243. As said in the other answer, this value is implementation-defined.
When you print it with %c, the value is converted to unsigned char, so the higher bytes are cut off and you are left with only 0x43, which is 'C'.
To get more insight to it, read the answers to this question as well.
The conversion specifier c used in this call
printf("i=%c",i);
in fact extracts a single character from the integer argument, so with this specifier you can never get three characters of output.
From the C Standard (7.21.6.1 The fprintf function)
c If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written
Note that the value of a multi-character constant is implementation-defined. From the C Standard (6.4.4.4 Character constants):
...The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character, is
implementation-defined.
'ABC' is an integer character constant. Depending on the character set (overwhelmingly ASCII), endianness and int width (apparently 32 bits in OP's case), it may have a value like one of the following; it is implementation-defined behavior.
'ABC' might be
0x414243
0x41424300
0x434241
or something else.
The "%c" directs printf() to take the int value, cast it to unsigned char and print the associated character. This is the main reason for apparent loss of information.
In OP's case, it appears that i took on the value 0x414243, whose least significant byte is 0x43:
int i='ABC';
printf("i=%c",i); // prints i=C
// same as
printf("i=%c",0x414243); // prints i=C
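If you want i to hold 3 characters, you need an array of char. Add one more element for the '\0' terminator if you want to print it with %s:
char i[4];
i[0] = 'A';
i[1] = 'B';
i[2] = 'C';
i[3] = '\0';
printf("i=%s", i); /* prints i=ABC */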
Single quotes can portably hold only one character. Your code stores a multicharacter constant in a 32-bit integer, and %c then converts that integer to an 8-bit character. If you want to separate the 32 bits into 8-bit containers, make a char array like char i[3]. You will then see that
int j=i;
is not valid: the array decays to a char *, and a pointer cannot be implicitly converted to an int, so the compiler must issue a diagnostic.
In C, 'A' is an int constant that's guaranteed to fit into a char.
'ABC' is a multicharacter constant. It has type int, but an implementation-defined value. Printing it with %c in printf is still well-defined, though: the int argument is converted to unsigned char, so only the low-order byte is printed.

unable to get char of 2 character long in C [duplicate]

The following statement in C gives no error:
char p='-1';
but the following gives an error:
char p='-12';
ERROR: character can be one or two characters long.
I never knew that a char in C could be two characters long. However, printf("%c",p) gives - as output. Where can I use a char in C?
In C, a character constant like 'A' does not have type char, but rather type int. This creates the possibility that, even on a system where char is only 8 bits wide (and so int is wider than char), character constant notations can exist which provide integer values wider than char.
The C standard requires implementations to support multi-character constants, but their values are implementation-defined.
Your compiler likely allows only two characters because its int type is only 16 bits wide. Perhaps a constant like 'AB' is encoded like the expression ('A' << 8 | 'B'). By the obvious extension of this scheme, 'ABC' would have to be ('A' << 16 | 'B' << 8 | 'C'), which doesn't fit into 16 bits and calls for out-of-range shifts. Hence the two-character limit.
In the GNU C compiler, four characters can be used:
#include <stdio.h>
int main(void)
{
printf("%x\n", (unsigned) 'ABCD');
return 0;
}
Here int is 32 bits wide, and this program prints 41424344, which, by golly, is hexadecimal for the ASCII characters ABCD. So this feature is useful for readable int-wide magic constants. Instead of:
#define MAGIC 0x41424344 /* This spells ABCD; easy to spot in memory dumps */
You can do this, which is nice, but less portable:
#define MAGIC 'ABCD'
What if we use five or more characters, like 'ABCDE'? Then GCC responds similarly to how Turbo C++ responds to three or more:
test.c:5:35: warning: character constant too long for its type [enabled by default]
It so happens that the program still compiles, and its output is unchanged: the E was truncated.
There is an important difference: the old Borland compiler rejects the excessively long constant as an error. Though that is probably a good idea, it is not standard-conforming; when some value is implementation-defined, the implementation's response cannot be a failure, such as stopping the translation or execution of the program. Issuing a diagnostic is fine, of course.
char p='-517';
printf("%c\n", p);
Running the above code gave me the output 7 and the warning: overflow in implicit constant conversion [-Woverflow]
A char cannot hold more than one byte of information.
You want an array of characters, also known as a C string:
#include <string.h>

// Option 1: if you initialize a character array with a string literal,
// there is no need for a size specifier.
char c[] = "-12";

// Option 2 (an alternative to option 1): copy a string literal into
// an existing character array with strcpy.
char c[4];
strcpy(c, "-12");
Notice that char c[4] has a declared size of 4, meaning the array can hold at most 4 characters. In C, character arrays have a special property: a null terminator (the char '\0') is a sentinel value that C string functions use to recognize the end of your string. So in reality, the string "-12" has size 4: '-', '1', '2' and '\0'.
You can also access individual elements of an array with the subscript operator []:
printf("%s\n", c);
printf("%c\n", c[0]);
The expression c[0] accesses the character '-' of the string "-12".
Hope I helped.

Why is an element of an array bigger than the type?

I am fairly new to C and during one of my exercises I encountered something I couldn't wrap my head around.
When I check the size of an element of tabel (here, 'b'), I get 4. However, if I check sizeof(char), I get 1. How come?
# include <stdio.h>
int main(){
char tabel[10] = {'b','f','r','o','a','u','v','t','o'};
int size_tabel = (sizeof(tabel));
int size_char = (sizeof('b'));
/*edit the above line to sizeof(char) to get 1 instead of 4*/
int length_tabel = size_tabel/size_char;
printf("size_tabel = %i, size_char = %i, lengte_tabel= %i",size_tabel,
size_char,length_tabel);
}
'b' is not of type char. 'b' is a character constant, and its type is int.
From C11 Standard Draft (ISO/IEC 9899:201x): 6.4.4.4 Character constants: Description
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
The 'b' literal is an int. And on your current platform, an int is 4 bytes.
An integer character constant, e.g., 'b' has type int in C.
From the C Standard:
(c11, 6.4.4.4p10) "An integer character constant has type int. [...] If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int."
This is different from C++, where a character literal containing a single character has type char:
(c++11, 2.14.3) "An ordinary character literal that contains a single c-char has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set."
sizeof(tabel)
This returns the size of tabel, which is sizeof(char) * 10 = 10.
sizeof('b')
In C, this returns sizeof(int) (4 on your platform), not sizeof(char); only in C++ would it be sizeof(char), which is 1.

Why are 4 characters allowed in a char variable? [duplicate]

I have the following code in my program:
char ch='abcd';
printf("%c",ch);
The output is d.
I fail to understand why a char variable is allowed to take 4 characters in its declaration without a compile-time error.
Note: more than 4 characters gives an error.
'abcd' is called a multicharacter constant, and it has an implementation-defined value; here your compiler gives you 'd'.
If you use gcc and compile your code with -Wmultichar or -Wall, gcc will warn you about this.
I fail to understand why is a char variable allowed to take in 4
characters in its declaration without giving a compile time error.
It's not packing 4 characters into one char. The multi-character const 'abcd' is of type int and then the compiler does constant conversion to convert it to char (which overflows in this case).
This assumes you know that you are using a multi-char constant, and what it is.
I don't use VS these days, but my take on it is that the 4-char multi-char constant is packed into an int, then converted down to a char. That is why it is allowed. Since the packing order of a multi-char constant into an integer is implementation-defined, it can behave as you observe.
Because multi-character constants are meant to fill integer types, you could try an 8-byte multi-char constant. Note, though, that a multi-character constant always has type int, so that can only work on an implementation with a 64-bit int; a 64-bit long type does not help. I am not sure whether the VS compiler supports it.
It probably should give a warning about trying to fit a literal value too big for the type. It's a bit like unsigned char leet = 1337;. I am not sure, however, how this works in VS (whether it emits a warning or an error).
4 characters are not being put into a char variable, but into an int character constant which is then assigned to a char.
Three parts of the C standard (C11dr §6.4.4.4) may help:
"An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'."
"An integer character constant has type int."
"The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
OP's code char ch='abcd'; is the assignment of an int to a char, as 'abcd' is an int. Just like char ch='Z';, ch is assigned the int value of 'Z'. In that case there is no surprise, as the value of 'Z' fits nicely in a char. In the 'abcd' case, the value does not fit in a char, so some information is lost. Various outcomes are possible. Typically, on one endian platform ch will have the value 'a', and on another, the value 'd'.
The 'abcd' is an int value, much like 12345 in int x = 12345;.
When sizeof(int) == 4, an int may be initialized with a character constant such as 'abcd'.
When sizeof(int) != 4, the limit changes. So with an 8-byte int, int x = 'abcdefgh'; is possible, etc.
Given that an int is only guaranteed a minimum range of -32767 to 32767, anything beyond 2 characters is non-portable.
Even int x = 'ab'; raises concerns, since the order in which the characters are packed into the int is implementation-defined.
Character constants like 'abcd' are typically used incorrectly, so many compilers have a warning, worth enabling, that flags this uncommon C construct.
