How was 'ab' converted to 24930 when stored in a char?
#include <stdio.h>
#include <ctype.h> // needed for toupper()

int main() {
    char c = 'ab';
    c = toupper(c);
    printf("%c", c);
    return 0;
}
GCC compiler warning: Overflow in conversion from 'int' to 'char' changes value from '24930' to '98'
Output: B
If possible, please explain how the char handled the multiple characters here.
From the C Standard (6.4.4.4 Character constants)
10 An integer character constant has type int. The value of an
integer character constant containing a single character that maps to
a single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined.
So this character constant 'ab' is stored as an object of type int. It is then assigned to an object of type char, as in this declaration
char c = 'ab';
and the least significant byte of the int object is used to initialize the object c. It seems that in your case the character 'b' was stored in that least significant byte and was assigned to the object c.
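As a minimal sketch of this (the value of 'ab' is implementation-defined; the numbers below assume GCC with 8-bit chars):

#include <stdio.h>

int main(void)
{
    int full = 'ab'; // multi-character constant, type int; with GCC 'a'*256 + 'b' = 0x6162 = 24930
    char c = 'ab';   // only the least significant byte survives the conversion
    printf("%d %d\n", full, c); // typically prints: 24930 98
    return 0;
}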
Your two characters are packed into the 4-byte int value: 24930 = 0x6162, i.e. 0x61 'a' followed by 0x62 'b' = 'ab'.
98 is your 'b': 0x62 in hex, 98 in decimal (check with man ascii). On your little-endian system, the 4-byte int holding your 2 chars is laid out in memory like this:
0x62 0x61 0x00 0x00
Because only one byte fits in c (sizeof(char) is 1 byte, i.e. at most 256 distinct values), the conversion keeps only the least significant byte, your 'b'.
You can test it easily with char c = 'abcd'; it will print 'D'.
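For example (again implementation-defined; assuming GCC's rule of keeping the low-order byte):

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    char c = 'abcd'; // implementation-defined: with GCC only the last byte, 'd', ends up in c
    c = toupper(c);
    printf("%c\n", c); // typically prints D
    return 0;
}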
A char c holds a single character; multiple characters require a char *string, and you need to iterate through the string, e.g. something like:
#include <stdio.h>
#include <ctype.h> // Needed for toupper()
int main()
{
    char *mystring = "ab"; // string literals are already null-terminated
    char *mychar = mystring;
    while ( *mychar != '\0' )
    {
        char c = toupper( *mychar );
        printf( "%c", c );
        mychar++;
    }
    return 0;
}
As an aside, toupper() returns an int, so there is an implicit type conversion there.
Related
#include <stdio.h>

int main()
{
    int i = 577;
    printf("%c", i);
    return 0;
}
After compiling, it gives the output "A". Can anyone explain how I'm getting this?
%c will only accept values up to and including 255; after that it wraps around and starts from 0 again!
577 % 256 = 65; // (char code for 'A')
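A small sketch of that wrap-around (assuming 8-bit chars, so the reduction is modulo 256):

#include <stdio.h>

int main(void)
{
    int i = 577;
    printf("%c\n", i);         // 577 is reduced to an unsigned char: 577 % 256 = 65, i.e. 'A'
    printf("%c\n", 577 % 256); // same character, computed explicitly
    return 0;
}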
This has to do with how the value is converted.
The %c format specifier expects an int argument and then converts it to type unsigned char. The character for the resulting unsigned char is then written.
Section 7.21.6.1p8 of the C standard regarding format specifiers for printf states the following regarding c:
If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
When converting a value to a smaller unsigned type, what effectively happens is that the higher order bytes are truncated and the lower order bytes have the resulting value.
Section 6.3.1.3p2 regarding integer conversions states:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Which, when two's complement representation is used, is the same as truncating the high-order bytes.
For the int value 577, whose value in hexadecimal is 0x241, the low order byte is 0x41 or decimal 65. In ASCII this code is the character A which is what is printed.
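The same reduction can be made explicit with a cast; a small sketch, assuming an 8-bit unsigned char:

#include <stdio.h>

int main(void)
{
    int i = 577;                          // 0x241
    unsigned char low = (unsigned char)i; // keeps only the low-order byte, 0x41
    printf("%u %c\n", (unsigned)low, low); // prints: 65 A
    return 0;
}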
How does printing 577 with %c output "A"?
With printf(). "%c" matches an int argument*1. The int value is converted to an unsigned char value of 65 and the corresponding character*2, 'A' is then printed.
This makes no difference if a char is signed or unsigned or encoded with 2's complement or not. There is no undefined behavior (UB). It makes no difference how the argument is passed, on the stack, register, or .... The endian of int is irrelevant. The argument value is converted to an unsigned char and the corresponding character is printed.
*1All int values are allowed [INT_MIN...INT_MAX].
When a char value is passed as ... argument, it is first converted to an int and then passed.
char ch = 'A';
printf("%c", ch); // ch is converted to an `int` and passed to printf().
*2 65 is an ASCII A, the ubiquitous encoding of characters. Rarely other encodings are used.
Just output the value of the variable i in hexadecimal representation:
#include <stdio.h>

int main( void )
{
    int i = 577;
    printf( "i = %#x\n", i );
}
The program output will be
i = 0x241
So the least significant byte contains the hexadecimal value 0x41 that represents the ASCII code of the letter 'A'.
577 in hex is 0x241. The ASCII representation of 'A' is 0x41. You're passing an int to printf but then telling printf to treat it as a char (because of %c). A char is one-byte wide and so printf looks at the first argument you gave it and reads the least significant byte which is 0x41.
To print an integer, you need to use %d or %i.
I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data and I don't face any problem printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d. But if I try to print a character from its numerical value like this:
printf("%c\n", 128);
it just prints FFFD (the replacement character). Here is another example:
char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
printf("%c", abc[1]); // Should print "€", I think, but I get "�"
I'm a bit confused since I can just print every character below 128 in these ways. The reason I'm asking this is that I need to generate a (pseudo)random byte sequence using the rand() function. Here is an example:
char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 0x00 and 0xFF ...
printf("%c", abc[0]); // ... but I get "�"
If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.
In C, a char is a different type than unsigned char and signed char. It has the range CHAR_MIN to CHAR_MAX. Yet it has the same range as one of unsigned char/signed char. Typically these are 8-bit types, but could be more. See CHAR_BIT. So the typical range is [0 to 255] or [-128 to 127]
If char is unsigned, abc[1] = 128 is fine. If char is signed, abc[1] = 128 is implementation-defined (see below). The typical I-D is the abc[1] will have the value of -128.
printf("%c\n", 128); will send the int value 128 to printf(). The "%c" will cast that value to an unsigned char. So far no problems. What appears on the output depends on how the output device handles code 128. Perhaps Ç, perhaps something else.
printf("%c", abc[1]; will send 128 or is I-D. If I-D and -128 was sent, then casting -128 to unsigned char is 128 and again the code for 128 is printed.
If the output device is expecting UTF8 sequences, a UTF8 sequence beginning with code 128 is invalid (it is an unexpected continuation byte) and many such systems will print the replacement character which is unicode FFFD.
Converting a value outside the range of a signed char to char invokes:
the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3
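A short sketch of the points above (what c ends up holding is implementation-defined; the comments assume a signed 8-bit char):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("char range: %d to %d\n", CHAR_MIN, CHAR_MAX);
    char c = 128;                                        // implementation-defined if char is signed
    printf("c = %d\n", c);                               // commonly -128 with a signed 8-bit char
    printf("as unsigned char: %d\n", (unsigned char)c);  // 128 again
    return 0;
}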
First of all, let me tell you, the signedness of char is implementation-defined.
If you have to deal with char values over 127, you can use unsigned char. It can handle 0-255.
Also, you should be using the %hhu format specifier to print the value of an unsigned char.
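For example, a minimal sketch:

#include <stdio.h>

int main(void)
{
    unsigned char b = 200;         // a value above 127 is fine in an unsigned char
    printf("%hhu\n", b);           // prints 200
    printf("%02X\n", (unsigned)b); // or print the byte in hex: C8
    return 0;
}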
If you're dealing with bytes, use unsigned char instead of char for your datatypes.
With regard to printing, you can print the bytes in hex instead of decimal or as characters:
printf("%02X", abc[0]);
You probably don't want to print these bytes as characters, as you'll most likely be dealing with UTF-8 character encoding which doesn't seem to be what you're looking for.
If I do sizeof('r'), the character 'r' requires 4 bytes in memory. Alternatively, if I first declare a char variable and initialize it like so:
char val = 'r';
printf("%d\n", sizeof(val));
The output indicates that 'r' only requires 1 byte in memory.
Why is this so?
This is because the constant 'c' is interpreted as an int.
If you run this:
printf("%d\n", sizeof( (char) 'c' ) );
it will print 1.
In C, a literal like 'r' is called an integer character constant and, according to the C Standard:
10 An integer character constant has type int.
On the other hand, in C++ this literal is called a character literal and, according to the C++ Standard:
An ordinary character literal that contains a single c-char
representable in the execution character set has type char.
In this declaration
char val = 'r';
the variable val is explicitly declared as having type char. In both languages, sizeof( char ) is equal to 1.
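A quick check in C (sizeof yields a size_t, so %zu is the matching conversion specifier):

#include <stdio.h>

int main(void)
{
    char val = 'r';
    printf("%zu\n", sizeof 'r'); // size of an int in C, typically 4
    printf("%zu\n", sizeof val); // always 1
    return 0;
}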
This is because the literal 'r' is considered an integer and its value is its ASCII value. An int generally requires 4 bytes, hence the output. In the second case you are explicitly declaring it as a character, hence it outputs 1.
If you try the line printf("%d", (10+'c')); it will print 109 as the output, i.e. (10 + 99).
For some clarification you may want to take a look at this table.
http://goo.gl/nOa5ju (ascii table for chars)
Firstly, the size of an int in C is implementation-defined: it is at least 16 bits (2 bytes) and is most commonly 32 bits (4 bytes) on modern systems.
A character constant in C is considered an int whose value is the code of the character it represents in the table. The decimal value of 'c' is 99.
There you go: you got the char, or in other words an int value of 99, occupying 4 bytes.
On the other hand, char var = 'c'; is a 1-byte value, because a char is exactly 1 byte (ASCII is represented with 8 bits).
table of c type sizes http://goo.gl/yhxmSF
1st - What's the difference between
#define s 0xFF
and
#define s '\xFF'
2nd - Why does the second one equal -1?
3rd - Why after I try this (in the case of '\xFF')
unsigned char t = s;
putchar(t);
unsigned int p = s;
printf("\n%d\n", p);
the output is
(blank)
-1
?
thanks:)
This
#define s 0xFF
is a definition of a hexadecimal integer constant. It has type int and its value is 255 in decimal notation.
This
#define s '\xFF'
is a definition of an integer character constant represented by a hexadecimal escape sequence. It also has type int, but it represents a character, and its value is calculated differently.
According to the C Standard (paragraph 10 of section 6.4.4.4 Character constants):
...If an integer character constant contains a single character or
escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape
sequence is converted to type int.
It seems that by default your compiler treats values of type char as values of type signed char. So, according to the quote, the integer character constant '\xFF' has a negative value, because the sign bit (MSB) is set, and that value is -1.
If you set the compiler option that controls whether type char is treated as signed or unsigned so that char is unsigned, then '\xFF' and 0xFF will have the same value, namely 255.
Take into account that hexadecimal escape sequences may be used in string literals along with any other escape sequences.
You can use \xFF in a string literal as the last character, and also as a middle character by using string concatenation, but the same is not true for 0xFF.
The difference between '\xFF' and 0xFF is analogous to the difference between 'a' and the code of the character 'a' (let's assume it is 0x61 on some implementation), with the one difference that \xFF will consume further hex digits if used in a string.
When you print the character \xFF using putchar, the output is implementation-dependent. But when you print it as an integer, due to the default promotion rules for variadic arguments, it may print -1 or 255, on systems where char behaves as signed char or unsigned char respectively.
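A small sketch of the difference (the second value printed depends on whether plain char is signed on your implementation):

#include <stdio.h>

int main(void)
{
    printf("%d %d\n", 0xFF, '\xFF'); // 255 -1 if char is signed, 255 255 if unsigned
    unsigned char t = '\xFF';
    printf("%hhu\n", t);             // 255 either way, after conversion to unsigned char
    return 0;
}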
I would like to store a character (in order to compare it with other characters).
If I declare the variable like this:
char c = 'é';
everything works well, but I get these warnings :
warning: multi-character character constant [-Wmultichar]
char c = 'é';
^
ii.c:12:3: warning: overflow in implicit constant conversion [-Woverflow]
char c = 'é';
I think I understand why there are these warnings, but I wonder why it still works.
And should I define it like this instead: int d = 'é';, even though it takes more space in memory?
Moreover, I also get the warning below with this declaration :
warning: multi-character character constant [-Wmultichar]
int d = 'é';
Am I missing something? Thanks ;)
Try using wchar_t rather than char. char is a single byte, which is appropriate for ASCII but not for multi-byte character sets such as UTF-8. Also, flag your character literal as being a wide character rather than a narrow character:
#include <wchar.h>
...
wchar_t c = L'é';
é has the Unicode code point 0xE9, the UTF-8 encoding is "\xc3\xa9".
I assume your source file is encoded in UTF-8, so
char c = 'é';
is (roughly) equivalent to
char c = '\xc3\xa9';
How such character constants are treated is implementation-defined. For GCC:
The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as (int) ((unsigned char) 'a' * 256 + (unsigned char) 'b'), and '\234a' as (int) ((unsigned char) '\234' * 256 + (unsigned char) 'a').
Hence, 'é' has the value 0xC3A9, which fits into an int (at least for 32-bit int), but not into an (8-bit) char, so the conversion to char is again implementation-defined:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
This gives (with signed char)
#include <stdio.h>

int main(void) {
    printf("%d %d\n", 'é', (char)'é');
    if ((char)'é' == (char)'©') puts("(char)'é' == (char)'©'");
}
Output:
50089 -87
(char)'é' == (char)'©'
50089 is 0xC3A9; -87 is 0xA9 interpreted as a signed char.
So you lose information when storing é into a char (there are characters like © which compare equal to é). You can
Use wchar_t, an implementation-dependent wide character type which is 4 bytes on Linux, holding UTF-32: wchar_t c = L'é';. You can convert them to the locale-specific multibyte encoding (probably UTF-8, but you'll need to set the locale first, see setlocale; note that changing the locale may change the behaviour of functions like isalpha or printf) with wcrtomb, or use them directly and also use wide strings (use the L prefix to get wide character string literals); see the sketch below
Use a string and store UTF-8 in it (as in const char *c = "é"; or const char *c = "\u00e9"; or const char *c = "\xc3\xa9";, with possibly different semantics; for C11, perhaps also look at UTF-8 string literals and the u8 prefix)
Note that file streams have an orientation (cf. fwide).
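As a rough sketch of the wchar_t option (this assumes a UTF-8-encoded source file and a compiler, such as GCC, that accepts the wide literals; the setlocale call makes the multibyte output work):

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");  // pick up the environment's locale (often UTF-8)
    wchar_t c = L'é';       // one wide character, no truncation
    if (c != L'©')          // unlike (char)'é' == (char)'©' above, these stay distinct
        wprintf(L"%lc and %lc are different wide characters\n", (wint_t)c, (wint_t)L'©');
    return 0;
}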
HTH