Dealing with char values over 127 in C


I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data, and I have no problem printing a previously acquired byte stream (e.g. read with fopen or fgets, then processed with some bitwise operations) as %c or %d. But if I try to print a character from its numerical value like this:
printf("%c\n", 128);
it just prints FFFD (the replacement character). Here is another example:
char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
printf("%c", abc[1]); // Should print "€", I think, but I get "�"
I'm a bit confused, since I can print every character below 128 in these ways. The reason I'm asking is that I need to generate a (pseudo)random byte sequence using the rand() function. Here is an example:
char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 00:FF ...
printf("%c", abc[0]); // ... but I get "�"
If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.

In C, char is a distinct type from unsigned char and signed char, although it has the same range as one of them: CHAR_MIN to CHAR_MAX. Typically these are 8-bit types, but they can be wider; see CHAR_BIT. So the typical range is [0 to 255] or [-128 to 127].
If char is unsigned, abc[1] = 128 is fine. If char is signed, abc[1] = 128 is implementation-defined (I-D; see below). The typical I-D result is that abc[1] will have the value -128.
printf("%c\n", 128); will send the int value 128 to printf(). The "%c" will convert that value to an unsigned char. So far no problems. What appears on the output depends on how the output device handles code 128. Perhaps Ç, perhaps something else.
printf("%c", abc[1]); will send 128 or an I-D value. If the value is I-D and -128 was sent, then converting -128 to unsigned char gives 128, and again the code for 128 is printed.
If the output device is expecting UTF-8 sequences, a UTF-8 sequence beginning with code 128 is invalid (it is an unexpected continuation byte) and many such systems will print the replacement character, which is Unicode U+FFFD.
Converting a value outside the range of a signed char to char invokes:
the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3

First of all, let me tell you, signed-ness of a char is implementation defined.
If you have to deal with char values over 127, you can use unsigned char. It can handle 0-255.
Also, you should be using %hhu format specifier to print the value of an unsigned char.

If you're dealing with bytes, use unsigned char instead of char for your datatypes.
With regard to printing, you can print the bytes in hex instead of decimal or as characters:
printf("%02X", (unsigned char)abc[0]); // the cast prevents sign extension if abc is still plain char
You probably don't want to print these bytes as characters, as you'll most likely be dealing with UTF-8 character encoding which doesn't seem to be what you're looking for.

Related

How does printing 577 with %c output "A"?

#include <stdio.h>
int main()
{
    int i = 577;
    printf("%c", i);
    return 0;
}
After compiling, it gives the output "A". Can anyone explain how I'm getting this?

%c converts its argument to unsigned char, so with 8-bit chars only values up to 255 fit; anything larger wraps around and starts from 0 again:
577 % 256 = 65; // (char code for 'A')

This has to do with how the value is converted.
The %c format specifier expects an int argument and then converts it to type unsigned char. The character for the resulting unsigned char is then written.
Section 7.21.6.1p8 of the C standard regarding format specifiers for printf states the following regarding c:
If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
When converting a value to a smaller unsigned type, what effectively happens is that the higher order bytes are truncated and the lower order bytes have the resulting value.
Section 6.3.1.3p2 regarding integer conversions states:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Which, when two's complement representation is used, is the same as truncating the high-order bytes.
For the int value 577, whose value in hexadecimal is 0x241, the low order byte is 0x41 or decimal 65. In ASCII this code is the character A which is what is printed.

How does printing 577 with %c output "A"?
With printf(), "%c" matches an int argument*1. The int value is converted to an unsigned char value of 65 and the corresponding character*2, 'A', is then printed.
This makes no difference if a char is signed or unsigned or encoded with 2's complement or not. There is no undefined behavior (UB). It makes no difference how the argument is passed, on the stack, register, or .... The endianness of int is irrelevant. The argument value is converted to an unsigned char and the corresponding character is printed.
*1 All int values are allowed [INT_MIN...INT_MAX].
When a char value is passed as ... argument, it is first converted to an int and then passed.
char ch = 'A';
printf("%c", ch); // ch is converted to an `int` and passed to printf().
*2 65 is an ASCII A, the ubiquitous encoding of characters. Other encodings are rarely used.

Just output the value of the variable i in the hexadecimal representation
#include <stdio.h>
int main( void )
{
    int i = 577;
    printf( "i = %#x\n", i );
}
The program output will be
i = 0x241
So the least significant byte contains the hexadecimal value 0x41 that represents the ASCII code of the letter 'A'.

577 in hex is 0x241. The ASCII representation of 'A' is 0x41. You're passing an int to printf but then telling printf to treat it as a char (because of %c). A char is one-byte wide and so printf looks at the first argument you gave it and reads the least significant byte which is 0x41.
To print an integer, you need to use %d or %i.

Convert raw ASCII data to Hex string

I have the following code to convert raw ASCII data to a hex string.
void str2hex(char* inputStr, char* outputStr)
{
    int i;
    int counter;
    i=0;
    counter=0;
    while(inputStr[counter] != '\0')
    {
        sprintf((char*)(outputStr+i),"%02X", inputStr[counter]);
        i+=2; counter+=1;
    }
    outputStr[i++] = '\0';
}
It works fine for most values, but not when I pipe the following input from the terminal to stdin using echo:
echo 11223344556677881122334455667788|xxd -r -p| ./CProgram --stdin
11223344556677881122334455667788
It returns the following output
11223344556677FF11223344556677FF
As can be seen, it returns FF instead of 88.
How can I adjust this code to get 88 instead of FF?

There are multiple issues all coalescing into your problem.
The first issue is that it's implementation-defined whether char is a signed or unsigned integer type. Your compiler seems to have a signed char type.
The second issue is that on most systems today, signed integers are represented using two's complement, where the most significant bit indicates the sign.
The third issue is that variadic functions like printf apply the default argument promotions to their arguments. That means types smaller than int will be promoted to int. And that promotion keeps the value of the converted integer, which means negative values will be sign-extended. Sign-extension means that the most significant bit will be copied all the way to the "top" when extending the value. That means the signed byte 0xff will be extended to 0xffffffff when promoted to a 32-bit int.
Now when your code tries to convert the byte 0x88 it will be treated as the negative number -120, not 136 as you might expect. Promoted to a 32-bit int, that is 0xffffff88, so sprintf writes FFFFFF88. Because the loop then advances by only two characters, the next conversion overwrites everything after the leading FF, which is why FF appears where 88 should be.
There are two possible solutions to this:
Explicitly use unsigned char for the input string:
void str2hex(const unsigned char* inputStr, char* outputStr);
Use the hh length modifier in the printf format:
sprintf((char*)(outputStr+i),"%02hhX", inputStr[counter]);
This tells sprintf that the argument is a single byte, so the (promoted) integer is converted back to unsigned char before printing, which discards the upper bits.

Relationship between char and ASCII Code?

My computer science teacher taught us that which data type to declare depends on the size of the value a variable needs to hold. He then demonstrated adding a number to and subtracting a number from a char to output a different char. I remember he said this has something to do with ASCII code. Can anyone explain this more specifically and clearly? So, is char considered a number (since we can do math with it), a character, or both? Can we print out the number behind a char? How?

So, is char considered a number or a character or both?
Both. It is an integer, but that integer value represents a character, as described by the character encoding of your system. The character encoding of the system that your computer science teacher uses happens to be ASCII.
Can we print out the number behind a char? How?
C++ (as the question used to be tagged):
The behaviour of the character output stream (such as std::cout) is to print the represented character when you insert an integer of type char. But the behaviour for all other integer types is to print the integer value. So, you can print the integer value of a char by converting it to another integer type:
std::cout << (unsigned)'c';
C:
There are no templated output streams, so you don't need to do explicit conversion to another integer (except for the signedness). What you need is the correct format specifier for printf:
printf("%hhu", (unsigned char)'c');
hh is for an integer of size char, and u is for unsigned, as you are probably interested in the unsigned representation.

A char can hold a number, it's the smallest integer type available on your machine and must have at least 8 bits. It is synonymous to a byte.
Its typical use is to store the codes of characters. Computers can only deal with numbers, so, to represent characters, numbers are used. Of course you must agree on which number means which character.
C doesn't require a specific character encoding, but most systems nowadays use a superset of ASCII (this is a very old encoding using only 7 bits) like e.g. UTF-8.
So, if you have a char that holds a character and you add or subtract some value, the result will be another number that happens to be the code for a different character.
In ASCII, the characters 0-9, a-z and A-Z have adjacent code points, therefore by adding e.g. 2 to A, the result will be C.
Can we print out the number behind a char?
Of course. It just depends whether you interpret the value in the char as just a number or as the code of a character. E.g. with printf:
printf("%c\n", 'A'); // prints the character
printf("%hhu\n", (unsigned char)'A'); // prints the number of the code
The cast to (unsigned char) is only needed because char is allowed to be either signed or unsigned, we want to treat it as unsigned here.

A char takes up a single byte. On systems with an 8 bit byte this gives it a range (assuming char is signed) of -128 to 127. You can print this value as follows:
char a = 65;
printf("a=%d\n", a);
Output:
a=65
The %d format specifier prints its argument as a decimal integer. If on the other hand you used the %c format specifier, this prints the character associated with the value. On systems that use ASCII, that means it prints the ASCII character associated with that number:
char a = 65;
printf("a=%c\n", a);
Output:
a=A
Here, the character A is printed because 65 is the ASCII code for A.
You can perform arithmetic on these numbers and print the character for the resulting code:
char a = 65;
printf("a=%c\n", a);
a = a + 1;
printf("a=%c\n", a);
Output:
a=A
a=B
In this example we first print A which is the ASCII character with code 65. We then add 1 giving us 66. Then we print the ASCII character for 66 which is B.

Every variable is stored in binary (i.e. as a number); chars are just numbers of a specific size.
They represent a character when interpreted through some character encoding, such as the ASCII standard (www.asciitable.com).
As in @Igor's comment, if you run the following code, you will see the ASCII character, decimal and hexadecimal representations of your char.
char c = 'A';
printf("%c %d 0x%x", c, c, c);
Output:
A 65 0x41
As an exercise to understand it better, you could make a program to generate the ASCII Table yourself.

My computer science teacher taught us that which data type to declare depends on the size of the value for a variable you need.
This is correct. Different types can represent different ranges of values. For reference, here are the various integral types and the minimum ranges they must be able to represent:
Type                 Minimum Range
----                 -------------
signed char          -127...127
unsigned char        0...255
char                 same as signed or unsigned char, depending on implementation
short                -32767...32767
unsigned short       0...65535
int                  -32767...32767
unsigned int         0...65535
long                 -2147483647...2147483647
unsigned long        0...4294967295
long long            -9223372036854775807...9223372036854775807
unsigned long long   0...18446744073709551615
An implementation may represent a larger range in a given type; for example, on many implementations (such as 64-bit Windows and most 32-bit platforms), the range of an int is the same as the range of a long.
C doesn't mandate a fixed size (bit width) for the basic integral types (although unsigned types are the same size as their signed equivalent); at the time C was first developed, byte and word sizes could vary between architectures, so it was easier to specify a minimum range of values that the type had to represent and leave it to the implementor to figure out how to map that onto the hardware.
C99 introduced the stdint.h header, which defines fixed-width types like int8_t (8-bit), int32_t (32-bit), etc., so you can define objects with specific sizes if necessary.
So, is char considered a number (since we can do math with it) or a character or both?
char is an integral data type that can represent values in at least the range [0...127] [1], which is the range of encodings for the basic execution character set (upper- and lowercase Latin alphabet, decimal digits 0 through 9, and common punctuation characters). It can be used for storing and doing regular arithmetic on small integer values, but that's not the typical use case.
You can print char objects out as characters or numeric values:
#include <ctype.h>  // for isprint
#include <limits.h> // for CHAR_MAX
#include <stdio.h>  // for printf
...
printf( "%5s%5s\n", "dec", "char" );
printf( "%5s%5s\n", "---", "----" );
for ( char i = 0; i < CHAR_MAX; i++ )
{
    printf("%5hhd%5c\n", i, isprint(i) ? i : '.' );
}
That code will print out the integral value and the associated character, like so (this is ASCII, which is what my system uses):
...
65 A
66 B
67 C
68 D
69 E
70 F
71 G
72 H
73 I
...
Control characters like SOH and EOT don't have an associated printing character, so for those values the code above just prints out a '.'.
By definition, a char object takes up a single storage unit (byte); the number of bits in a single storage unit must be at least 8, but could be more.
[1] Plain char may be either signed or unsigned depending on the implementation so it can represent additional values outside that range, but it must be able to represent *at least* those values.

Difference between char and int when declaring character

I just started learning C and am rather confused over declaring characters using int and char.
I am well aware that any characters are made up of integers in the sense that the "integers" of characters are the characters' respective ASCII decimals.
That said, I learned that it's perfectly possible to declare a character using int without using the ASCII decimals. Eg. declaring variable test as a character 'X' can be written as:
char test = 'X';
and
int test = 'X';
And for both declaration of character, the conversion characters are %c (even though test is defined as int).
Therefore, my question is: what are the differences between declaring character variables using char and int, and when should int be used to declare a character variable?

The difference is the size in bytes of the variable, and hence the range of values the variable can hold.
A char is required to accept all values between 0 and 127 (inclusive). It always occupies exactly one byte, which is 8 bits in common environments. Whether it is signed (-128 to 127) or unsigned (0 to 255) is implementation-defined.
An int is required to be at least a 16 bits signed word, and to accept all values between -32767 and 32767. That means that an int can accept all values from a char, be the latter signed or unsigned.
If you want to store only characters in a variable, you should declare it as char. Using an int would just waste memory, and could mislead a future reader. One common exception to that rule is when you want to process a wider value for special conditions. For example the function fgetc from the standard library is declared as returning int:
int fgetc(FILE *fd);
because the special value EOF (for End Of File) is defined as a negative int value, typically -1 (all bits set to one in a two's-complement system), which lies outside the range of the unsigned char values that fgetc returns for real data. That way no byte read from the file (only 8 bits on a common system) can be equal to the EOF constant. If the function was declared to return a simple char, nothing could distinguish the EOF value from the (valid) char 0xFF.
That's the reason why the following code is bad and should never be used:
char c; // a terrible memory saving...
...
while ((c = fgetc(stdin)) != EOF) { // NEVER WRITE THAT!!!
    ...
}
Inside the loop, a char would be enough, but so that the loop does not stop early when it reads the character 0xFF, the variable needs to be an int.

The char type has multiple roles.
The first is that it is simply part of the chain of integer types, char, short, int, long, etc., so it's just another container for numbers.
The second is that its underlying storage is the smallest unit, and all other objects have a size that is a multiple of the size of char (sizeof returns a number that is in units of char, so sizeof(char) == 1).
The third is that it plays the role of a character in a string, certainly historically. When seen like this, the value of a char maps to a specified character, for instance via the ASCII encoding, but it can also be used with multi-byte encodings (one or more chars together map to one character).

Size of an int is 4 bytes on most architectures, while the size of a char is 1 byte.

Usually you should declare characters as char and use int for integers being capable of holding bigger values. On most systems a char occupies a byte which is 8 bits. Depending on your system this char might be signed or unsigned by default, as such it will be able to hold values between 0-255 or -128-127.
An int might be 32 bits long, but if you really want exactly 32 bits for your integer you should declare it as int32_t or uint32_t instead.

I think there's no difference, but you're allocating extra memory you're not going to use. You can also do const long a = 1;, but it will be more suitable to use const char a = 1; instead.

ansi-c converting char to int representable by ascii

Hi, I am interested in those chars which are representable by the ASCII table. For that reason I am doing the following:
int t(char c) { return (int) c; }
...
if(!(t(d)>255)) { dostuff(); }
So I am interested in only ASCII-table-representable chars, which I assume should be less than 256 after conversion to int. Am I right? Thanks!

Usually (not always) a char is 8-bits so all chars would typically have a value of less than 256. So your test would always succeed.
Also, ASCII only goes up to 127, not 255. The characters after that are not standard ASCII, and can vary depending on code pages.
If you are dealing with international characters you should probably use wide characters instead of char.

Use the library:
#include <ctype.h>
...
if (isascii(d)) { dostuff(); }

Two caveats:
The C standard does not decide if char is by default signed or unsigned. If your compiler treated char as signed by default the cast to int could result in negative values instead of the values from 128 to 255 (and this is assuming that your chars are 8-bit, too). Perhaps it's better to use unsigned char if you want to be sure this range will be converted the way you expect.
Technically ASCII is from 0 to 127, everything above is some kind of extension.

char is an integral type in C. You can do the check directly:
char c;
/* assign to c */
if (c >= 0 && c <= 127) {
    /* in ASCII range */
}
I am assuming you don't want to use isascii() (it's not in the C standard, although it is POSIX).
Also, you can check if CHAR_MAX is equal to 127. If it is, you don't need the comparison with 127, since c will not exceed it by definition. Similarly, if CHAR_MIN is 0, then you don't need the comparison with 0. Both CHAR_MIN and CHAR_MAX are defined in limits.h.
I think you're thinking about an integer value overflowing a char, and therefore convert it to an int. But, that doesn't help with overflow since the damage has already been done.

Size of char is always 1 byte (as per standard). For all practical matters this means that a char var cannot have a value bigger than 255. (though there are systems, where a byte has more than 8 bits and thus a char value can be bigger, but these are rare nowadays)
An additional caveat is that char is not specified as signed or unsigned, so it can have the range -128 to 127 or the range 0 to 255 (assuming 8 bits per byte, of course :-))
Meanwhile, the ASCII table is 7-bit, which means it covers the range of 0 to 127. So if you are interested in only ASCII symbols, you can just check if the value of your char var is in that range. No need to cast for the comparison.
