ASCII and printf - c

I have a little (big, dumb?) question about ints and chars in C. I remember from my studies that "chars are little integers and vice versa," and that's okay with me. If I need to use small numbers, the best way is to use a char type.
But in a code like this:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int i = atoi(argv[1]);
    printf("%d -> %c\n", i, i);
    return 0;
}
I can use as argument every number I want. So with 0-127 I obtain the expected results (the standard ASCII table) but even with bigger or negative numbers it seems to work...
Here is some example:
-181 -> K
-182 -> J
300 -> ,
301 -> -
Why? It seems to me that it's cycling around the ascii table, but I don't understand how.

When you pass an int corresponding to the "%c" conversion specifier, the int is converted to an unsigned char and then written.
The values you pass are being converted to different values when they are outside the range of an unsigned char (0 to UCHAR_MAX). The system you are working on probably has UCHAR_MAX == 255.
When converting an int to an unsigned char:
If the value is larger than UCHAR_MAX, (UCHAR_MAX+1) is subtracted from the value as many times as needed to bring it into the range 0 to UCHAR_MAX.
Likewise, if the value is less than zero, (UCHAR_MAX+1) is added to the value as many times as needed to bring it into the range 0 to UCHAR_MAX.
Therefore:
(unsigned char)-181 == (-181 + (255+1)) == 75 == 'K'
(unsigned char)-182 == (-182 + (255+1)) == 74 == 'J'
(unsigned char)300 == (300 - (255+1)) == 44 == ','
(unsigned char)301 == (301 - (255+1)) == 45 == '-'

The %c format parameter interprets the corresponding value as a character, not as an integer. However, when you lie to printf and pass an int in what you tell it is a char, its internal manipulation of the value (to get a char back, as a char is normally passed as an int anyway, with varargs) happens to yield the values you see.

My guess is that %c takes the first byte of the value provided and formats that as a character. On a little-endian system such as a PC running Windows, that byte would represent the least-significant byte of any value passed in, so consecutive numbers would always be shown as different characters.

You told it the number is a char, so it's going to try every way it can to treat it as one, despite being far too big.
Looking at what you got, since J and K are in that order, I'd say it's using the integer % 128 to make sure it fits in the legal range.

Edit: Please disregard this "answer".
Because you are on a little-endian machine :)
Seriously, this is undefined behavior. Try changing the code to printf("%d -> %c, %c\n",i,i,'4'); and see what happens then...

When we use %c in a printf statement, it can access only the lowest byte of the integer.
Hence anything greater than 255 is treated as n % 256.
For example:
i/p = 321 yields o/p = A

What atoi does is convert the string to a numerical value, so that "1234" becomes 1234 and not just the sequence of the ordinal values of its characters.
Example:
char *x = "1234"; // x[0] = 49, x[1] = 50, x[2] = 51, x[3] = 52 (see the ASCII table)
int y = atoi(x); // y = 1234
int z = (int)x[0]; // z = 49 which is not what one would want

Related

How is the output 15 here? Can someone explain it to me? I didn't really understand the use of putc and stdout.

int *z;
char *c = "123456789";
z = c;
putc(*(z++),stdout);
putc(*z, stdout);
return 0;
The output is 15, that's for certain, but how does this happen?
Let's look at this code operation by operation. For this reason I have rewritten it (both codes are equivalent):
int *z;
char *c = "123456789";
z = c;
putc(*z, stdout);
z++;
putc(*z, stdout);
Because you use z++, the post-increment (++z would be pre-increment), the value of z is fetched before it is incremented.
After the first three lines everything looks as expected. Both pointers point to the string "123456789". But the first putc is already interesting. z is a pointer to an int, but it points to a character, so dereferencing it fetches 4 bytes instead of one. Because you are using a little-endian machine, the 3 higher bytes are truncated when the integer (with the bytes "1234" = 875770417) is converted to a character. Only the lowest byte ('1' = 49) remains.
In the line z++;, not 1 but 4 is added to the address z points to, because z is declared to point to an int, and on your system an int apparently has 4 bytes. So instead of pointing to "23456789" it points to "56789". This is just how pointer arithmetic works in C.
The last line works exactly like the first putc, but this time z points to "5678" (= 943142453), which gets truncated to 53 (= '5'). On big-endian machines, the code would print "48" instead.

Issue with turning a character into an integer in C

I am having issues with converting character variables into integer variables. This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char string[] = "A2";
    char letter = string[0];
    char number = string[1];
    char numbers[] = "12345678";
    char letters[] = "ABCDEFGH";
    int row;
    int column;
    for(int i = 0; i < 8; i++){
        if(number == numbers[i]){
            row = number;
        }
    }
}
When I try to convert the variable row into the integer value of the variable number, instead of 2 I get 50. The goal so far is to convert the variable row into the accurate value of the character variable number, which in this case is 2. I'm a little confused as to why the variable row is 50 and not 2. Can anyone explain to me why it is not converting accurately?
'2' != 2. The '2' character, in ASCII, is 50 in decimal (0x32 in hex). See http://www.asciitable.com/
If you're sure they're really numbers you can just use (numbers[i] - '0') to get the value you're looking for.
2 in your case is a character, and that character's value is 50 because that's the decimal version of the byte value that represents the character 2 in ASCII. Remember, C is very low level and characters are essentially the same thing as any other value: a sequence of bytes. Just like letters are represented as bytes, so are the characters representing their values in our base-10 system. It might seem that 2 should have been represented with the value 2, but it wasn't.
If you use the atoi function, it will look at the string and compute the decimal value represented by the characters in your string.
However, if you're only converting one character to the decimal value it represents, you can take a shortcut: subtract the value of '0' from the digit character. Though the digits are not represented by the base-10 value they have for us humans, they are ordered sequentially in the ASCII code. And since in C characters are simply byte values, the difference between a numeric character '0'-'9' and '0' is the digit's value.
char c = '2';
int i = c - '0';
If you understand why that would work, you get what I'm saying.

8 Byte Number as Hex in C

I have given a number, for example n = 10, and I want to calculate its length in hex with big endian and save it in an 8 byte char pointer. In this example I would like to get the following string:
"\x00\x00\x00\x00\x00\x00\x00\x50".
How do I do that automatically in C with for example sprintf?
I am not even able to get "\x50" in a char pointer:
char tmp[1];
sprintf(tmp, "\x%x", 50); // version 1
sprintf(tmp, "\\x%x", 50); // version 2
Version 1 and 2 don't work.
I have given a number, for example n = 10, and I want to calculate its length in hex
Repeatedly divide by 16 to find the number of hexadecimal digits. A do ... while ensures the result is 1 when n == 0.
int hex_length = 0;
do {
    hex_length++;
} while (n /= 16);
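Wrapped up as a callable function (the name hex_length is mine), the same loop looks like:

```c
/* Count the hexadecimal digits of a non-negative number.
 * The do ... while guarantees a result of 1 for 0. */
int hex_length(unsigned long long n) {
    int len = 0;
    do {
        len++;
    } while (n /= 16);
    return len;
}
```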
save it in an 8 byte char pointer.
C cannot force your system to use an 8-byte pointer. So if your system uses a 4-byte char pointer, we are out of luck. Let us assume OP's system uses 8-byte pointers. Integers may be assigned to pointers, but this may or may not result in a valid pointer.
assert(sizeof (char*) == 8);
char *char_pointer = n;
printf("%p\n", (void *) char_pointer);
In this example I would like to get the following string: "\x00\x00\x00\x00\x00\x00\x00\x50".
In C, a string includes the various characters up to and including a null character. "\x00\x00\x00\x00\x00\x00\x00\x50" is not a valid C string, yet it is a valid string literal. Code cannot construct string literals at run time; they are part of the source code. Further, the relationship between n==10 and "\x00...\x00\x50" is unclear. Instead, perhaps the goal is to store n into an 8-byte array (big endian).
char buf[8];
for (int i = 7; i >= 0; i--) {
    buf[i] = (char) n;
    n /= 256;
}
OP's code certainly will fail as it attempts to store a string which is too small. Further "\x%x" is not valid code as \x begins an invalid escape sequence.
char tmp[1];
sprintf(tmp, "\x%x", 50); // version 1
Just do:
#include <math.h>
...
int i;
...
int length = (int) floor(log(i) / log(16)) + 1; // requires i > 0
This will give you (in length) the number of hexadecimal digits needed to represent i (without 0x, of course).
log(i) / log(base) is the log-base of i. The log16 of i gives you the exponent: when raising 16 to the power of the found exponent, we get back i: 16^log16(i) = i.
Taking the floor of this exponent and adding one gives the number of digits.

declaring string using pointer to int

I am trying to initialize a string using pointer to int
#include <stdio.h>
int main()
{
    int *ptr = "AAAA";
    printf("%d\n", ptr[0]);
    return 0;
}
the result of this code is 1094795585
Could anybody explain this behavior and why the code gives this answer?
I am trying to initialize a string using pointer to int
The string literal "AAAA" is of type char[5], that is array of five elements of type char.
When you assign:
int *ptr = "AAAA";
you actually must use explicit cast (as types don't match):
int *ptr = (int *) "AAAA";
But, still it's potentially invalid, as int and char objects may have different alignment requirements. In other words:
alignof(char) != alignof(int)
may hold. Also, in this line:
printf("%d\n", ptr[0]);
you are invoking undefined behavior (so it might print "Hello from Mars" if compiler likes so), as ptr[0] dereferences ptr, thus violating strict aliasing rule.
Note that it is valid to convert an int * to a char * and read the object through the char *, but not the opposite.
the result of this code is 1094795585
The result makes sense, but for that, you need to rewrite your program in valid form. It might look as:
#include <stdio.h>
#include <string.h>
union StringInt {
    char s[sizeof("AAAA")];
    int n[1];
};
int main(void)
{
    union StringInt si;
    strcpy(si.s, "AAAA");
    printf("%d\n", si.n[0]);
    return 0;
}
To decipher it, you need to make some assumptions, depending on your implementation. For instance, if
int type takes four bytes (i.e. sizeof(int) == 4)
CPU has little-endian byte ordering (though it doesn't really matter here, since every byte is the same)
default character set is ASCII (the letter 'A' is represented as 0x41, that is 65 in decimal)
implementation uses two's complement representation of signed integers
then, you may deduce, that si.n[0] holds in memory:
0x41 0x41 0x41 0x41
that is in binary:
01000001 01000001 01000001 01000001
The sign (most-significant) bit is unset, hence it is just equal to:
65 * 2^24 + 65 * 2^16 + 65 * 2^8 + 65 =
65 * (2^24 + 2^16 + 2^8 + 1) = 65 * 16843009 = 1094795585
1094795585 is correct.
'A' has the ASCII value 65, i.e. 0x41 in hexadecimal.
Four of them makes 0x41414141 which is equal to 1094795585 in decimal.
You got the value 65656565 by doing 65*100^0 + 65*100^1 + 65*100^2 + 65*100^3 but that's wrong since a byte1 can contain 256 different values, not 100.
So the correct calculation would be 65*256^0 + 65*256^1 + 65*256^2 + 65*256^3, which gives 1094795585.
It's easier to think of memory in hexadecimal because one hexadecimal digit directly corresponds to half a byte1, so two hex digits is one full byte1 (cf. 0x41). Whereas in decimal, 255 fits in a single byte1, but 256 does not.
1 assuming CHAR_BIT == 8
65656565 is a wrong representation of the value of "AAAA": you are representing each character separately, while "AAAA" is stored as an array of bytes. It converts to 1094795585 because the %d specifier prints the decimal value. Run this in gdb with the following commands:
x/8xb (pointer) // shows the memory as hex byte values
x/d (pointer) // shows the converted decimal value
#zenith gave you the answer you expected, but your code invokes UB. Anyway, you could demonstrate the same in an almost correct way :
#include <stdio.h>
int main()
{
    int i, val;
    char *pt = (char *) &val; // casting a pointer to any type to a pointer to char: valid
    for (i = 0; i < sizeof(int); i++) pt[i] = 'A'; // assigning the bytes of an int: UB in the general case
    printf("%d 0x%x\n", val, val);
    return 0;
}
Assigning the bytes of an int is UB in the general case because the C standard says that [for] signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. And a remark adds: Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit.
But on common architectures there are no padding bits and all bit patterns correspond to valid numbers, so the operation is valid (but implementation-dependent) on all common systems. It is still implementation-dependent because the size of int is not fixed by the standard, nor is endianness.
So: on a 32-bit system using no padding bits, the above code will produce
1094795585 0x41414141
independently of endianness.

Char shifting in C

I am trying to find the function in the library that shifts chars back and forth as I want.
For instance:
if this function consumes 'a' and a number to shift forward 3, it will be shifted 3 times and the output will be 'd'.
if this function consumes '5' and a number to shift forward 3, it will be shifted 3 times and the output will be '8'.
How can I achieve this?
You don't need to call a function to do this. Just add the number to the character directly.
For example:
'a' + 3
evaluates to
'd'
Given what you've asked for, this does that:
char char_shift(char c, int n) {
    return (char)(c + n);
}
If you meant something else (perhaps intending that 'Z' + 1 = 'A'), then rewrite your question...
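If wraparound is what was meant, a sketch of a shifting function (ASCII assumed; the name rot_shift and the n >= 0 restriction are mine):

```c
/* Shift a letter or digit forward n places (n >= 0), wrapping within
 * its own range; other characters pass through unchanged. */
char rot_shift(char c, int n) {
    if (c >= 'a' && c <= 'z') return (char)('a' + (c - 'a' + n) % 26);
    if (c >= 'A' && c <= 'Z') return (char)('A' + (c - 'A' + n) % 26);
    if (c >= '0' && c <= '9') return (char)('0' + (c - '0' + n) % 10);
    return c;
}
```

With this, rot_shift('Z', 1) yields 'A' and rot_shift('9', 3) yields '2'.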
In C, a char is an integer type (like int and long long int).
It functions just like the other integer types, except the range of values it can store is typically limited to -128 to 127, or 0 to 255, although this depends on implementation.
For example:
char x = 3;
char y = 6;
int z;
z = x + y;
printf("z = %d\n", z); //prints z = 9
The char type (usually as part of an array) is most often used to store text, where each character is encoded as a number.
Character and string constants are a convenience. If we assume the machine uses the ASCII character set (which is almost ubiquitous today), in which case capital A is encoded as 65, then:
char x = 'A';
char str[] = "AAA";
is equivalent to
char x = 65;
char str[] = {65, 65, 65, 0};
Therefore, something like 'X' + 6 makes perfect sense - what the result will be depends on the character encoding. In ASCII, it's equivalent to 88 + 6 which is 94 which is '^'.
