Why does the following C code print what it does?

typedef unsigned char byte;
unsigned int nines = 999;
byte * ptr = (byte *) &nines;
printf ("%x\n",nines);
printf ("%x\n",nines * 0x10);
printf ("%d\n",ptr[0]);
printf ("%d\n",ptr[1]);
printf ("%d\n",ptr[2]);
printf ("%d\n",ptr[3]);
Output:
3e7
3e70
231
3
0
0
I know the first two are just hexadecimal representations of 999 and 999*16. What do the remaining four lines, ptr[0] through ptr[3], mean?

Most likely you are running this on a 32-bit little-endian system. 999 in hex is:
00 00 03 E7
The way it would be stored in memory is:
E7 03 00 00
Hence:
ptr[0] points to the byte containing E7 which is 231 in decimal
ptr[1] points to the byte containing 03 which is 3 in decimal
ptr[2] points to the byte containing 00 which is 0 in decimal
ptr[3] points to the byte containing 00 which is 0 in decimal
HTH!
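As a further check, here is a hedged sketch (assuming a 4-byte unsigned int on a little-endian machine) that rebuilds 999 from the four bytes, which makes the relationship 231 + 3*256 = 999 explicit:
#include <stdio.h>

typedef unsigned char byte;

int main(void) {
    unsigned int nines = 999;
    byte *ptr = (byte *) &nines;

    /* Recombine the bytes, least significant first (little-endian). */
    unsigned int rebuilt = 0;
    for (unsigned i = 0; i < sizeof nines; i++)
        rebuilt |= (unsigned int)ptr[i] << (8 * i);

    printf("%d + %d*256 = %u\n", ptr[0], ptr[1], rebuilt);   /* 231 + 3*256 = 999 */
    return 0;
}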

I think that you will see clearly if you write:
#include <stdio.h>

typedef unsigned char byte;

int main(void) {
    unsigned int nines = 999;
    byte *ptr = (byte *) &nines;
    printf("%x\n", nines);
    printf("%x\n", nines * 0x10);
    printf("%x\n", ptr[0]);
    printf("%x\n", ptr[1]);
    printf("%x\n", ptr[2]);
    printf("%x\n", ptr[3]);
    printf("%zu\n", sizeof(unsigned int));
}
char is 8 bits (one byte) and int is 4 bytes on my 64-bit machine.
On your machine the data is stored little-endian, so the least significant byte is located first.

Related

What does type-casting an address do when assigning it to a pointer?

I'm working on understanding C and I came across this example:
#include <stdio.h>

int main()
{
    double d = 010;
    unsigned *pi = (unsigned *) &d;
    printf("%x", *++pi);
}
So d has the value 8 in decimal, 0 10000000010 000... in binary, and 40200000 in hexadecimal. I don't understand what this typecast of the address does. Why doesn't *pi get the hexadecimal value, but instead prints 0? When the address is incremented and then dereferenced, it gets the right hexadecimal value. Why?
The code actually has undefined behavior because of the aliasing rule: you are accessing the representation of the double variable with a pointer to another type that is not a character type.
Before the increment, *pi would read the 32 bits from bytes 0 to 3 of the double variable, which are all 0 because of the way 8.0 is represented on your little-endian architecture; *++pi reads bytes 4 to 7, which hold 0x40200000.
The cast tells the compiler to convert the pointer to another type of pointer. This usually does not change its value, but dereferencing the resulting pointer may produce undefined behavior, except for character types.
Here is a modified version that is portable and prints the byte values:
#include <stdio.h>

int main() {
    double d = 010;
    unsigned char *p = (unsigned char *)&d;
    int n = sizeof(double);
    while (n --> 0) {
        printf("%02x ", *p++);
    }
    printf("\n");
    return 0;
}
Output on an Intel Mac (little endian): 00 00 00 00 00 00 20 40
Output on a big-endian system: 40 20 00 00 00 00 00 00
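If the goal really is to look at the upper 32 bits as a single number rather than byte by byte, a hedged alternative is to memcpy the representation into a fixed-width integer, which avoids the aliasing problem entirely. This sketch assumes an 8-byte IEEE 754 double and an 8-byte uint64_t:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double d = 010;                       /* octal 010 == 8.0 */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);       /* well-defined: copies the object representation */
    printf("%016llx\n", (unsigned long long)bits);         /* 4020000000000000 */
    printf("%08llx\n", (unsigned long long)(bits >> 32));  /* 40200000: the upper 32 bits */
    return 0;
}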

int to hex conversion not working properly for high values: 225943 is being converted into 0x000372ffffff97

My C program takes a random high int value, converts it into hex and writes it to a file. Everything goes well if the value is 225919 or less,
e.g. 225875 is 00 03 72 53,
but if the value is above 225919 it starts writing an extra ffffff for the last byte in the hex value: for example, 225943 is 00 03 72 ffffff97, while the right value would have been 00 03 72 97.
Code that writes the value into file is as follows:
char *temp = NULL;
int cze = 225943;
temp = (char *)(&cze);
for (ii = 3; ii >= 0; ii--) {
    printf(" %02x ", temp[ii]);   // just for printing the values
    fprintf(in, "%02x", temp[ii]);
}
Output is: 00 03 72 ffffff97
Expected output: 00 03 72 97
Please help, any pointer is appreciated.
Your temp pointer points to char values, which in most cases means signed char. Each byte is therefore read as a signed char, so any byte greater than 0x7f is treated as a negative value. When that value is passed to printf, it is implicitly promoted to int, and the sign extension fills the extra high-order bytes with 1 bits when the number is negative.
Change the datatype to unsigned char. This will cause the implicit promotion to change to unsigned int and you'll get the correct values.
unsigned char *temp = NULL;
int cze = 225943;
temp = (unsigned char *)(&cze);
for (ii = 3; ii >= 0; ii--) {
    printf(" %02x ", temp[ii]);   // just for printing the values
    fprintf(in, "%02x", temp[ii]);
}
Alternately, you can use the hh length modifier in printf, which tells it that the argument is a char or unsigned char. This will restrict it to printing 1 byte's worth of data.
printf(" %02hhx ",temp[ii] );

Need clarification about unsigned char * in C

Given the code:
...
int x = 123;
...
unsigned char * xx = (char *) & x;
...
I have xx[0] = 123, xx[1] = 0, xx[2] = 0, etc.
Can someone explain what is happening here? I don't have a great understanding of pointers in general, so the simpler the better.
Thanks
You're accessing the bytes (chars) of a little-endian int in sequence. The number 123 in an int on a little-endian system will usually be stored as {123,0,0,0}. If your number had been 783 (256 * 3 + 15), it would be stored as {15,3,0,0}.
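To check that concretely, a minimal sketch assuming a 4-byte int on a little-endian system:
#include <stdio.h>

int main(void) {
    int v = 783;                                        /* 256 * 3 + 15 */
    unsigned char *b = (unsigned char *)&v;
    printf("%d %d %d %d\n", b[0], b[1], b[2], b[3]);    /* 15 3 0 0 on little-endian */
    return 0;
}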
I'll try to explain all the pieces in ASCII pictures.
int x = 123;
Here, x is the symbol representing a location of type int. Type int is typically 4 bytes of memory; its exact size is compiler- and platform-dependent (and it usually stays 4 bytes even on 64-bit machines). For this discussion, let's assume 4 bytes.
Memory on x86 is managed "little endian", meaning that if a number requires multiple bytes (its value exceeds what a single byte can hold: > 255 unsigned, or > 127 signed), then the number is stored with the least significant byte at the lowest address. If your number were the hexadecimal value 0x12345678, then it would be stored as:
x: 78 <-- address that `x` represents
56 <-- x addr + 1 byte
34 <-- x addr + 2 bytes
12 <-- x addr + 3 bytes
Your number, decimal 123, is 7B hex, or 0000007B (all 4 bytes shown), so would look like:
x: 7B <-- address that `x` represents
00 <-- x addr + 1 byte
00 <-- x addr + 2 bytes
00 <-- x addr + 3 bytes
To make this clearer, let's make up a memory address for x, say, 0x00001000. Then the byte locations would have the following values:
Address Value
x: 00001000 7B
00001001 00
00001002 00
00001003 00
Now you have:
unsigned char * xx = (char *) & x;
Which defines a pointer to an unsigned char (an 8-bit, or 1-byte, unsigned value ranging from 0 to 255) whose value is the address of your integer x. In other words, the value stored in xx is 0x00001000:
xx: 00
10
00
00
The ampersand (&) indicates you want the address of x. And, technically, the declaration isn't correct. It really should be cast properly as:
unsigned char * xx = (unsigned char *) & x;
So now you have a pointer, or address, stored in the variable xx. That address points to x:
Address Value
x: 00001000 7B <-- xx points HERE (xx has the value 0x00001000)
00001001 00
00001002 00
00001003 00
The value of xx[0] is what xx points to, offset by 0 bytes. The offset is counted in bytes because the type of xx is a pointer to unsigned char, which is one byte, so each offset from xx is in units of that size. The value of xx[1] is just one byte higher in memory, which is the value 00. And so on. Pictorially:
Address Value
x: 00001000 7B <-- xx[0], or the value at `xx` + 0
00001001 00 <-- xx[1], or the value at `xx` + 1
00001002 00 <-- xx[2], or the value at `xx` + 2
00001003 00 <-- xx[3], or the value at `xx` + 3
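For completeness, a small runnable version of this walkthrough; the byte values in the comments assume a 4-byte int on a little-endian machine:
#include <stdio.h>

int main(void) {
    int x = 123;
    unsigned char *xx = (unsigned char *)&x;

    printf("&x = %p\n", (void *)&x);              /* the address xx holds */
    for (int i = 0; i < (int)sizeof x; i++)
        printf("xx[%d] = %02x\n", i, xx[i]);      /* 7b 00 00 00 on little-endian */

    return 0;
}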
Yeah, you're doing something you shouldn't be doing...
That said... one part of the result is that you're working on a little-endian processor. The int x = 123; statement allocates 4 bytes on the stack and initializes them with the value 123. Since the machine is little-endian, the memory looks like 123, 0, 0, 0; if it were big-endian, it would be 0, 0, 0, 123. Your char pointer points to the first byte of memory where x is stored.
unsigned char * xx = (char *) & x;
You take the address of x, tell the compiler it is a pointer to a character (a single byte), and assign that to xx, which is itself a pointer to a character. The cast to (char *) just keeps the compiler happy.
Now if you print xx, or inspect it, what you see can depend on the machine: the so-called little-endian or big-endian way of storing integers. x86 is little-endian and stores the bytes of the integer in reverse, so storing 0x00000123 will store 0x23 0x01 0x00 0x00, which is what you see when inspecting the location xx points to as characters.

How can one address store more than one value?

The question is given in the title:
I don't know why this is happening.
Can someone tell me how such tricks work?
Here is my code:
#include <stdio.h>

int main() {
    int a = 320;
    char *ptr;
    printf("%p\n", &a);
    ptr = (char *)&a;
    printf("%p\n", ptr);
    printf("%d\n", a);
    printf("%d\n", *ptr);
    return 0;
}
Output:
0x7fffc068708c
0x7fffc068708c
320
64
There is only one value stored.
The final %d printf takes the first char's worth of data at that address, promotes it to int, and prints the result. The %d printf before it prints the whole int.
(320 == 256 + 64, or 0x140 == 0x01 0x40)
The actual data at 0x7fffc068708c is 0x00000140.
That's 320 in decimal.
But if you access it via ptr = (char *)&a;, then you only get 0x40.
That's 64 in decimal.
Simple, really: using a char pointer, you get rid of any extra bits of data above a single byte:
a = 320
0x 00 00 01 40
| a | -> 0x 00000140 = 320
|ptr| -> 0x       40 =  64
You "see" two values because you don't use all the precision available to you.
You would have "seen" one value if you had used a short instead of a char, but really, it's just how you interpret the data.
The point is that when assigning &a to ptr you are saying it is a pointer to a character and not to an integer. Change that and try.

memory layout - C union

I have a union type of array of three integers (4 bytes each), a float (4 bytes), a double (8 bytes) and a character (1 byte).
If I assign 0x31313131 to each of the three integer elements and then print the union's character, I get the number 1. Why?
I don't understand the output. I know that the bits of three 0x31313131 values are:
001100010011000100110001001100010011000100110001001100010011000100110001001100010011000100110001
Because '1' == 0x31. You are printing it as a character, not an integer.
Since it is a union, all the int and char members share the same memory location (the float and double do not matter in this context). So assigning 0x31313131 to the int does affect the char value -- nothing much confusing there.
Every member of a union has the same starting address; different members may have different sizes. The size of the union as a whole is at least the maximum size of any member; there may be extra padding at the end for alignment requirements.
You store the value 0x31313131 in the first three int-sized memory areas of your union object. 0x31313131 is 4 bytes, each of which has the value 0x31.
You then read the first byte (from offset 0) by accessing the character member. That byte has the value 0x31, which happens to be the encoding for the character '1' in ASCII and similar character sets. (If you ran your program on an EBCDIC-based system, you'd see different results.)
Since you haven't shown us any actual source code, I will, based on your description:
#include <stdio.h>
#include <string.h>

/* Dump the raw bytes of any object in hexadecimal. */
void hex_dump(char *name, void *base, size_t size) {
    unsigned char *arr = base;
    printf("%-8s : ", name);
    for (size_t i = 0; i < size; i++) {
        printf("%02x", arr[i]);
        if (i < size - 1) {
            putchar(' ');
        }
        else {
            putchar('\n');
        }
    }
}

int main(void) {
    union u {
        int arr[3];
        float f;
        double d;
        char c;
    };
    union u obj;
    memset(&obj, 0xff, sizeof obj);   /* fill with 0xff so untouched bytes are visible */
    obj.arr[0] = 0x31323334;
    obj.arr[1] = 0x35363738;
    obj.arr[2] = 0x393a3b3c;
    hex_dump("obj", &obj, sizeof obj);
    hex_dump("obj.arr", &obj.arr, sizeof obj.arr);
    hex_dump("obj.f", &obj.f, sizeof obj.f);
    hex_dump("obj.d", &obj.d, sizeof obj.d);
    hex_dump("obj.c", &obj.c, sizeof obj.c);
    printf("obj.c = %d = 0x%x = '%c'\n",
           (int)obj.c, (unsigned)obj.c, obj.c);
    return 0;
}
The hex_dump function dumps the raw representation of any object, regardless of its type, by showing the value of each byte in hexadecimal.
I first fill the union object with 0xff bytes. Then, as you describe, I initialize each element of the int[3] member arr -- but to show more clearly what's going on, I use different values for each byte.
The output I get on one system (which happens to be little-endian) is:
obj : 34 33 32 31 38 37 36 35 3c 3b 3a 39 ff ff ff ff
obj.arr : 34 33 32 31 38 37 36 35 3c 3b 3a 39
obj.f : 34 33 32 31
obj.d : 34 33 32 31 38 37 36 35
obj.c : 34
obj.c = 52 = 0x34 = '4'
As you can see, the initial bytes of each member are consistent with each other, because they're stored in the same place. The trailing ff bytes are unaffected by assigning values to arr (this is not the only valid behavior; the standard says they take unspecified values). Because the system is little-endian, the high-order byte of each int value is stored at the lowest position in memory.
The output on a big-endian system is:
obj : 31 32 33 34 35 36 37 38 39 3a 3b 3c ff ff ff ff
obj.arr : 31 32 33 34 35 36 37 38 39 3a 3b 3c
obj.f : 31 32 33 34
obj.d : 31 32 33 34 35 36 37 38
obj.c : 31
obj.c = 49 = 0x31 = '1'
As you can see, the high-order byte of each int is at the lowest position in memory.
In all cases, the value of obj.c is the first byte of obj.arr[0] -- which will be either the high-order or the low-order byte, depending on endianness.
There are a lot of ways this can vary across different systems. The sizes of int, float, and double can vary. The way floating-point numbers are represented can vary (though this example doesn't show that). Even the number of bits in a byte can vary; it's at least 8, but it can be bigger. (It's exactly 8 on any system you're likely to encounter). And the standard allows padding bits in integer representations; there are none in the examples I've shown.
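Since obj.c always sees the first byte of obj.arr[0], the same union idea works as a quick endianness probe. A minimal sketch, not production code:
#include <stdio.h>

int main(void) {
    union {
        unsigned int i;
        unsigned char c;
    } probe = { 1u };

    /* On a little-endian system the low-order byte 0x01 sits at offset 0;
       on a big-endian system offset 0 holds 0x00. */
    printf("%s-endian\n", probe.c == 1 ? "little" : "big");
    return 0;
}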
