Need clarification about unsigned char * in C - c

Given the code:
...
int x = 123;
...
unsigned char * xx = (char *) & x;
...
I have xx[0] = 123, xx[1] = 0, xx[2] = 0, etc.
Can someone explain what is happening here? I don't have a great understanding of pointers in general, so the simpler the better.
Thanks

You're accessing the bytes (chars) of a little-endian int in sequence. The number 123 in an int on a little-endian system will usually be stored as {123,0,0,0}. If your number had been 783 (256 * 3 + 15), it would be stored as {15,3,0,0}.

I'll try to explain all the pieces in ASCII pictures.
int x = 123;
Here, x is the symbol representing a location of type int. The size of int is chosen by the compiler and platform: it is 4 bytes on most current 32-bit and 64-bit systems, although the C standard only guarantees at least 16 bits. For this discussion, let's assume 32 bits (4 bytes).
x86 stores multi-byte values "little endian", meaning that if a number requires multiple bytes (its value doesn't fit in a single byte: > 255 unsigned, or > 127 signed), then the number is stored with the least significant byte at the lowest address. If your number were hexadecimal 0x12345678, it would be stored as:
x: 78 <-- address that `x` represents
56 <-- x addr + 1 byte
34 <-- x addr + 2 bytes
12 <-- x addr + 3 bytes
Your number, decimal 123, is 7B hex, or 0000007B (all 4 bytes shown), so would look like:
x: 7B <-- address that `x` represents
00 <-- x addr + 1 byte
00 <-- x addr + 2 bytes
00 <-- x addr + 3 bytes
To make this clearer, let's make up a memory address for x, say, 0x00001000. Then the byte locations would have the following values:
Address Value
x: 00001000 7B
00001001 00
00001002 00
00001003 00
Now you have:
unsigned char * xx = (char *) & x;
Which defines a pointer to an unsigned char (an 8-bit, or 1-byte unsigned value, ranging 0-255) whose value is the address of your integer x. In other words, the value contained at location xx is 0x00001000.
xx: 00
10
00
00
The ampersand (&) indicates you want the address of x. And, technically, the declaration isn't correct. It really should be cast properly as:
unsigned char * xx = (unsigned char *) & x;
So now you have a pointer, or address, stored in the variable xx. That address points to x:
Address Value
x: 00001000 7B <-- xx points HERE (xx has the value 0x00001000)
00001001 00
00001002 00
00001003 00
The value of xx[0] is what xx points to offset by 0 bytes. It's offset by bytes because the type of xx is a pointer to an unsigned char which is one byte. Therefore, each offset count from xx is by the size of that type. The value of xx[1] is just one byte higher in memory, which is the value 00. And so on. Pictorially:
Address Value
x: 00001000 7B <-- xx[0], or the value at `xx` + 0
00001001 00 <-- xx[1], or the value at `xx` + 1
00001002 00 <-- xx[2], or the value at `xx` + 2
00001003 00 <-- xx[3], or the value at `xx` + 3

Yeah, you're doing something you shouldn't be doing...
That said... One part of the result is that you're working on a little-endian processor. The int x = 123; statement allocates 4 bytes on the stack and initializes them with the value 123. Since it is little-endian, the bytes look like 123, 0, 0, 0 in memory. If it were big-endian, they would be 0, 0, 0, 123. Your char pointer is pointing to the first byte of memory where x is stored.

unsigned char * xx = (char *) & x;
You take the address of x, you tell the compiler it is a pointer to a character[string], you assign that to xx, which is a pointer to a character[string]. The cast to (char *) just keeps the compiler happy.
Now if you print xx, or inspect it, what you see depends on the machine: the so-called little-endian or big-endian way of storing integers. x86 is little-endian and stores the bytes of the integer in reverse. So storing 0x00000123 will store 0x23 0x01 0x00 0x00. Your value is decimal 123, which is 0x0000007B, so the bytes are 0x7B 0x00 0x00 0x00, which is what you see when inspecting the location xx points to as characters.

Related

How are ints stored in C

I've been trying to understand how data is stored in C but I'm getting confused. I have this code:
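#include <stdio.h>
int main(void){
    int a = 0;        /* start from all-zero bytes; an uninitialized int has indeterminate contents */
    char *x;
    x = (char *) &a;  /* view the int as a sequence of bytes */
    x[0] = 0;
    x[1] = 3;
    printf("%d\n", a);
    return 0;
}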
I've been messing around with x[0] & x[1], trying to figure out how they work, but I just can't. For example x[1] = 3 outputs 768. Why?
I understand that there are 4 bytes (each holding 8 bits) in an int, and x[1] points to the 2nd byte. But I don't understand how making that second byte equal to 3, means a = 768.
I can visualise this in binary format:
byte 1: 00000000
byte 2: 00000011
byte 3: 00000000
byte 4: 00000000
But where does the 3 come into play? How does setting byte 2 = 3 make it 00000011, or 768?
Additional question: If I was asked to store 545 in memory, what would x[0] and x[1] be?
I know the layout in binary is:
byte 1: 00100001
byte 2: 00000010
byte 3: 00000000
byte 4: 00000000
It is not specific to C, it is how your computer is storing the data.
There are two different byte orders; which one a machine uses is called its endianness.
Little-endian: the least significant byte is stored first.
Example: 0x11223344 will be stored as 0x44 0x33 0x22 0x11
Big-endian: the least significant byte is stored last.
Example: 0x11223344 will be stored as 0x11 0x22 0x33 0x44
Most modern computers use the little-endian system.
Additional question: If I was asked to store 545 in memory
545 in hex is 0x221 so the first byte will be 0x21 and the second one 0x02 as your computer is little-endian.
Why do I use hex numbers? Because every two digits represent exactly one byte in memory.
I've been messing around with x[0] & x[1], trying to figure out how
they work, but I just can't. For example x[1] = 3 outputs 768. Why?
768 in hex is 0x300. So the byte representation is 0x00 0x03 0x00 0x00
Warning: by casting the address of an int to a char *, you leave the compiler defenseless in trying to maintain order. Casting is the programmer telling the compiler "I know what I am doing." Use it with care.
Another way to refer to the same region of memory in two different modes is to use a union. Here the compiler will allocate the space required, addressable as either an int or an array of char.
This might be a simpler way to experiment with setting/clearing certain bits as you come to understand how the architecture of your computer stores multi-byte datatypes.
See other responses for hints about "endian-ness".
#include <stdio.h>
int main( void ) {
union {
int i;
char c[4];
} x;
x.i = 0;
x.c[1] = 3;
printf( "%02x %02x %02x %02x %08x %d\n", x.c[0], x.c[1], x.c[2], x.c[3], x.i, x.i );
x.i = 545;
printf( "%02x %02x %02x %02x %08x %d\n", x.c[0], x.c[1], x.c[2], x.c[3], x.i, x.i );
return 0;
}
00 03 00 00 00000300 768
21 02 00 00 00000221 545

Why does the following print what it does?

typedef unsigned char byte;
unsigned int nines = 999;
byte * ptr = (byte *) &nines;
printf ("%x\n",nines);
printf ("%x\n",nines * 0x10);
printf ("%d\n",ptr[0]);
printf ("%d\n",ptr[1]);
printf ("%d\n",ptr[2]);
printf ("%d\n",ptr[3]);
Output:
3e7
3e70
231
3
0
0
I know the first two are just hexadecimal representations of 999 and 999*16. What do the remaining 4 mean? the ptr[0] to ptr[3]?
Most likely you are running this on a 32-bit little-endian (LE) system. 999 in hex is:
00 00 03 E7
The way it would be stored in memory is:
E7 03 00 00
Hence:
ptr[0] points to the byte containing E7 which is 231 in decimal
ptr[1] points to the byte containing 03 which is 3 in decimal
ptr[2] points to the byte containing 00 which is 0 in decimal
ptr[3] points to the byte containing 00 which is 0 in decimal
HTH!
I think that you will see clearly if you write:
#include <stdio.h>
typedef unsigned char byte;
int main(void) {
    unsigned int nines = 999;
    byte * ptr = (byte *) &nines;
    printf ("%x\n",nines);
    printf ("%x\n",nines * 0x10);
    printf ("%x\n",ptr[0]);
    printf ("%x\n",ptr[1]);
    printf ("%x\n",ptr[2]);
    printf ("%x\n",ptr[3]);
    printf ("%zu\n",sizeof(unsigned int)); /* sizeof yields a size_t, so use %zu */
    return 0;
}
char is 8 bits, one byte, and int is 4 bytes (on my 64-bit machine).
On your machine the data is stored little-endian, so the least significant byte is located first.

c pointers and memory representation [closed]

I came across this question on SO:
(Tricky pointer question):
A C programmer is working with a little-endian machine with 8 bits in a byte and 4 bytes
in a word. The compiler supports unaligned access and uses 1, 2 and 4 bytes to store char, short and int respectively. The programmer writes the following definitions (below right) to access values in main memory (below left):
Address | Byte offset
--------|-0--1--2--3----
0x04 | 10 00 00 00
0x08 | 61 72 62 33
0x0c | 33 00 00 00
0x10 | 78 0c 00 00
0x14 | 08 00 00 00
0x18 | 01 00 4c 03
0x1c | 18 00 00 00
int **i=(int **)0x04;
short **pps=(short **)0x1c;
struct i2c {
int i;
char *c;
}*p=(struct i2c*)0x10;
(a) Write down the values for the following C expressions:
**i
p->c[2]
&(*pps)[1]
++p->i
That question only answers the third subquestion but I was wondering how the rest of the subquestions would be solved. I'm new to C and trying to improve my understanding on pointers and this was particularly confusing. Thanks for any help!
You'll probably want to refer to this: Operators in C and C++ - Operator precedence
The question didn't specify, but we will assume that pointers are 4 bytes - sizeof(void*) == 4.
1.
int **i=(int **)0x04;
**i = ???
i is a pointer to a (pointer to int). When we say **i, we are de-referencing the pointer twice - or reading the value that the pointer points to.
First note that **i == *(*i). So we first read a pointer-sized (4-byte) value from memory at address 0x04. This is 10 00 00 00, which interpreted as a little-endian value is 0x10. So now we are left with *((int*)0x10). That means we read an int-sized (4-byte) value from memory at address 0x10. 78 0c 00 00 interpreted as little-endian is the value 0xC78.
2.
struct i2c {
int i;
char *c;
} *p = (struct i2c*)0x10;
p->c[2] = ???
This one's a little trickier. I'll assume that you understand that structures are just a collection of variables that (excluding padding, which doesn't apply here) are laid out one after another in memory.
Our pointer p points to a struct i2c object at 0x10 in memory. That means at address 0x10 is the int, named p->i. And immediately following that, at address 0x14, is the char *, named p->c.
The expression p->c[2] means: "First get the char *c from the structure that p points to. Then, get the char at index 2 from the array that p->c points to."
So first, we'll get p->c. I already mentioned this char* is at address 0x14. There we find the pointer 08 00 00 00, or 0x8.
Now, we have a char * that points to address 0x8, and we want the char at index 2 in that array. To get the address of an array element, we use this formula:
&(x[y]) == (char*)x + (y * sizeof(x[0]))
In other words, the offset (from the start of the array) of the nth element in the array is n times the size of each element.
Since chars are 1 byte, p->c[2] is at 0x8 + 2 = 0xA. There we find the value 0x62, which is the ASCII character 'b'.
3.
short **pps=(short **)0x1c;
&(*pps)[1] = ???
With our knowledge of operator precedence, we read &(*pps)[1] as "First dereference pps, which gives a pointer-to-short (or an array of shorts). Then, we want the address of the element at index 1."
At address 0x1C we have 18 00 00 00, or 0x18. So now we have a pointer to an array of shorts, and this array starts at address 0x18. Using our formula from above, and knowing that shorts are 2 bytes in size, we calculate element 1 to be at address 0x18 + (1 * 2) == 0x1A.
At address 0x1A is 4c 03, or 0x034C. However, the problem wasn't asking us for the value of element 1 - that would be solving (*pps)[1]. Instead, it asked for &(*pps)[1] or the address of that element. So, we simply go back to the end of the previous paragraph, where we said the address was 0x1A.
4.
++p->i = ??
For this one, you really need to know the operator precedence. It should be clear that this could be interpreted two different ways:
a) Increment the pointer p by 1, and then dereference p to get its member i
b) Dereference p to get its member i, then increment the integer value
From the precedence chart, we see that -> has a precedence of 2, while ++ (the Prefix increment), has a lower precedence of 3. That means we need to apply the -> first, then increment. Thus, option b) was correct.
So, first let's get p->i. We already said in part 2. that since p points to address 0x10, and i is the first member in struct i2c, p->i is at address 0x10. There we find 78 0c 00 00, or 0xC78.
Finally, we need to apply the ++ operator, and increment that value to 0xC79.
.....
1 - Pointer arithmetic means you treat a pointer like an array. So p + 3 doesn't mean "p plus 3 bytes", it means &p[3], or "p plus (3 * sizeof(*p)) bytes".

How can one address can store more than one value?

The question is given in the title.
I don't know why this is happening.
Can someone tell me how such tricks work?
Here is my code:
#include<stdio.h>
int main(){
int a = 320;
char *ptr;
printf("%p\n",&a);
ptr =( char *)&a;
printf("%p\n",ptr);
printf("%d\n",a);
printf("%d\n",*ptr);
return 0;
}
Output:
0x7fffc068708c
0x7fffc068708c
320
64
There is only one value stored.
The last printf takes the first char's worth of data at that address, promotes it to int, and prints the result. The printf before it prints the whole int.
(320 == 256 + 64, or 0x140 == 0x01 0x40)
The actual data at 0x7fffc068708c is 0x00000140.
That's 320 in decimal.
But if you access it via ptr =( char *)&a;, then you only get 0x40.
That's 64 in decimal.
Simple, really: using a char pointer, you get rid of any extra bit of data above a byte:
a = 320
0x 00 00 00 00 01 40
| a | -> 0x 00000140 = 320
|ptr| -> 0x 40 = 64
You "see" two values because you don't use all the precision available to you.
You would have "seen" one value if you had used a short instead of a char, but really, it's just how you interpret the data.
The point is that when assigning the address of a to ptr, you are saying it is a pointer to a character and not an integer. Change that to an int pointer and try again.

Tricky pointer question

I'm having trouble with a past exam question on pointers in c which I found from this link,
http://www.cl.cam.ac.uk/teaching/exams/pastpapers/y2007p3q4.pdf
The question is this:
A C programmer is working with a
little-endian machine with 8 bits in a
byte and 4 bytes in a word. The
compiler supports unaligned access and
uses 1, 2 and 4 bytes to store char,
short and int respectively. The
programmer writes the following
definitions (below right) to access
values in main memory (below left):
Address Byte offset
---------0 --1-- 2-- 3
0x04 | 10 00 00 00
0x08 | 61 72 62 33
0x0c | 33 00 00 00
0x10 | 78 0c 00 00
0x14 | 08 00 00 00
0x18 | 01 00 4c 03
0x1c | 18 00 00 00
int **i=(int **)0x04;
short **pps=(short **)0x1c;
struct i2c {
int i;
char *c;
}*p=(struct i2c*)0x10;
(a) Write down the values for the following C expressions:
**i
p->c[2]
&(*pps)[1]
++p->i
I get
**i == 0xc78
p->c[2] == '62'
++p->i == 0x1000000
I don't understand the third question (&(*pps)[1]), could someone please explain what is going on here? I understand the pps pointer has been dereferenced but then the address-of operator has been applied to the value. Isn't that just like asking for the address of a constant, for example if I did this
int i = 7;
int *p = &i;
&(*p) //would this mean address of 7??
Thanks in advance for any help.
The [] operator takes precedence over the & operator. So the code is dereferencing pps to get to the first element of an array of short*. Since this element is also a pointer, we may treat it as an array and look up the element one position to the right of what it points to, with [1]. Finally, we take the address of that element.
It might be useful to note that &p[i] is the same as p + i - it gives you a pointer to the element i positions to the right of where p points to.
The intermediate values are:
pps == 0x1c
*pps == 0x18
&(*pps)[1] == *pps + 1 == 0x1A
(the +1 adds two bytes, since it is used on a short*)
The expression is parsed as &((*pps)[1]); *pps is being treated as a pointer to the first element of an array, you're accessing the element at index 1 of that array, and then taking the address of that element.
pps is a pointer to pointer to short,
which means that *pps is a pointer to short (or array of shorts),
(*pps)[1] is just like *(*pps + 1) [pointer arithmetic],
and &(*(*pps + 1)) is the address of *(*pps+1),
or, in other words - (*pps+1) (which is a pointer to short).
pps is a pointer to a pointer.
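Dereferencing pps gives you a pointer to short. Because a pointer can be indexed like an array, (*pps)[1] then picks the second short it points to.
It is then the same as:
short ps[2] = {0x0001,0x034c};
short *first = ps;
short **pps = &first;
so (*pps)[1] is 0x034c, and &(*pps)[1] is the address of that element (0x1A in the exam's memory map).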

Resources