char* p = (char*) malloc(8 * sizeof(char));
strcpy(p, "fungus");
for (int i = 0; i < strlen(p); i++) {
printf("%c", p[i]);
}
char* p2 = (char*) malloc(4 * sizeof(char));
*p2 = *(uint16_t*) p;
printf("\nPointer: %c\n", p2[1]);
So I created a char* p to store the string "fungus".
To my understanding, if I typecast p to a uint16_t* and dereference it, the returned value should be the first 2 bytes pointed to by p. So, p2[1] should be "u", since "u" is the second byte in the string "fungus". However, this isn't the case when I run my program. I've also tried to print p[0] + 1, but this just outputs "g". Is there an error in my logic?
To my understanding, if I typecast p to a uint16_t* and dereference it, the returned value should be the first 2 bytes pointed to by p.
This is not correct, for at least two reasons.
One, a pointer to a char might not have the alignment required for a uint16_t, and then the conversion to uint16_t is not defined by the C standard, per C 2018 6.3.2.3 7:
… If the resulting pointer is not correctly aligned70) for the referenced type, the behavior is undefined…
That will not apply in the example code in the question, since the char * is assigned from malloc, and malloc always returns an address suitably aligned for any fundamental type. However, it may apply to char * created in other ways (including by adding offsets to p, such as attempting to convert p+1 to uint16_t *).
Two, accessing objects defined as char (including those created by writing char values to memory allocated with malloc) as if they were uint16_t violates the aliasing rules in C 2018 6.5 7. It is possible a C implementation might reinterpret the two char as a uint16_t, but it is also possible that optimization by the compiler might transform the undefined behavior into something else.
*p2 = *(uint16_t*) p;
This code is an assignment to the single char *p2, which is p2[0], regardless of the fact that the right-hand side may be a 16-bit value. It does not touch p2[1].
If *(uint16_t*) p; does reinterpret two eight-bit bytes of the string as a uint16_t, then it will produce some 16-bit value that is then assigned to the single char *p2. If char is unsigned, this will store the low eight bits of that value as p2[0], leaving p2[1] untouched. If it is signed, an implementation-defined conversion will be performed, and the result (if a trap does not occur) will be assigned to p2[0], again leaving p2[1] untouched.
Then printf("\nPointer: %c\n", p2[1]); attempts to print a value that has not been initialized, since nothing has put a value in p2[1].
You could try changing *p2 = *(uint16_t*) p; to * (uint16_t *) p2 = * (uint16_t *) p; to copy the uint16_t whole, instead of trying to cram it into a single byte.
Related
#include <stdio.h>
int main(){
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
cp++; // This still moves 8 bytes
return 0;
}
Since the size of a pointer is 64 bits on 64 bit machines, doing a pp++ will always move 8 bytes. Is there a way to make it move only 1 byte?
Is there a way to make it move only 1 byte?
Maybe.
All object pointers can be converted to void * and since char * has the same representation, to char *. ++ increments a char * by 1.
#include <stdio.h>
int main() {
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
char *character_pointer = (char *) cp;
character_pointer++; // Increment by 1
Now is the tricky part. Can that incremented pointer convert back to a char **. C allows that unless
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.2 7
cp = (char **) character_pointer;
return 0;
}
Reading *cp can readily cause undefined behavior as cp does not certainly point to a valid char *. Unclear as to OP's goal at this point.
C is not assembly. What you are trying to do is undefined behavior, and compiler might not do what you ask, and the program might do anything, including possibly what you think it should do if C were just "assembly" with different syntax.
That being said, you can do this:
int a = 5;
int *p = &a;
int **pp = &p;
uintptr_t temp;
memcpy(&temp, &pp, sizeof temp);
temp++;
memcpy(&pp, &temp, sizeof temp);
Above code is likely to do what you want, even though that last memcpy already triggers undefined behavior, because it copies invalid value to a pointer (that is enough for it to be UB). Actually using pp, which now has invalid value, has increasing chance of messing things up.
To understand why having any UB is indeed UB: compiler is free to decide that the effect of the code, which can be proven to have UB, is nothing, or is never reached. So if that last memcpy is inside if, and compiler can prove UB occurs if condition is true, it may just assume condition is never true and optimize whole if away. Presumably C programmer knows to write their condition so that it would never result in UB, so this optimization can be made at compile time already.
Yeah, it is a bit crazy. C is not just assembly with different syntax!
Incrementing pointer to pointer by one byte
If you find an implementation where the size of a pointer to pointer variable contains only 8 bits, (i.e. one that uses 1 byte addressing, btw, very unlikely), then it will be doable, and only then would it be safe to do so. Otherwise it would not be considered a practical or safe thing to do.
For an implementation that uses 64 bit addressing, 64 bits are needed to represent each natural pointer location. Note however though _[t]he smallest incremental change is [available as a by-product of] the alignment needs of the referenced type. For performance, this often matches the width of the ref type, yet systems can allow less._ (per #Chux in comments) but de-referencing these locations could, and likely would lead to undefined behavior.
And in this statement
char **cp = (char **)pp; //where pp is defined as int **
the cast, although allowing a compile without complaining, is simply masking a problem. With the exception of void *, pointer variables are created using the same base type of the object they are to point to for the reason that the sizeof different types can be different, so the pointers designed to point to a particular type can represent its locations accurately.
It is also important to note the following:
sizeof char ** == sizeof char * == sizeof char *** !!= sizeof char`
32bit 4 bytes 4 bytes 4 bytes 1 byte
64bit 8 bytes 8 bytes 8 bytes 1 byte
sizeof int ** == sizeof int * == sizeof int *** !!= sizeof int`
32bit 4 bytes 4 bytes 4 bytes 4 bytes (typically)
64bit 8 bytes 8 bytes 8 bytes 4 bytes (typically)
So, unlike the type of a pointer, its size has little to do with it's ability to point to a location containing an object that is smaller, or even larger in size than the pointer used to point to it.
The purpose of a pointer ( eg char * ) is to store an address to an object of the same base type, in this case char. If targeting 32bit addressing, then the size of the pointer indicates it can point to 4,294,967,296 different locations (or if 64 bits to 18,446,744,073,709,551,616 locations.) and because in this case it is designed to point to char, each address differs by one byte.
But this really has nothing to do with your observation that when you increment a pointer to pointer to char that you see 8 bytes, and not 1 byte. It simply has to do with the fact that pointers, in 64bit addressing, require 8 bytes of space, thus the successive printf statements below will always show an increment of 8 bytes between the 1st and 2nd calls:
char **cp = (char **)pp;
size_t size = sizeof(cp);
printf("address of cp before increment: %p\n", cp);
cp++; // This still moves 8 bytes
printf("address of cp after increment: %p\n", cp);
return 0;
If I have a pointer to an array of char*s, in other words, a char** named p1, what would I get if I do (char*)p1? I’m guessing there would be some loss of precision. What information would I lose, and what would p1 now be pointing to? Thanks!
If you had asked about converting an int ** to an int *, the answer would be different. Let’s consider that first, because I suspect it is more representative of the question you intended to ask, and the char * case is more complicated because there is a special purpose involved in that.
Suppose you have several int: int a, b, c, d;. You can make an array of pointers to them: int *p[] = { &a, &b, &c, &d };. You can also make a pointer to one of these pointers: int **q = &p[1];. Now q points to p[1], which contains the address of b.
When you write *q, the compiler knows q points to a pointer to an int, so it knows *q points to an int. If you write **q, the compiler, knowing that *q points to an int, will get *q from memory and use that as an address to get an int.
What happens if you convert q to an int * and try to use it, as in printf("%d\n", * (int *) q);? When you convert q to an int * you are (falsely) telling the compiler to treat it as a pointer to an int. Then, * (int *) q tells the compiler to go to that address and get an int.
This is invalid code—its behavior is not defined by the C standard. Specifically, it violates C 2018 6.5 7, which says that an object shall be accessed only by an lvalue expression that has a correct type—either a type compatible with that of the actual object or certain other cases, none of which apply here. At the place q points, there is a pointer to an int, but you tried to access it as if it were an int, and that is not allowed.
Now let’s consider the char ** to char * case. As before, you might take some char **q and convert it to char *. Now, you are telling the compiler to go to the place q points, where there is a pointer to a char, and to access that memory location as if there were a char there.
C has special rules for this case. You are allowed to examine the bytes that make up objects by accessing them through a char *. So, if you convert a char ** to char * and use it, as in * (char *) q, the result will be the first (lowest addressed) byte that makes up the pointer there. You can even look at the rest of the bytes, using code like this:
char *t = (char *) q;
printf("%d\n", t[0]);
printf("%d\n", t[1]);
printf("%d\n", t[2]);
printf("%d\n", t[3]);
This code will show you the decimal values for the first four bytes that make up the pointer to char that is at the location specified by q.
In summary, converting a char ** to char * will allow you to examine the bytes that represent a char *. However, in general, you should not convert pointers of one indirection level to pointers of another indirection level.
I find pointers make much more sense when I think of them as memory addresses rather than some abstract high level thing.
So a char** is a memory address which points to, a memory address which points to, a character.
0x0020 -> 0x0010 -> 0x0041 'A'
When you cast you change the interpretation of the data, not the actual data. So the char* is
0x0020 -> 0x0010 (unprintable)
This is almost certainly not useful. Interpreting this random data as a null terminated string would be potentially disastrous.
I have written
int a;
printf("addr = %p and content = %x\n", (void*)(&a), *(&a));
printf("addr = %p and content = %x\n", (void*)(&a)+1, *(&a)+1);
What I see in the output is
addr = 0x7fffffffde3a and content = 55554810
addr = 0x7fffffffde3b and content = 5555
I expect to see one byte in each address. However, I don't see such thing. Why?
First of all, pointer arithmetic and the dereference operator honors the data type.
Remember, a pointer arithmetic, which generates a pointer one past the last element of an array is valid, but attempt to dereference the generated pointer is undefined behavior.
Attempt to dereference a pointer which points to invalid memory location is undefined behavior.
That said,
Quoting C11,
The unary * operator denotes indirection. [...] If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’.
So, in your case, *(&a) is the same as a, which is of type int, and the format specifier prints the integer value stored in a.
If you want to see the byte-by-byte value, you need to cast the pointer (address of a) to a char * and then, dereference the pointer to see value stored in each byte.
So, (void*)(&a)+1 should be changed to (char*)(&a)+1 to point to the next byte of memory.
If you want to print bytes, then play with bytes:
use unsigned char* or uint8_t* for pointers
use %hhx to tell printf that input value is character length.
Example:
int a = 0x12345678;
printf("addr = %p and content = %hhx\n", (void*)&a, *(uint8_t*)&a);
printf("addr = %p and content = %hhx\n", ((void*)&a)+1, *((uint8_t*)&a+1));
Result in my little-endian env:
addr = 0xbedbacac and content = 78
addr = 0xbedbacad and content = 56
And don't forget to use *((uint8_t*)&a+1) instead of *(uint8_t*)&a+1. In the example the later would return 79 (78+1).
You've used the formatting directive %x. %x expects an unsigned int. The type of what you pass as the corresponding argument to printf does not change that. printf will read an unsigned int values from its arguments, no matter what you pass to it. If what you pass something that isn't compatible with what printf reads, the behavior of your program is undefined.
You happen to pass an int. An int is not an unsigned int, but it's close enough that when you read an int expecting an unsigned int, the value remains the same if the integer is positive or zero. Negative integers are mapped to a large positive value (UINT_MAX - a on machines using two's complement representation, which is almost all machines).
If you pass a char (signed or not) to printf when it expects an int (signed or not), the behavior is well-defined due to another feature of the C language, which is promotions. Values of integer types that are smaller than int (i.e. char and short) are converted to int¹. So the following program snippet is well-defined and prints the value of a byte at the address p (assuming that … is replaced by a valid pointer value):
unsigned char p = …;
printf("addr = %p and content = %x\n", (void*)p, *p);
*(&a) is the same thing as a. To see just one byte of a, you could cast the pointer &a to the type unsigned char *.
printf("addr = %p and content = %x\n", (void*)&a, *(unsigned char *)(&a));
This will print one byte of the representation of a in memory. Note that the representation depends on the machine's endianness.
Your code snippet doesn't initialize a, so the first call to printf prints whatever garbage happens to be at the location of a in memory at this time. This assumes that int does not have any trap representation, which is the case on virtually all C implementations.
The second call to printf tries to print *(&a)+1. This is just a+1. The output you get is surprising: are you sure you didn't actually run the program with *(&a+1)? This seems to be what you wanted to explore. With *(&a+1), the behavior of your program would be undefined, because this looks one int past a, and a is not in an array of two or more ints. In practice, you'd be likely to get whatever was just below a on the stack, but that's not something you can count on.
If you wanted to see the byte value at an address which is 1 past the start of a in memory, you'd need to cast the pointer to a byte pointer first. When you add an integer n to a pointer, this doesn't add n to the address stored in the pointer, it adds n to the pointer itself. This is only useful when the pointer points to a value inside an array; then p + n points to n array elements past p. In fact, p[n] is equivalent to *(p+n). If you want to add 1 to the address, then you need to obtain a byte pointer, i.e. a pointer to unsigned char. Contrast:
int a[2] = {0x12345678, 0x9abcdef0};
printf("addr = %p and content = %x\n", (unsigned char*)(&a) + 1, *((unsigned char*)(&a) + 1));
printf("addr = %p and content = %x\n", (&a) + 1, *((&a) + 1));
This is well-defined (but with an implementation-defined value since it depends on the platform's endianness) provided that int consists of at least two bytes (which is not strictly mandatory in C, but is the case everywhere except in a few embedded systems).
(void*)(&a) + 1 is not standard C, because void doesn't have a size so it doesn't make sense to move a pointer to void one void element further. However some implementations such as GCC treat void* as byte pointers, just like unsigned char *, so adding an integer to a void* adds this integer to the address stored in the pointer.
¹ Or to unsigned int if the smaller type doesn't fit in a signed int, e.g. on a platform where short and int have the same size.
I've got moderately stuck, googling the right words can't got me to the right answer. Even worse, I've already done that but my own code example lost somewhere in the source code.
#include <stdio.h>
int main()
{
short x = 0xABCD;
char y[2] = { 0xAB, 0xCD };
printf("%x %x\n", y[0], y[1]);
printf("%x %x\n", (char *)&x[0], (char *)&x[1]);
}
Basically I need to access individual variable bytes via array by pointer arithmetic, without any calculations, just by type casting.
Put parentheses around your cast:
printf("%x %x\n", ((char *)&x)[0], ((char *)&x)[1]);
Note that endian-ness may change your expected result.
In the future, compile with -Wall to see what the warnings or errors are.
It's somewhat supported in C99. By a process known as type punning via union.
union {
short s;
char c[2];
} pun;
pun.s = 0xABCD;
pun.c[0] // reinterprets the representation of pun.s as char[2].
// And accesses the first byte.
Pointer casting (as long as it's to char*, to avoid strict aliasing violations) is also ok.
short x = 0xABCD;
char *c = (char*)&x;
If you're only bothered about getting the values, you can store the address of the source variable in a char * and increment and dereference the char pointer to print the values of each byte.
Quoting C11, chapter §6.3.2.3
[....] When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
Something like (consider pseudo-code, not tested)
#include <stdio.h>
int main(void)
{
int src = 0x12345678;
char * t = &src;
for (int i = 0; i < sizeof(src); i++)
printf("%x\t", t[i]);
return 0;
}
should do it.
That said, to elaborate on the accepted answer, the why part:
As per the operator precedence table, array indexing operator has higher precedence over the type-casting, so unless forced explicitly, in the expression
(char *)&x[0]
the type of x is not changed as expected. So, to enforce the meaningful usage of the type-casting, we need to enclose it into extra par of parenthesis.
I've read several posts about casting int pointers to char pointers but i'm still confused on one thing.
I understand that integers take up four bytes of memory (on most 32 bit machines?) and characters take up on byte of memory. By casting a integer pointer to a char pointer, will they both contain the same address? Does the cast operation change the value of what the char pointer points to? ie, it only points to the first 8 bits of an integers and not all 32 bits ? I'm confused as to what actually changes when I cast an int pointer to char pointer.
By casting a integer pointer to a char pointer, will they both contain the same address?
Both pointers would point to the same location in memory.
Does the cast operation change the value of what the char pointer points to?
No, it changes the default interpretation of what the pointer points to.
When you read from an int pointer in an expression *myIntPtr you get back the content of the location interpreted as a multi-byte value of type int. When you read from a char pointer in an expression *myCharPtr, you get back the content of the location interpreted as a single-byte value of type char.
Another consequence of casting a pointer is in pointer arithmetic. When you have two int pointers pointing into the same array, subtracting one from the other produces the difference in ints, for example
int a[20] = {0};
int *p = &a[3];
int *q = &a[13];
ptrdiff_t diff1 = q - p; // This is 10
If you cast p and q to char, you would get the distance in terms of chars, not in terms of ints:
char *x = (char*)p;
char *y = (char*)q;
ptrdiff_t diff2 = y - x; // This is 10 times sizeof(int)
Demo.
The int pointer points to a list of integers in memory. They may be 16, 32, or possibly 64 bits, and they may be big-endian or little endian. By casting the pointer to a char pointer, you reinterpret those bits as characters. So, assuming 16 bit big-endian ints, if we point to an array of two integers, 0x4142 0x4300, the pointer is reinterpreted as pointing to the string "abc" (0x41 is 'a', and the last byte is nul). However if integers are little endian, the same data would be reinterpreted as the string "ba".
Now for practical purposes you are unlikely to want to reinterpret integers as ascii strings. However its often useful to reinterpret as unsigned chars, and thus just a stream of raw bytes.
Casting a pointer just changes how it is interpreted; no change to its value or the data it points to occurs. Using it may change the data it points to, just as using the original may change the data it points to; how it changes that data may differ (which is likely the point of doing the casting in the first place).
A pointer is a particular variable that stores the memory address where another variable begins. Doesnt matter if the variable is a int or a char, if the first bit has the same position in the memory, then a pointer to that variable will look the same.
the difference is when you operate on that pointer. If your pointer variable is p and it's a int pointer, then p++ will increase the address that it contains of 4 bytes.
if your pointer is p and it's a char pointer, then p++ will increase the address that it contains of 1 byte.
this code example will help you understand:
int main(){
int* pi;
int i;
char* pc;
char c;
pi = &i;
pc = &c;
printf("%p\n", pi); // 0x7fff5f72c984
pi++;
printf("%p\n", pi); // 0x7fff5f72c988
printf("%p\n", pc); // 0x7fff5f72c977
pc++;
printf("%p\n", pc); // 0x7fff5f72c978
}