How does integer pointer to char pointer conversion work?
I have a program with the integer value 320, and I'm casting its address to char*. The output is 64. I want to know how this works.
#include <stdio.h>

int main()
{
    int i = 320;
    char *p = (char *)&i;
    printf("%d", *p);
    return 0;
}
Well, on your little-endian system, let's assume sizeof (int) is 4.
Then the memory for i looks like:
   +--+-+-+-+
i: |64|1|0|0|
   +--+-+-+-+
This is because 320 is 0x00000140, i.e. 320 = 1 * 256 + 64.
So you set p to point at the first byte (64), and then dereference it so that single byte is read.
Your printf is also missing a newline; you meant:
printf("%d\n", *p);
Quoting C11, chapter §6.3.2.3, emphasis mine
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
So the binary representation would look like this (little-endian architecture assumed, based on the output you presented):

00000001 01000000
^^^^^^^^ ^^^^^^^^
   HAB      LAB        HAB = High Address Byte, LAB = Low Address Byte

By the cast, you are essentially pointing at the

01000000

part, so the dereference produces that value as the integer result: (01000000)₂ == (64)₁₀.
Note: only a character type pointer may alias any other pointer type like this. Don't try it with other target types that are not compatible with the source type.
The different value is due to truncation; it also depends on the endianness of the platform. The value 320, if stored in an int of, say, 16 bits, has the following binary pattern.
0000 0001 0100 0000
If a pointer to these two bytes is cast to a pointer to char, it will refer to the lower byte, which is as follows.
0100 0000
However, this bit pattern has a numerical value of 64, which is the output of the program.
Related
I have the following snippet where the value of a pointer is printed. Since it is a pointer on a 64-bit machine, its size is 8 bytes, and 64 bits should be used to represent the address. But:
#include <stdio.h>

int main()
{
    char *s = "";
    printf("%p %p\n", s, (char*)(int)s);
    return 0;
}
But the output is:
0x4005e4 0x4005e4
Why are only 24 bits used for the pointer value? Shouldn't this be 64 bits?
Also, is it UB if casts between different-size pointer and integer types are involved, as in (char *)(int)s here?
I was also expecting (char)s to give only a 1-byte address, but it prints an 8-byte address?
Why are only 24 bits used for the pointer value?
Your pointers happen to have their most significant bits set to zero, so they aren't printed. If you really want to print all 64 bits, you can change your printf format string to make it print leading zeros.
Is it UB if casts between different-size pointer and integer types are involved, as in (char *)(int)s?
There are no "different size pointers" on the machines you're likely to be using, but int is commonly 32 bits. So by casting through int on the way to char*, you are throwing away the most significant 32 bits. If they were zero, you may not notice the difference, but if not you'll corrupt the pointer and nobody knows what you'll get if you dereference it. You can still print its value (i.e. the address it points to, even if it's nonsense).
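For illustration, a sketch of both points, assuming the implementation provides the optional uintptr_t type (an integer wide enough to round-trip an object pointer):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    char *s = "";

    /* Pad to 16 hex digits; PRIxPTR is the conversion matching uintptr_t. */
    printf("0x%016" PRIxPTR "\n", (uintptr_t)s);

    /* Unlike int, uintptr_t is wide enough to round-trip the pointer. */
    char *t = (char *)(uintptr_t)s;
    printf("%p\n", (void *)t);

    return 0;
}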
What is the reinterpret_cast of (char) doing here?
unsigned int aNumber = 258; // 4 bytes in allocated memory [02][01][00][00]
printf("\n is printing out the first byte %02i",(char)aNumber); // Outputs the first byte[02]
Why am I getting the first byte without pointing to it, such as with (char*)&aNumber?
Is the %02i doing this: (char)*&aNumber?
Or is the (char) cast cutting off the other 3 bytes, since a char occupies only one of those 4 bytes?
First, reinterpret_cast is a C++ operator. What you've shown is not that but a C-style cast.
The cast is converting a value of type unsigned int to a value of type char. Conversion of an out-of-range value is implementation-defined, but in most implementations you're likely to come across, it is implemented by reinterpreting the low-order byte(s) as the converted value.
In this particular case, the low-order byte of aNumber has the value 0x02, so that's the result when it is cast to char.
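A minimal sketch contrasting the value conversion with pointer-based representation access (the commented outputs assume a little-endian machine):

#include <stdio.h>

int main(void)
{
    unsigned int aNumber = 258;  /* bytes in memory: [02][01][00][00] on little-endian */

    /* Value conversion: keeps the low-order bits regardless of byte order. */
    printf("%d\n", (char)aNumber);      /* 2 */

    /* Representation access: reads the lowest-addressed byte. */
    printf("%d\n", *(char *)&aNumber);  /* 2 here, but 0 on a big-endian machine */

    return 0;
}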
Say I have a bigger type.
uint32_t big = 0x01234567;
Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?
Is it undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
Is it undefined behavior to both shift and edit (char*)&big+1, like the example below? I think this example should be undefined behavior, because after casting to (char*) we have limited our view to a char-type pointer, and we ought not to access, let alone change, values outside that scope.
uint32_t big = 0x01234567;
*((char*)&big + 1) = 0xff;
printf("%02x\n\n\n", *((char*)&big+1));
printf("%02x\n\n\n", big);
(This passes my Visual C++ compiler. By the way, I want to ask a forked question: why does the first printf in this example give ffffffff? Shouldn't it be ff?)
I have seen code like this, and it is what I usually do when I need to achieve a similar task. Is this UB or not? Why or why not? What is the standard way to achieve this?
uint8_t catcher[8] = { 0 };
uint64_t big = 0x1234567812345678;
memcpy(catcher, (uint8_t*)&big, sizeof(uint64_t));
Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?
If a char is eight bits, which it is in most modern C implementations, then there are four bytes in the uint32_t big, and you can do arithmetic on the address from (char *) &big + 0 to (char *) &big + 4. You can also read and write the bytes from (char *) &big + 0 to (char *) &big + 3, and those will access individual bytes in the representation of big. Although arithmetic is defined to work up to (char *) &big + 4, that is only an endpoint. There is no defined byte there, and you should not use that address to read or write anything.
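A minimal sketch of that legal range, using unsigned char (the commented byte order assumes a little-endian machine):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t big = 0x01234567;
    unsigned char *p = (unsigned char *)&big;

    /* p + 0 through p + 3 may be read and written; p + 4 exists only as
       an endpoint for arithmetic and comparison, never for access. */
    for (unsigned char *q = p; q != p + sizeof big; q++)
        printf("%02x ", *q);             /* 67 45 23 01 on little-endian */
    printf("\n");

    return 0;
}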
Is that an undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
These are additions, not shifts, and the syntax is (char *) &big + 1, not (char*&big)+1. Arithmetic is defined for the offsets from +0 to +4.
Is that an undefined behavior to both shift and edit (char*)&big+1?
It is allowed to read and write the bytes in big using a pointer to char. This is a special rule for character types. Generally, the bytes of an object should not be accessed using an unrelated type. For example, a float object could not be accessed using an int type. However, the character types are special; you may access the bytes of any object using a character type.
However, it is preferable to use unsigned char for this, as it avoids complications with signed values.
I have seen a code like this.
It is allowed to read or write the bytes of an object using memcpy. memcpy is defined to work as if by copying characters.
Note that, while accessing the bytes of an object is defined by the C standard, how bytes represent values is partly implementation-defined. Different C implementations may use different orders for the bytes within an object, and there can be other differences.
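For example, a small sketch that reveals the implementation's byte order by inspecting the lowest-addressed byte of a known value:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 1;

    /* On a little-endian implementation the 1 is stored in the
       lowest-addressed byte; on big-endian it is in the highest. */
    if (*(unsigned char *)&x == 1)
        printf("little-endian\n");
    else
        printf("big-endian\n");

    return 0;
}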
By the way, I want to ask a forked question on that why in this example the first printf gives ffffffff? Shouldn't it be ff?
In your C implementation, char is signed and can represent values from −128 to +127. In *((char*)&big + 1) = 0xff;, 0xff is 255 and is too big to fit into a char. It is converted to a char value in an implementation-defined way. Your C implementation converts it to −1. (The eight-bit two’s complement representation of −1, bits 11111111, uses the same bits as the binary representation of 255, again bits 11111111.)
Then printf("%02x\n\n\n", *((char*)&big+1)); passes this value, −1, to printf. Since it is a char, it is promoted to int to be passed to printf. This produces the same value, −1, but it has 32 bits, 11111111111111111111111111111111. Then you are passing an int, but printf expects an unsigned int for %02x. The behavior of this is not defined by the C standard, but your C implementation reads the 32 bits as if they were an unsigned int. As an unsigned int, the 32 bits 11111111111111111111111111111111 represent the value 4,294,967,295 or 0xffffffff, so that is what printf prints.
You can print the correct value by using printf("%02hhx\n\n\n", * ((unsigned char *) &big + 1));. As an unsigned char, the bits 11111111 represent 255 or 0xff, and converting that to an int produces 255 or 0x000000ff.
For variadic functions (like printf) all arguments undergo the default argument promotions, which promote smaller integer types to int.
This conversion includes sign extension if the smaller type is signed, so the value is preserved.
So if char is a signed type (which is implementation-defined) with a value of -1, it will be promoted to the int value -1, which is what you see.
If you want to print a smaller type, you need to first cast to the correct type (unsigned char) and then use the proper format (like %hhx for printing unsigned char values).
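Putting that together, a sketch of both the problem and the fix (the first output assumes char is signed, as on the asker's implementation; passing a negative int to %x is itself not defined by the standard):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t big = 0x01234567;
    *((char *)&big + 1) = 0xff;

    /* signed char sign-extends to int: prints ffffffff on this setup */
    printf("%x\n", *((char *)&big + 1));

    /* read as unsigned char and print with %hhx: prints ff */
    printf("%hhx\n", *((unsigned char *)&big + 1));

    return 0;
}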
As I understand it, addressing a single bit in an int variable seems possible, if it is passed as a pointer. Am I correct?
uint8_t adcState[8];
uint16_t adcMessageId[8];

void adcEnable(uint8_t* data) {
    // Set ADC input as enabled
    adcState[(uint8_t)data[0]] = 1;

    // Get ADC message ID
    adcMessageId[(uint8_t)data[0]] = data[2] << 8 | data[1];
}
So far, this is what I have figured out:
The function receives a pointer to an 8-bit int as an argument
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read), and uses it as a field number for the adcState array, which is then set to 1. For example, this would mean that if data was 729, data[0] would be '9' and therefore adcState[9] becomes 1.
Is it possible? Can you use the pointers like this?
For the adcMessageId array a similar approach is taken. However, here the value assigned depends on the third and second digits of the data int.
I don't understand the shift here. Being a uint8_t value, it has only 8 bits, so shifting by 8 bits always gives 0000 0000. Therefore an OR with data[1] would be just data[1] itself...
In our example, adcMessageId[9] would become ('7' << 8) bitwise OR'd with '2', so just '2'.
Something in my logic seems wrong.
It would seem data is pointing to an array, not a single 8-bit int, and that:
The first element of the array is an index into the arrays adcState and adcMessageId
The second and third elements of the array comprise a data value for the array adcMessageId
As commenter @Eugene Sh. pointed out, data[2] is promoted to an int before shifting, so no bits are lost.
The pointer notation uint8_t * is as valid as array notation uint8_t [] in a function signature for passing an array; it's often how char * strings are passed, and arrays decay to a pointer to their first element when passed to functions anyway.
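A hypothetical caller, to make the decay concrete (the three-byte message layout here is an assumption: byte 0 selects the channel, bytes 1 and 2 carry a little-endian message ID):

#include <stdint.h>
#include <stdio.h>

uint8_t adcState[8];
uint16_t adcMessageId[8];

void adcEnable(uint8_t *data) {
    adcState[data[0]] = 1;
    adcMessageId[data[0]] = data[2] << 8 | data[1];
}

int main(void)
{
    /* Hypothetical message: channel 4, message ID 0x5678 (low byte first). */
    uint8_t msg[3] = { 4, 0x78, 0x56 };

    adcEnable(msg);  /* the array decays to a uint8_t * here */

    printf("adcState[4] = %u, adcMessageId[4] = 0x%04x\n",
           (unsigned)adcState[4], (unsigned)adcMessageId[4]);  /* 1, 0x5678 */
    return 0;
}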
The function receives a pointer to 8bit int as an argument
Yes, roughly. And the function implementation assumes that the pointed-to uint8_t can be treated as the first element of an array of at least three uint8_t.
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read), and uses it as a field number for adcState array, which then is set to 1. For example this would mean if data was 729, the data[0] would be '9' and therefore the adcState[9] becomes '1'. Is it possible? Can you use the pointers like this?
No, you have completely misunderstood. data[0] means exactly the same thing as *data. It refers to the 8-bit unsigned integer to which data points. The number 729 is too large to be represented as a uint8_t, but if the object to which data pointed had the value 129 then data[0] would evaluate to 129.
You are perhaps confused by the appearance later in the function of data[1] and data[2]. These refer to two uint8_t objects following *data in memory, such as will reliably be present if data points to the first element of an array, as I alluded to above. Indexing with [] does not have the effect of slicing the uint8_t to which data points.
Pay attention also that I am saying "the object to which data points". One does not pass a uint8_t value directly as this function's parameter. It is not anticipating that an integer value would be reinterpreted as a pointer. You pass a pointer to the data you want the function to work with.
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of the data int.
In the adcMessageId case, again data[0] refers to the uint8_t to which data points. data[1] refers to another whole uint8_t following that in memory, and data[2] to the next after that.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000.
uint8_t has only 8 bits, but all integer values narrower than int are converted to an integer type at least as wide as int, or perhaps to a floating-point type, for arithmetic operations. The specific promoted type depends in part on what the other operand is, and the result has the same, promoted type. Type int is at least 16 bits wide on all conforming C implementations. Thus this ...
data[2] << 8 | data[1]
... intends to pack the two uint8_t values data[2] and data[1] into one 16-bit integer, data[2] in the most-significant position. It's not entirely safe because the elements of data will be promoted to (signed) int instead of unsigned int, but that will present an issue only on implementations where int is 16 bits wide (which are uncommon these days), and even then, only if the value of data[2] is larger than 127. A safer way to express it would involve explicit casts:
(unsigned int) data[2] << 8 | (unsigned int) data[1]
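A self-contained sketch of that safer packing (the byte values are made up for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t data[3] = { 0, 0x34, 0x12 };  /* made-up bytes for illustration */

    /* The casts keep the shift and OR unsigned even on an
       implementation where int is only 16 bits wide. */
    uint16_t id = (unsigned int)data[2] << 8 | (unsigned int)data[1];

    printf("0x%04x\n", (unsigned int)id);  /* 0x1234 */
    return 0;
}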
You have a few misconceptions. Or maybe just wrong wording.
The function receives a pointer to 8bit int as an argument
More precisely, it gets a pointer to an array of 8-bit integers; otherwise your usage would be invalid. Probably it gets a pointer to a string.
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read),
That is wrong. You seem to use it as a pointer to a string holding a number.
In that case you access the first character, which is the MOST significant decimal digit.
and uses it as a field number for adcState array, which then is set to 1. For example this would mean if data was 729, the data[0] would be '9' and therefore the adcState[9] becomes '1'. Is it possible? Can you use the pointers like this?
You are messing up things a bit.
If you want to access decimal digits, we are talking about strings and there the first element is '7' which is not to be confused with 7.
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of the data int.
Maybe you should not talk about int if you are using strings.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000. Therefore an OR with data[1] would be just data[1] itself... In our example, the adcMessageId[9] would become ('7' << 8) bitwise OR with '2', so just '2'.
That was already addressed in comments and Govind Parmar's answer: Integer promotion takes place before shifting.
I am trying to understand the binary representation of '\0' vs the binary representation of int 0 in C.
If '\0' is a char, is it 1 byte of zeros in binary: 00000000?
If 0 is an int, is it 4 bytes of zeros in binary: 00000000 00000000 00000000 00000000?
1. is my above understanding correct?
2. if so, how is the first byte of an int not confused with a null terminator?
Note: I understand NULL is a pointer in C. I am not referring to pointers here.
is my above understanding [of the representation of various int and char values] correct?
Pretty much. The char value represented in C source code as (char)'\0' has numeric value zero and an all-zeroes representation in memory. That representation is 8 bits in size on all modern machines, but C does not mandate that particular size, and some historic machines indeed did use different sizes.
Note also that, as @mch observed, without any cast the expression '\0' has type int. It can be converted to type char without an explicit cast, as indeed happens when, for example, you assign it to a variable of that type.
if so, how is the first byte of an int not confused with a null terminator?
If your program is in a position to interpret the first byte of an int as if it were the representation of a char then its behavior is undefined. Otherwise, it knows that the first byte is part of an int because that's what it assigned to that location. There is no inherent distinction between the representation of the char value (char)'\0' and a suitably-aligned sequence of the same number of zero bits as part of the representation of an object of another type.
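A small sketch of that last point: the single byte of a zero char compares equal to the first byte of a zero int:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char c = '\0';
    int n = 0;

    /* Compare c's single byte with the first byte of n: they match. */
    printf("%d\n", memcmp(&c, &n, sizeof c));  /* prints 0 */

    return 0;
}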
1. is my above understanding correct?
Yes.
2. if so, how is the first byte of an int not confused with a null terminator?
There is nothing to be confused about.
If the variable itself is an int, then sizeof(int) bytes are fetched, and if all are 0s, the value is 0.
If the variable is a char, then sizeof(char) bytes are fetched, and if all are 0s, it is interpreted as the '\0' NUL character.
They cannot be distinguished. One is zero in int format, the other is zero in char format. If you write to a char array[] with
array[0] = 0;
the result is exactly the same as
array[0] = '\0';