c programming question on reinterpret_cast

What is the reinterpret_cast of (char) doing here?
unsigned int aNumber = 258; // 4 bytes in allocated memory [02][01][00][00]
printf("\n is printing out the first byte %02i",(char)aNumber); // Outputs the first byte[02]
Why am I getting the first byte without pointing to it, such as with (char*)&aNumber?
Is the %02i doing this: (char)*&aNumber?
Or is the reinterpret_cast of (char) cutting off the remaining 3 bytes, since a char only occupies one of those 4 bytes?

First, reinterpret_cast is a C++ operator. What you've shown is not that but a C-style cast.
The cast is converting a value of type unsigned int to a value of type char. Conversion of an out-of-range value to a signed type is implementation-defined, but in most implementations you're likely to come across, the conversion simply keeps the low-order byte of the value.
In this particular case, the low-order byte of aNumber has the value 0x02, so that's the result when it is cast to char.
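For illustration, here is a minimal sketch of that conversion (using unsigned char rather than char, to sidestep the implementation-defined behaviour of converting an out-of-range value to a signed type):
#include <stdio.h>

int main(void) {
    unsigned int aNumber = 258;                   /* 0x00000102 */
    unsigned char low = (unsigned char)aNumber;   /* conversion keeps 258 % 256 == 2 */
    printf("low byte: %u\n", low);                /* prints 2 */
    return 0;
}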

Related

What is forbidden after pointer-casting a big type to a smaller type in C

Say I have a bigger type.
uint32_t big = 0x01234567;
Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?
Is that an undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
Is that an undefined behavior to both shift and edit (char*)&big+1? Like the example below. I think this example should be undefined behavior because, after casting to (char*), we have limited our view to a char-type pointer, and we ought not access, let alone change, values outside that scope.
uint32_t big = 0x01234567;
*((char*)&big + 1) = 0xff;
printf("%02x\n\n\n", *((char*)&big+1));
printf("%02x\n\n\n", big);
(This passes my Visual C++ compiler. By the way, a follow-up question: why does the first printf in this example give ffffffff? Shouldn't it be ff?)
I have seen a code like this. And this is what I usually do when I need to achieve similar task. Is this UB or not? Why or why not? What is the standard way to achieve this?
uint8_t catcher[8] = { 0 };
uint64_t big = 0x1234567812345678;
memcpy(catcher, (uint8_t*)&big, sizeof(uint64_t));
Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?
If a char is eight bits, which it is in most modern C implementations, then there are four bytes in the uint32_t big, and you can do arithmetic on the address from (char *) &big + 0 to (char *) &big + 4. You can also read and write the bytes from (char *) &big + 0 to (char *) &big + 3, and those will access individual bytes in the representation of big. Although arithmetic is defined to work up to (char *) &big + 4, that is only an endpoint. There is no defined byte there, and you should not use that address to read or write anything.
Is that an undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
These are additions, not shifts, and the syntax is (char *) &big + 1, not (char*&big)+1. Arithmetic is defined for the offsets from +0 to +4.
Is that an undefined behavior to both shift and edit (char*)&big+1?
It is allowed to read and write the bytes in big using a pointer to char. This is a special rule for character types. Generally, the bytes of an object should not be accessed using an unrelated type. For example, a float object could not be accessed using an int type. However, the character types are special; you may access the bytes of any object using a character type.
However, it is preferable to use unsigned char for this, as it avoids complications with signed values.
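Here is a minimal sketch of well-defined byte access under those rules (the byte values in the comments assume a little-endian machine with 8-bit bytes, which is an assumption, not a guarantee):
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint32_t big = 0x01234567;
    unsigned char *p = (unsigned char *)&big;   /* byte access through a character type is allowed */
    p[1] = 0xff;                                /* overwrite the byte at offset +1 */
    for (size_t i = 0; i < sizeof big; i++)
        printf("%02x ", p[i]);                  /* e.g. 67 ff 23 01 on a little-endian machine */
    printf("\nbig = 0x%" PRIx32 "\n", big);     /* e.g. 0x123ff67 on that same machine */
    return 0;
}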
I have seen a code like this.
It is allowed to read or write the bytes of an object using memcpy. memcpy is defined to work as if by copying characters.
Note that, while accessing the bytes of an object is defined by the C standard, how bytes represent values is partly implementation-defined. Different C implementations may use different orders for the bytes within an object, and there can be other differences.
By the way, a follow-up question: why does the first printf in this example give ffffffff? Shouldn't it be ff?
In your C implementation, char is signed and can represent values from −128 to +127. In *((char*)&big + 1) = 0xff;, 0xff is 255 and is too big to fit into a char. It is converted to a char value in an implementation-defined way. Your C implementation converts it to −1. (The eight-bit two’s complement representation of −1, bits 11111111, uses the same bits as the binary representation of 255, again bits 11111111.)
Then printf("%02x\n\n\n", *((char*)&big+1)); passes this value, −1, to printf. Since it is a char, it is promoted to int to be passed to printf. This produces the same value, −1, but it has 32 bits, 11111111111111111111111111111111. Then you are passing an int, but printf expects an unsigned int for %02x. The behavior of this is not defined by the C standard, but your C implementation reads the 32 bits as if they were an unsigned int. As an unsigned int, the 32 bits 11111111111111111111111111111111 represent the value 4,294,967,295 or 0xffffffff, so that is what printf prints.
You can print the correct value by using printf("%02hhx\n\n\n", * ((unsigned char *) &big + 1));. As an unsigned char, the bits 11111111 represent 255 or 0xff, and converting that to an int produces 255 or 0x000000ff.
For variadic functions (like printf) all arguments undergo the default argument promotions, which promote smaller integer types to int.
This conversion includes sign extension if the smaller type is signed, so the value is preserved.
So if char is a signed type (which is implementation-defined) with a value of -1, then it will be promoted to the int value -1. Which is what you see.
If you want to print a smaller type you need to first cast to the correct type (unsigned char) and then use the proper format (like %hhx for printing unsigned char values).
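For example, a small sketch of that fix applied to the code from the question:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t big = 0x01234567;
    *((unsigned char *)&big + 1) = 0xff;
    /* unsigned char cannot be negative, so the promoted int is 255 and %hhx prints ff */
    printf("%02hhx\n", *((unsigned char *)&big + 1));
    return 0;
}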

can you address single bits of an int?

As I understand it, addressing a single bit in an int variable seems possible, if it is passed as a pointer. Am I correct?
uint8_t adcState[8];
uint16_t adcMessageId[8];
void adcEnable(uint8_t* data) {
// Set ADC Input as enabled
adcState[(uint8_t) data[0]] = 1;
// Get ADC Message ID
adcMessageId[(uint8_t) data[0]] = data[2] << 8 | data[1];
}
So far this is what I figured out:
The function receives a pointer to 8bit int as an argument
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read), and uses it as a field number for adcState array, which then is set to 1. For example this would mean if data was 729, the data[0] would be '9' and therefore the adcsState[9] becomes 1.
Is it possible? Can you use the pointers like this?
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of the data int.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000. Therefore an OR with data[1] would be just data[1] itself...
In our example, the adcMessageId[9] would become ('7' << 8) bitwise OR with '2', so just '2'.
Something in my logic seems wrong.
It would seem data is pointing to an array, not a single 8 bit int, and that:
The first element of the array is an index into the arrays adcState and adcMessageId
The second and third elements of the array comprise a data value for the array adcMessageId
As commenter #Eugene Sh. pointed out, data[2] is promoted to an int before shifting, so no bits are lost.
The pointer notation uint8_t * is as valid as array notation uint8_t [] in a function signature for passing an array; it's often how char * strings are passed, and arrays decay to a pointer to their first element when passed to functions anyway.
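As an illustration, here is a minimal sketch of how such a function might be called (the msg array and its byte values are made up for the example, assuming the "index, low byte, high byte" layout the code implies):
#include <stdint.h>
#include <stdio.h>

uint8_t adcState[8];
uint16_t adcMessageId[8];

void adcEnable(uint8_t *data) {
    adcState[data[0]] = 1;
    adcMessageId[data[0]] = (uint16_t)(data[2] << 8 | data[1]);
}

int main(void) {
    /* The array decays to a pointer to its first element when passed */
    uint8_t msg[3] = { 5, 0x34, 0x12 };   /* index 5, message ID 0x1234 */
    adcEnable(msg);
    printf("adcState[5] = %d, adcMessageId[5] = 0x%04x\n",
           adcState[5], (unsigned)adcMessageId[5]);
    return 0;
}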
The function receives a pointer to 8bit int as an argument
Yes, roughly. And the function implementation assumes that the pointed-to uint8_t can be treated as the first element of an array of at least three uint8_t.
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read), and uses it
as a field number for adcState array, which then is set to 1. For
example this would mean if data was 729, the data[0] would be '9' and
therefore the adcsState[9] becomes '1'. Is it possible? Can you use
the pointers like this?
No, you have completely misunderstood. data[0] means exactly the same thing as *data. It refers to the 8-bit unsigned integer to which data points. The number 729 is too large to be represented as a uint8_t, but if the object to which data pointed had the value 129 then data[0] would evaluate to 129.
You are perhaps confused by the appearance later in the function of data[1] and data[2]. These refer to two uint8_t objects following *data in memory, such as will reliably be present if data points to the first element of an array, as I alluded to above. Indexing with [] does not have the effect of slicing the uint8_t to which data points.
Pay attention also that I am saying "the object to which data points". One does not pass a uint8_t value directly as this function's parameter. It is not anticipating that an integer value would be reinterpreted as a pointer. You pass a pointer to the data you want the function to work with.
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of
the data int.
In the adcMessageId case, again data[0] refers to the uint8_t to which data points. data[1] refers to another whole uint8_t following that in memory, and data[2] to the next after that.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000.
uint8_t has only 8 bits, but all integer values narrower than int are converted to an integer type at least as wide as int, or perhaps to a floating-point type, for arithmetic operations. The specific promoted type depends in part on what the other operand is, and the result has the same, promoted type. Type int is at least 16 bits wide on all conforming C implementations. Thus this ...
data[2] << 8 | data[1]
... intends to pack the two uint8_t values data[2] and data[1] into one 16-bit integer, data[2] in the most-significant position. It's not entirely safe because the elements of data will be promoted to (signed) int instead of unsigned int, but that will present an issue only on implementations where int is 16 bits wide (which are uncommon these days), and even then, only if the value of data[2] is larger than 127. A safer way to express it would involve explicit casts:
(unsigned int) data[2] << 8 | (unsigned int) data[1]
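A minimal demonstration of the promotion that makes the shift work (the byte values here are arbitrary):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t hi = 0x12, lo = 0x34;
    /* hi is promoted before the shift, so the bits shifted past 8 are not lost */
    unsigned int packed = (unsigned int)hi << 8 | (unsigned int)lo;
    printf("0x%04x\n", packed);   /* prints 0x1234 */
    return 0;
}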
You have a few misconceptions. Or maybe just wrong wording.
The function receives a pointer to 8bit int as an argument
More precisely, it gets a pointer to the first element of an array of 8-bit integers. Otherwise your usage would be invalid. Probably it points to a string.
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read),
That is wrong. You seem to use it as a pointer to a string holding a number.
In that case you access the first character, which is the MOST significant decimal digit.
and uses it as a field number for adcState array, which then is set to 1. For example this would mean if data was 729, the data[0] would be '9' and therefore the adcsState[9] becomes '1'. Is it possible? Can you use the pointers like this?
You are messing up things a bit.
If you want to access decimal digits, we are talking about strings and there the first element is '7' which is not to be confused with 7.
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of the data int.
Maybe you should not talk about int if you are using strings.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000. Therefore an OR with data[1] would be just data[1] itself... In our example, the adcMessageId[9] would become ('7' << 8) bitwise OR with '2', so just '2'.
That was already addressed in comments and Govind Parmar's answer: Integer promotion takes place before shifting.

How does a computer distinguish the bits of a null terminator from an integer of 0?

I am trying to understand the binary representation for '\0' vs the binary representation of int 0 in C
If '\0' is a char, is 1 byte of zeros in binary: 00000000?
If 0 is an int, are 4 bytes of zeros in binary: 00000000 00000000 00000000 00000000?
1. is my above understanding correct?
2. if so, how is the first byte of an int not confused with a null terminator?
Note: I understand NULL is a pointer in C. I am not referring to pointers here.
is my above understanding [of the representation of various int and char values] correct?
Pretty much. The char value represented in C source code as (char)'\0' has numeric value zero and an all-zeroes representation in memory. That representation is 8 bits in size on all modern machines, but C does not mandate that particular size, and some historic machines indeed did use different sizes.
Note also that as #mch observed, without any cast, the expression '\0' has type int. It can be converted to type char without an explicit cast, as indeed happens when, for example, you assign it to a variable of that type.
if so, how is the first byte of an int not confused with a null terminator?
If your program is in a position to interpret the first byte of an int as if it were the representation of a char then its behavior is undefined. Otherwise, it knows that the first byte is part of an int because that's what it assigned to that location. There is no inherent distinction between the representation of the char value (char)'\0' and a suitably-aligned sequence of the same number of zero bits as part of the representation of an object of another type.
1. is my above understanding correct?
yes
2. if so, how is the first byte of an int not confused with a null terminator?
There is nothing to be confused about.
If the variable itself is int then sizeof(int) bytes are fetched, and if all are 0s, then the value is 0.
If the variable is char then sizeof(char) bytes (that is, one byte) are fetched, and if it is 0 then it is interpreted as the '\0' NUL character.
They cannot be distinguished. One is zero in int format, the other is zero in char format. If you write to a char array[] with
array[0] = 0;
the result is exactly the same as
array[0] = '\0';
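A small sketch of that equivalence (the memcmp check is only there to make the point explicit):
#include <stdio.h>
#include <string.h>

int main(void) {
    char a[1], b[1];
    a[0] = 0;       /* int constant 0, converted to char */
    b[0] = '\0';    /* character constant, also value 0 */
    printf("%d\n", memcmp(a, b, 1) == 0);   /* prints 1: the bytes are identical */
    return 0;
}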

Copying int to different memory location, receiving extra bytes than expected

Trying to prepend a 2-byte message length after getting the length in a 4-byte int. I use memcpy to copy 2 bytes of the int. When I look at the second byte I copied, it is as expected, but accessing the first byte actually prints 4 bytes.
I would expect that dest[0] and dest[1] each contain 1 byte of the int; whether or not it's the most significant byte, or the order is switched, I can throw in an offset on the memcpy or reverse 0 and 1. It does not have to be portable, I would just like it to work.
The same error is happening in Windows with LoadRunner and Ubuntu with GCC - so I have at least tried to rule out portability as a cause.
I'm not sure where I'm going wrong. I suspect it's related to not having used pointers recently. Is there a better approach to cast an int to a short and then put it in the first 2 bytes of a buffer?
char* src;
char* dest;
int len = 2753; // Hex - AC1
src=(char*)malloc(len);
dest=(char*)malloc(len+2);
memcpy(dest, &len, 2);
memcpy(dest+2, src, len);
printf("dest[0]: %02x", dest[0]);
// expected result: c1
// actual result: ffffffc1
printf("dest[1]: %02x", dest[1]);
// expected result: 0a
// actual result: 0a
You cannot just take a random two bytes out of a four byte object and call it a cast to short.
You will need to copy your int into a two byte int before doing your memcpy.
But actually, that isn't the best way to do it either, because you have no control over the byte order of an integer.
Your code should look like this:
dest[0] = ((unsigned)len >> 8) & 0xFF;
dest[1] = ((unsigned)len) & 0xFF;
That should write it out in network byte order aka big endian. All of the standard network protocols use this byte order.
And I'd add something like:
assert( ((unsigned)len & 0xFFFF0000) == 0 ); // should be nothing in the high bytes
Firstly, you are using printf incorrectly. This
printf("dest[0]: %02x", dest[0]);
uses the x format specifier in printf. The x format specifier requires an argument of type unsigned int. Not char, but unsigned int and only unsigned int (or alternatively an int with a non-negative value).
The immediate argument you supplied has type char, which is probably signed on your platform. This means that your dest[0] contains -63. A variadic argument of type char is automatically promoted to type int, which turns 0xc1 into 0xffffffc1 (as a signed representation of -63 in type int). Since printf expects an unsigned int value and you are passing a negative int value instead, the behavior is undefined. The printout that you see is nothing more than a manifestation of that undefined behavior. It is meaningless.
One proper way to print dest[0] in this case would be
printf("dest[0]: %02x", (unsigned) dest[0]);
I'm pretty sure the output will still be ffffffc1, but in this case 0xffffffc1 is the perfectly expected result of integer conversion from the negative -63 value to unsigned int type. Nothing unusual here.
Alternatively you can do
printf("dest[0]: %02x", (unsigned char) dest[0]);
which should give you your desired c1 output. Note that the conversion to int takes place in this case as well, but since the original value is positive (193), the result of the conversion to int is positive too and printf works properly.
Finally, if you want to work with raw memory directly, the proper type to use would be unsigned char from the very beginning. Not char, but unsigned char.
Secondly, an object of type int may easily occupy more than two 8-bit bytes. Depending on the platform, the 0xA and 0xC1 values might end up in completely different portions of the memory region occupied by that int object. You should not expect that copying the first two bytes of an int object will copy the 0xAC1 portion specifically.
You make the assumption that an "int" is two bytes. What justification do you have for that? Your code is highly unportable.
You make another assumption that "char" is unsigned. What justification do you have for that? Again, your code is highly unportable.
You make another assumption about the ordering of bytes in an int. What justification do you have for that? Again, your code is highly unportable.
Instead of the literal 2, use sizeof(int). Never hard-code the size of a type.
If this code needs to be portable, you should not use int but a fixed-size data type.
If you need 16 bits, you could use int16_t.
Also, printing the chars needs a cast to unsigned. As written, the char is promoted to an int and the sign is extended. This gives the leading FFFFs.
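Putting those suggestions together, a rough sketch (the big-endian byte order here is a deliberate choice, not something the original code required):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int len = 2753;                          /* 0x0AC1 */
    unsigned char dest[2];
    uint16_t len16 = (uint16_t)len;          /* fixed-size type for the 2-byte length */
    dest[0] = (unsigned char)(len16 >> 8);   /* high byte first: big endian / network order */
    dest[1] = (unsigned char)(len16 & 0xFF);
    printf("dest[0]: %02x\n", dest[0]);      /* prints 0a */
    printf("dest[1]: %02x\n", dest[1]);      /* prints c1 */
    return 0;
}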

int is 4 bytes but it can still be stored in a char; why is there no overflow?

Check out this program
#include<stdio.h>
int main (){
char c='a';
printf("%d %d", sizeof(c),sizeof('a'));
}
the output is 1 4
I know we can write the statement char c='a';
but how does something of 4 bytes (the character constant) get stored in the space of 1 byte (char c)? Why is there no overflow?
First, per ISO/IEC 9899:1999(E) §6.4.4.4:
10. An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer. [...]
§6.5.3.4:
2. The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. [...]
3. When applied to an operand that has type char, unsigned char, or signed char,
(or a qualified version thereof) the result is 1. [...]
As you can see, since the type of a character constant is int, for sizeof('a') we get sizeof(int), which is 4 on your platform. However, for sizeof(c), we get the size of a char, which is defined to be 1.
So why can we assign 'a' to a char?
§6.5.16.1:
2. In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
So, the int that is 'a' is implicitly converted to a char. There's an example in there as well, showing explicitly that ints can be implicitly converted to char.
The compiler implicitly converts the int to char.
int i = 42;
char c = i * 2 - 4;
That last line is interpreted by the compiler as:
char c = (char)(i * 2 - 4);
These implicit type conversions are handled by the compiler; there's no "buffer overflow". The (char) conversion simply cuts the value down to the width of a char, keeping the low-order byte and preserving the sign for values that fit (for out-of-range values the result is implementation-defined when char is signed).
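A minimal illustration of that truncation (what happens to the out-of-range value is implementation-defined for a signed char, so the printed result is the typical one, not a guarantee):
#include <stdio.h>

int main(void) {
    int i = 300;          /* 0x12C: does not fit in an 8-bit char */
    char c = (char)i;     /* typically keeps only the low-order byte */
    printf("%d\n", c);    /* commonly prints 44 (300 - 256) */
    return 0;
}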
"A character literal has type int" (http://publib.boulder.ibm.com/infocenter/lnxpcomp/v7v91/index.jsp?topic=%2Fcom.ibm.vacpp7l.doc%2Flanguage%2Fref%2Fclrc02ccon.htm)
But C lets you make theoretically "unsafe" automatic conversions; it's OK, for example, to write
char c = 34;
even though 34 is an int (4 bytes here). What makes this safe is that you know, when you write 'a', that it's really one ASCII character and hence fits in one byte.
Nice question by the way -- confused me for a bit.
