I have a program that looks like this
#include <stdio.h>
#include <stdint.h>
int main()
{
uint8_t arr[10] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09};
uint32_t *p = (uint32_t *)&arr[4];
printf("Value of p: %02Xh\n", *p);
return 0;
}
The output being printed is 7060504h which is
*p = (arr[7] << 24) | (arr[6] << 16) | (arr[5] << 8) | arr[4]; // (1)
I have the following questions:
Why is arr[7] MSB but not arr[4]?
Why is the output calculated following (1) not (2)?
*p = (arr[1] << 24) | (arr[2] << 16) | (arr[3] << 8) | arr[4]; // (2)
Why is arr[7] MSB but not arr[4]?
The C standard says, in C 2018 6.2.6.1: “Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.”
This means a C implementation may put the bytes of an object in any order it chooses and documents. The implementation you are using puts the least significant byte at the lowest address and the most significant byte at the highest address. This is called little endian. (The opposite order is big endian.)
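If you want to see the byte order on your own implementation, here is a minimal sketch (not from the question) that inspects the bytes of a uint32_t through an unsigned char pointer, which the standard does allow:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t value = 0x04030201;
    unsigned char *bytes = (unsigned char *)&value; // accessing any object as unsigned char is permitted

    // A little-endian implementation prints 01 02 03 04; a big-endian one prints 04 03 02 01.
    for (size_t i = 0; i < sizeof value; i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}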
Why is the output calculated following (1) not (2)?
This question is unclear. Why would you expect an expression using elements 4, 5, 6, and 7 of the array to produce a value from elements 1, 2, 3, and 4?
uint32_t *p = (uint32_t *)&arr[4];
The behavior of using *p defined this way is not defined by the C standard: it accesses an array of uint8_t as if it were a uint32_t, which C 2018 6.5 7 (the aliasing rules) does not allow. There can also be errors because the alignment requirement of a uint32_t may be stricter than that of a uint8_t.
A defined way to set the bytes of a uint32_t is:
uint32_t p;
memcpy(&p, arr, sizeof p); // Include <string.h> to declare memcpy.
Then, since this p is not a pointer, print it with printf("Value of p: %08Xh\n", p);.
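Putting it together with the question's array, a complete sketch of that approach might look like this (copying from &arr[4] to mirror the original code; PRIX32 from <inttypes.h> is used only so the format specifier matches uint32_t exactly):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>

int main(void)
{
    uint8_t arr[10] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09};
    uint32_t p;

    memcpy(&p, &arr[4], sizeof p);              // copy bytes 4..7 without aliasing or alignment problems
    printf("Value of p: %08" PRIX32 "h\n", p);  // prints 07060504h on a little-endian implementation
    return 0;
}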
Related
Could someone explain this code to me, please? I received some byte code from an assembler and now I have to use it in my virtual machine. This code is used, but I don't know how it works or what it is for.
static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
uint32_t result = (uint32_t)bytes[0] << 24 |
(uint32_t)bytes[1] << 16 |
(uint32_t)bytes[2] << 8 |
(uint32_t)bytes[3] << 0 ;
return (int32_t)result;
}
It builds up a 32-bit word from 4 bytes.
For example, if the bytes are 1st: 0x12, 2nd: 0x34, 3rd: 0x56, 4th: 0x78, then:
static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
uint32_t result = (uint32_t)bytes[0] << 24 | // -> 0x12000000
(uint32_t)bytes[1] << 16 | // -> 0x00340000
(uint32_t)bytes[2] << 8 | // -> 0x00005600
(uint32_t)bytes[3] << 0 ; // -> 0x00000078
return (int32_t)result; // bitwise oring this result -> 0x12345678
}
This function attempts to combine the four bytes in a uint8_t[4] into a single uint32_t with big-endian byte order, cast the result into a signed int32_t, and return that.
So, if you pass a pointer to the array { 0xAA, 0xBB, 0xCC, 0xDD } to the function, it will combine them into a 32-bit integer with the most significant bytes of the integer coming from the lowest addresses in the array, giving you 0xAABBCCDD or -1430532899.
However, if the array pointed to by the argument bytes is not at least four bytes long, it has undefined behavior.
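A short usage sketch (the function is the one from the question; PRId32 from <inttypes.h> is used only so the format matches int32_t, and the negative result assumes the usual two's-complement conversion):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] << 8  |
                      (uint32_t)bytes[3] << 0;
    return (int32_t)result;
}

int main(void)
{
    const uint8_t code[4] = {0xAA, 0xBB, 0xCC, 0xDD};
    printf("%" PRId32 "\n", bytecode_to_int32(code)); // prints -1430532899 (0xAABBCCDD reinterpreted as int32_t)
    return 0;
}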
I saw this question on my C language final exam; the output is 513 and I don't know why.
#include <stdio.h>
int main(void){
char a[4] = {1,2,3,4};
printf("%d", *(short*)a);
}
Your array of bytes is (in hex):
[ 0x01, 0x02, 0x03, 0x04 ]
If you treat the start of the array not as an array of bytes but as the start of a short, then your short is made of the bytes 0x01 0x02. Because your processor is little endian, it stores the least significant byte first, so the value reads backwards from how humans write it: we would write it as 0x0201, which is 513 in decimal.
If the system this code is being run on meets the following requirements:
Unaligned memory access is permitted (or a is guaranteed to be short-aligned)
Little-endian byte order is used
sizeof(short) == 2
CHAR_BIT == 8
Then dereferencing a short * pointer to the following memory:
| 0x01 | 0x02 | 0x03 | 0x04 |
Will give you 0x0201, or 513 in base 10.
Also, do note that even if all these requirements are met, aliasing a char [] array as a short * violates the strict aliasing rule.
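If you need the same result without violating strict aliasing, here is a sketch using memcpy (the value 513 still depends on little-endian order and a 2-byte short):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[4] = {1, 2, 3, 4};
    short s;

    memcpy(&s, a, sizeof s); // well-defined, unlike *(short *)a
    printf("%d\n", s);       // 513 on a little-endian system with a 2-byte short
    return 0;
}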
The code casts your char * pointer to a short * and prints the value it points to.
A short in C occupies 2 bytes here, and the binary representation of the first two bytes of your array is 00000001 00000010; but because the processor is little endian, the first byte in memory is the least significant one, so the value is read as 00000010 00000001, which is 513 in decimal.
I am trying to cast a preprocessor define to an array, but I am not sure if it is possible at all.
For example, I have defined the number 0x44332211.
Code below:
#include <stdio.h>
#include <stdint.h>
#define number 0x44332211
int main()
{
uint8_t array[4] = {(uint8_t)number, (uint8_t)number << 8,(uint8_t)(number <<16 ),(uint8_t)(number <<24)};
printf("array[%x] \n\r",array[0]); // 0x44
printf("array[%x] \n\r",array[1]); // 0x33
printf("array[%x] \n\r",array[2]); // 0x22
printf("array[%x] \n\r",array[3]); // 0x11
return 0;
}
and I want to convert it to a uint8_t array[4] where array[0] = 0x44, array[1] = 0x33, array[2] = 0x22, and array[3] = 0x11.
Is it possible?
my output:
array[11]
array[0]
array[0]
array[0]
A couple of realizations are needed:
Converting to uint8_t keeps only the least significant byte of the data. That means you have to right-shift the byte you want down into the least significant position, not left-shift it away from it.
0x44332211 is an integer constant, not a "preprocessor". It is of type int and therefore signed. You shouldn't use bitwise operators on signed types. This is easily solved by changing it to 0x44332211u with the unsigned suffix.
There is also a bug in (uint8_t)number << 8: you should shift first and then cast, because the cast has higher precedence than the shift.
#include <stdio.h>
#include <stdint.h>
#define number 0x44332211u
int main()
{
uint8_t array[4] =
{
(uint8_t)(number >> 24),
(uint8_t)(number >> 16),
(uint8_t)(number >> 8),
(uint8_t) number
};
printf("array[%x] \n\r",array[0]); // 0x44
printf("array[%x] \n\r",array[1]); // 0x33
printf("array[%x] \n\r",array[2]); // 0x22
printf("array[%x] \n\r",array[3]); // 0x11
return 0;
}
This is not really a cast in any way. You have defined a constant and are computing the values of the array from that constant. Keep in mind that in this case the preprocessor simply does a search and replace, nothing clever.
Also, your shift is in the wrong direction. You keep the last (rightmost) 8 bits when casting int to uint8_t, not the first (leftmost) ones.
Yes, you are casting an int to a uint8_t. The only problem is that, when you make the shifts, the result won't fit in the type you are casting to and information will be lost.
Your uint8_t casts just take the least significant byte. That's why you get 11 in the first case and 0 in the others: your left shifts leave 0 in the rightmost positions.
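To see that concretely, a small sketch (not from the question) comparing what the uint8_t conversion keeps after a left shift versus a right shift:

#include <stdio.h>
#include <stdint.h>

#define number 0x44332211u

int main(void)
{
    // Left shift moves the wanted byte away from the least significant position,
    // so converting to uint8_t keeps only zeros.
    printf("%x\n", (uint8_t)(number << 8));  // 0  (low byte of 0x33221100)
    // Right shift brings the wanted byte down into the least significant position.
    printf("%x\n", (uint8_t)(number >> 8));  // 22 (low byte of 0x00443322)
    return 0;
}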
Take a look at this code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h> // needed for int32_t and uint32_t
int byteToInt(char *bytes) {
int32_t v =
(bytes[0] ) +
(bytes[1] << 8 ) +
(bytes[2] << 16) +
(bytes[3] << 24);
return v;
}
int main() {
char b1[] = {0xec, 0x51, 0x04, 0x00};
char b2[] = {0x0c, 0x0c, 0x00, 0x00};
printf("%d\n", byteToInt(b1));
printf("%d\n", byteToInt(b2));
printf("%d\n", *(uint32_t *)b1);
printf("%d\n", *(uint32_t *)b2);
return 0;
}
{0xec, 0x51, 0x04, 0x00} is equal to 283116, but when I use the byteToInt function it returns, for some reason, 282860. Some byte arrays cause similar trouble; I realized the value is always off by 256. Still, most cases work without any problems - just take a look at b2, which is calculated as 3084, and that is correct. The casting method works perfectly in these cases, but I'd like to know why the described problems happen. Could someone please explain this to me?
Perhaps char is a signed type (it is implementation-defined), and (int)(char)(0xec) is -20, while (int)(unsigned char)(0xec) is 236.
Try to use unsigned char and uint32_t.
#include <stdio.h>
#include <stdint.h>

uint32_t byteToInt(unsigned char *bytes) {
uint32_t v =
((uint32_t)bytes[0]) +
((uint32_t)bytes[1] << 8) +
((uint32_t)bytes[2] << 16) +
((uint32_t)bytes[3] << 24);
return v;
}
int main() {
unsigned char b1[] = { 0xec, 0x51, 0x04, 0x00 };
unsigned char b2[] = { 0x0c, 0x0c, 0x00, 0x00 };
printf("%u\n", byteToInt(b1)); // 'u' for unsigned
printf("%u\n", byteToInt(b2));
//printf("%u\n", *(uint32_t *)b1); // undefined behavior
//printf("%u\n", *(uint32_t *)b2); // ditto
return 0;
}
Note that reinterpreting memory contents, as done in the last two printfs, is undefined behavior (although it often works in practice).
BTW, shifting signed negative values is undefined according to the standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; ... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
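To see the signed-char effect from the first paragraph in isolation, a small sketch (assuming char is signed on your platform, which is what the off-by-256 result indicates):

#include <stdio.h>

int main(void)
{
    char c = 0xec;          // implementation-defined; typically wraps to -20 when char is signed
    unsigned char u = 0xec; // always 236

    printf("%d %d\n", (int)c, (int)u); // typically prints: -20 236
    printf("%d\n", (int)u - (int)c);   // 256: exactly the discrepancy you observed
    return 0;
}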
There are several potential issues with this code. The first is that it is implementation-dependent whether char is signed or unsigned (and, on exotic platforms, how many bits a char holds). If char is signed, a byte value such as 0xec is read back as a negative number, and shifting and adding that corrupts the result.
It is safer to convert each byte to unsigned char and then to a 32-bit (or wider) unsigned type before shifting and adding. For example:
unsigned long v =
    ((unsigned long)(unsigned char)bytes[0]      ) +
    ((unsigned long)(unsigned char)bytes[1] << 8 ) +
    ((unsigned long)(unsigned char)bytes[2] << 16) +
    ((unsigned long)(unsigned char)bytes[3] << 24);
Your use of int32_t also requires <stdint.h>, which the code does not include; int32_t is the standard C99 fixed-width type and is exactly 32 bits wherever it is provided. Plain int, on the other hand, is implementation-dependent: the standard only requires at least 16 bits, and older compilers did make it 16 bits. Using long guarantees at least 32 bits.
Additionally, I used unsigned long in the example, because I don't think you want to deal with negative numbers in this case. In two's-complement representation, negative numbers have the highest bit set (0x80000000 for 32 bits).
If you do want negative numbers, the type should be long instead, although that opens a different can of worms when adding positive-valued bytes to a most significant byte that carries the sign. In that case you would do a different conversion that handles the sign bit of the high byte explicitly (for example, by subtracting 2^32 when the high bit is set), which is probably outside the scope of this question.
The revised code, then would be:
#include <stdio.h>
#include <stdlib.h>
unsigned long byteToInt(char *bytes) {
    unsigned long v =
        ((unsigned long)(unsigned char)bytes[0]      ) +
        ((unsigned long)(unsigned char)bytes[1] << 8 ) +
        ((unsigned long)(unsigned char)bytes[2] << 16) +
        ((unsigned long)(unsigned char)bytes[3] << 24);
return v;
}
int main() {
char b1[] = {0xec, 0x51, 0x04, 0x00};
char b2[] = {0x0c, 0x0c, 0x00, 0x00};
printf("%d\n", byteToInt(b1));
printf("%d\n", byteToInt(b2));
printf("%d\n", *(unsigned long *)b1);
printf("%d\n", *(unsigned long *)b2);
return 0;
}
Thanks for the help, but maybe I should clarify my problem. I want to read some addresses from a text file, for instance:
someFile::
0xc4fap4424ab1 0xac8374ce93ac ... etc
Now I want to take these addresses and convert them to decimal so that I can make address comparisons. So my question is
1. How should I go about storing these addresses
2. Once I have stored the addresses, how can I convert them to decimal.
Your initializer declares just one element for the temp array, and that value is way beyond the capacity of a char. The maximum value that you can store in a single char is 0xFF.
If temp was something in the form of:
char temp[] = {0x12, 0x34, 0x56, 0x78};
Then you could do some bit twiddling to produce an int out of the 4 bytes. Actually I would use uint32_t to be platform-independent:
uint32_t x = (temp[0] << 24) | (temp[1] << 16) | (temp[2] << 8) | temp[3];
Of course, you have to take care of the endianness as well.
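For example, here is a sketch showing both interpretations of the same temp bytes side by side (PRIx32 from <inttypes.h> is used only so the format matches uint32_t):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    unsigned char temp[] = {0x12, 0x34, 0x56, 0x78};

    // temp[0] treated as the most significant byte (big-endian interpretation)
    uint32_t be = ((uint32_t)temp[0] << 24) | ((uint32_t)temp[1] << 16) |
                  ((uint32_t)temp[2] << 8)  |  (uint32_t)temp[3];

    // temp[0] treated as the least significant byte (little-endian interpretation)
    uint32_t le = ((uint32_t)temp[3] << 24) | ((uint32_t)temp[2] << 16) |
                  ((uint32_t)temp[1] << 8)  |  (uint32_t)temp[0];

    printf("%08" PRIx32 " %08" PRIx32 "\n", be, le); // prints 12345678 78563412
    return 0;
}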