Strange values when converting bytes to integer in C

Take a look at this code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int byteToInt(char *bytes) {
    int32_t v =
        (bytes[0]      ) +
        (bytes[1] << 8 ) +
        (bytes[2] << 16) +
        (bytes[3] << 24);
    return v;
}

int main() {
    char b1[] = {0xec, 0x51, 0x04, 0x00};
    char b2[] = {0x0c, 0x0c, 0x00, 0x00};
    printf("%d\n", byteToInt(b1));
    printf("%d\n", byteToInt(b2));
    printf("%d\n", *(uint32_t *)b1);
    printf("%d\n", *(uint32_t *)b2);
    return 0;
}
{0xec, 0x51, 0x04, 0x00} is equal to 283116, but when I use the byteToInt function, it returns, for some reason, 282860. There are some byte arrays that cause similar troubles. I realized that the value is always off by 256. Still, most cases work without any problems: just take a look at b2, which is calculated as 3084, which is correct. The casting method works perfectly in these cases, but I'd like to know why the described problems happen. Could someone please explain this to me?

Perhaps char is a signed type (it is implementation-defined), and (int)(char)(0xec) is -20, while (int)(unsigned char)(0xec) is 236.
Try to use unsigned char and uint32_t.
uint32_t byteToInt(unsigned char *bytes) {
    uint32_t v =
        ((uint32_t)bytes[0]      ) +
        ((uint32_t)bytes[1] << 8 ) +
        ((uint32_t)bytes[2] << 16) +
        ((uint32_t)bytes[3] << 24);
    return v;
}

int main() {
    unsigned char b1[] = { 0xec, 0x51, 0x04, 0x00 };
    unsigned char b2[] = { 0x0c, 0x0c, 0x00, 0x00 };
    printf("%u\n", byteToInt(b1)); // 'u' for unsigned
    printf("%u\n", byteToInt(b2));
    //printf("%u\n", *(uint32_t *)b1); // undefined behavior
    //printf("%u\n", *(uint32_t *)b2); // ditto
    return 0;
}
Note that reinterpreting memory contents, as done in the last two printfs, is undefined behavior (although it often works in practice).
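If reinterpretation is really wanted, a defined alternative is to copy the bytes into the integer with memcpy (a minimal sketch; the helper name load_u32 is just for illustration, and the resulting value still depends on the machine's byte order):
#include <stdint.h>
#include <string.h>

uint32_t load_u32(const unsigned char *bytes) {
    uint32_t v;
    memcpy(&v, bytes, sizeof v); // defined, unlike the pointer cast
    return v;                    // value depends on endianness
}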
BTW, shifting signed negative values is undefined according to the standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; ... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
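A minimal illustration of that clause, assuming char is signed on the platform:
#include <stdint.h>

void demo(void) {
    char c = 0xec;  /* -20 if char is signed on this implementation */
    /* int bad = c << 24;  undefined: the left operand is negative after promotion */
    uint32_t ok = (uint32_t)(unsigned char)c << 24; /* well-defined: 0xEC000000 */
    (void)ok;
}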

There are several potential issues with this code. The first is that it is implementation-defined whether char is signed or unsigned; if it is signed, a byte like 0xec is sign-extended to a negative int when it is promoted for the shift, which corrupts the sum. The shifts can also lose bits "off the end" of the value if int is narrower than 32 bits.
It is safer to first cast the values to a 32-bit type before shifting and adding them. For example:
unsigned long v =
    ((unsigned long)bytes[0]      ) +
    ((unsigned long)bytes[1] << 8 ) +
    ((unsigned long)bytes[2] << 16) +
    ((unsigned long)bytes[3] << 24);
Your use of int32_t also requires including <stdint.h>; it is a standard C99 type, so very old compilers may not provide it. Plain int itself is compiler dependent: older compilers may have it as a 16-bit value, since the standard only requires it to be at least 16 bits (roughly the word size of the target machine). Using long instead of int guarantees at least a 32-bit value.
Additionally, I used unsigned long in the example, because I don't think you want to deal with negative numbers in this case. In binary representation, negative numbers have the highest bit set (0x80000000).
If you do want to use negative numbers, then the type should be long instead, although this opens a different can of worms when adding positive-valued bytes to a negative-valued top byte. In that case you need a wholly different conversion that treats the high bit of the high byte as the sign; under two's complement, the signed value equals the assembled unsigned value minus 2^32 whenever that bit is set (the details are probably outside the scope of this question).
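For illustration, one portable way is to assemble the bytes as an unsigned 32-bit value first and then apply the two's complement rule explicitly (a minimal sketch; the helper name u32_to_i32 is just for this example):
#include <stdint.h>

/* Interpret an assembled 32-bit pattern as a two's complement
   signed value, without implementation-defined conversions. */
int32_t u32_to_i32(uint32_t u) {
    if (u <= INT32_MAX)
        return (int32_t)u;
    /* high bit set: the value is u - 2^32 */
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}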
The revised code, then, would be:
#include <stdio.h>
#include <stdlib.h>

unsigned long byteToInt(const unsigned char *bytes) {
    unsigned long v =
        ((unsigned long)bytes[0]      ) +
        ((unsigned long)bytes[1] << 8 ) +
        ((unsigned long)bytes[2] << 16) +
        ((unsigned long)bytes[3] << 24);
    return v;
}

int main() {
    unsigned char b1[] = {0xec, 0x51, 0x04, 0x00};
    unsigned char b2[] = {0x0c, 0x0c, 0x00, 0x00};
    printf("%lu\n", byteToInt(b1));
    printf("%lu\n", byteToInt(b2));
    //printf("%lu\n", *(unsigned long *)b1); // undefined behavior, as noted above
    //printf("%lu\n", *(unsigned long *)b2); // ditto
    return 0;
}

Related

Why does converting string to hex need to be done with 0xff in C?

Following on from this question, How to convert a string to hex and vice versa in C?, I ran the following code:
#include <stdio.h>

int main (int argc, char *argv[])
{
    char str[] = { 0x58, 0x01, 0x02, 0x20, 0x22, 0x00, 0xC5};
    char hex[32] = {0}, hex2[32] = {0};
    for (int i = 0; i < sizeof(str); i++) {
        sprintf(hex + i*2, "%02X", str[i]);
    }
    for (int i = 0; i < sizeof(str); i++) {
        sprintf(hex2 + i*2, "%02X", str[i] & 0xff);
    }
    printf("hex = %s\nhex2 = %s\n", hex, hex2);
    return 0;
}
I get this result:
hex = 580102202200FFFFFFC5
hex2 = 580102202200C5
I wonder why the extra FFFFFF appears when I don't mask with 0xff?
There are many small pieces to the puzzle that need to be put together.
To begin with, it's implementation-defined whether char is signed or unsigned.
If it's signed, then because of the common two's complement the value 0xc5 will actually be negative (it will be the decimal value -59).
And when a small integer type (like char) is used as an argument to a variable-argument function like sprintf, then it will be promoted to int. If the type is signed and the value is negative, it will be sign extended. So assuming the common 32-bit int, the signed and negative value 0xc5 will be promoted and sign-extended to 0xffffffc5.
To solve this problem, the simplest solution is to simply use an explicit unsigned type like uint8_t (which basically is an alias for unsigned char).
Besides that, I also recommend you use the hh length modifier in the format to say explicitly that the argument is a byte.
Put together, change to
uint8_t str[] = { 0x58, 0x01, 0x02, 0x20, 0x22, 0x00, 0xC5};
and
sprintf(hex + i*2, "%02hhX", str[i]);
As for the difference made by str[i] & 0xff, it's because then you mask out the top 24 bits of the int value. So 0xffffffc5 & 0xff becomes 0x000000c5.
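To see the promotion in isolation, here is a minimal sketch, assuming a platform where char is signed and int is 32 bits:
#include <stdio.h>

int main(void) {
    char c = 0xC5;            /* stored as -59 when char is signed */
    printf("%X\n", c);        /* typically prints FFFFFFC5: sign-extended when promoted to int */
    printf("%X\n", c & 0xff); /* prints C5: the mask clears the extended bits */
    return 0;
}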

Dereference 4 bytes pointer to 1 byte array

I have a program that looks like this
#include <stdio.h>
#include <stdint.h>

int main()
{
    uint8_t arr[10] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09};
    uint32_t *p = (uint32_t *)&arr[4];
    printf("Value of p: %02Xh\n", *p);
    return 0;
}
The output being printed is 7060504h, which is
*p = (arr[7] << 24) | (arr[6] << 16) | (arr[5] << 8) | arr[4]; // (1)
I have the following questions:
Why is arr[7] MSB but not arr[4]?
Why is the output calculated following (1) not (2)?
*p = (arr[1] << 24) | (arr[2] << 16) | (arr[3] << 8) | arr[4]; // (2)
Why is arr[7] MSB but not arr[4]?
The C standard says, in C 2018 6.2.6.1: "Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined."
This means a C implementation may put the bytes of an object in any order it chooses and documents. The implementation you are using puts the low-value byte at the low address and the high-value byte at the high address. This is called little endian. (The opposite order is big endian.)
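If you want to check which order your implementation uses, a minimal sketch:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t v = 0x01020304;
    unsigned char b[sizeof v];
    memcpy(b, &v, sizeof v);  /* defined way to inspect the bytes */
    /* little endian stores the low-value byte 0x04 at the lowest address */
    printf("%s endian\n", b[0] == 0x04 ? "little" : "big");
    return 0;
}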
Why is the output calculated following (1) not (2)?
This question is unclear. Why would you expect an expression using elements 4, 5, 6, and 7 of the array to produce a value from elements 1, 2, 3, and 4?
uint32_t *p = (uint32_t *)&arr[4];
The behavior of using *p defined this way is not defined by the C standard. It results in an array of uint8_t being accessed as if it were a uint32_t. C 2018 6.5 7 says that is not defined. There can also be errors because the alignment requirement of a uint32_t may be stricter than the requirement of a uint8_t.
A defined way to set the bytes of uint32_t is:
uint32_t p;
memcpy(&p, &arr[4], sizeof p); // Include <string.h> to declare memcpy; reads the same four bytes as the question.
Then, since this p is not a pointer, print it with printf("Value of p: %08Xh\n", p);.

Is it possible to cast a preprocessor define to an array?

I am trying to cast a preprocessor define to an array, but I am not sure if it is possible at all. For example, I have defined number as 0x44332211.
Code below:
#include <stdio.h>
#include <stdint.h>

#define number 0x44332211

int main()
{
    uint8_t array[4] = {(uint8_t)number, (uint8_t)number << 8, (uint8_t)(number << 16), (uint8_t)(number << 24)};
    printf("array[%x] \n\r", array[0]); // 0x44
    printf("array[%x] \n\r", array[1]); // 0x33
    printf("array[%x] \n\r", array[2]); // 0x22
    printf("array[%x] \n\r", array[3]); // 0x11
    return 0;
}
and I want to convert it to a uint8_t array[4] where array[0] = 0x44, array[1] = 0x33, array[2] = 0x22, and array[3] = 0x11.
Is it possible?
My output:
array[11]
array[0]
array[0]
array[0]
A couple of realizations are needed:
Converting to uint8_t keeps only the least significant byte of the data, meaning you have to right-shift data down into the least significant byte, not left-shift it away from it.
0x44332211 is an integer constant, not a "preprocessor". It is of type int and therefore signed. You shouldn't use bitwise operators on signed types. This is easily solved by changing it to 0x44332211u with the unsigned suffix.
There is also a typo here: (uint8_t)number << 8. You should shift, then cast; casts have higher precedence than shifts.
#include <stdio.h>
#include <stdint.h>

#define number 0x44332211u

int main()
{
    uint8_t array[4] =
    {
        (uint8_t)(number >> 24),
        (uint8_t)(number >> 16),
        (uint8_t)(number >> 8),
        (uint8_t) number
    };
    printf("array[%x] \n\r", array[0]); // 0x44
    printf("array[%x] \n\r", array[1]); // 0x33
    printf("array[%x] \n\r", array[2]); // 0x22
    printf("array[%x] \n\r", array[3]); // 0x11
    return 0;
}
This is not really a cast in any way. You have defined a constant and compute the values of the array based on that constant. Keep in mind that in this case, the preprocessor simply does a search and replace, nothing clever.
Also, your shift is in the wrong direction. You keep the last (rightmost) 8 bits when casting int to uint8_t, not the first (leftmost) ones.
Yes, you are casting an int to a uint8_t. The only problem is that, when you make the shifts, the result won't fit in the type you are casting to and information will be lost.
Your uint8_t casts are just taking the least significant byte. That's why you get 11 in the first case and 0 in the others: your shifts to the left leave 0 in the rightmost positions.

How to split a 16-bit value into two 8-bit values in C

I don't know if the question is phrased right, but:
For example, take the decimal 25441; its binary is 110001101100001. How can I split it into the two 8-bit values "1100011" and "01100001" (which are 99 and 97)? I could only think of using bit manipulation to shift by >>8, and I couldn't work out the rest for 97. Here is my function; it's not a good one, but I hope it helps:
void reversecode(int input[], char result[]) { // input is 25441
    int i;
    for (i = 0; i < 1; i++) {
        result[i] = input[i] >> 8; // shift by 8 bits
        printf("%i", result[i]);   // to print result
    }
}
I was thinking of using a struct, but I have no clue where to start. I'm a beginner in C, so sorry for my bad style. Thank you in advance.
The LSB is given simply by masking it out with a bit mask: input[i] & 0xFF.
The code you have posted, input[i] >> 8, gives the next byte above that. However, it also gives anything that happened to be stored in the more significant bytes, in case int is 32 bits. So again you need to mask: (input[i] >> 8) & 0xFF.
Also avoid bit-shifting on signed types such as int, because if they have negative values, you invoke poorly-specified behavior which leads to bugs.
The correct way to mask out the individual bytes of an int is this:
// 16 bit system
uint8_t bytes [sizeof(int)] =
{
    ((uint16_t)i >> 0) & 0xFF, // shift by 0 not needed, of course, just stylistic
    ((uint16_t)i >> 8) & 0xFF,
};

// 32 bit system
uint8_t bytes [sizeof(int)] =
{
    ((uint32_t)i >> 0) & 0xFF,
    ((uint32_t)i >> 8) & 0xFF,
    ((uint32_t)i >> 16) & 0xFF,
    ((uint32_t)i >> 24) & 0xFF,
};
This places the LSB at index 0 in this array, similar to Little Endian representation in memory. Note however that the actual bit-shift is endianness-independent, and also fast, which is why it's the superior method.
Solutions based on unions or pointer arithmetic depend on endianness and are often buggy (pointer aliasing violations), so they should be avoided, as there is no benefit to using them.
You can use the bit-masking concept, like this:
uint16_t val = 0xABCD;
uint8_t vr = (uint8_t) (val & 0x00FF);
Alternatively, this can be done by a simple explicit cast, since converting to an 8-bit integer keeps only the LSB 8 bits of the 16-bit value and discards the remaining MSB 8 bits (by default, when a larger value is assigned). This is all done after shifting the bits.
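Putting both pieces together for the value in the question, a minimal sketch:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t val = 25441;           /* 0x6361 */
    uint8_t hi = (val >> 8) & 0xFF; /* 0x63 = 99 */
    uint8_t lo = val & 0xFF;        /* 0x61 = 97 */
    printf("%u %u\n", hi, lo);      /* prints: 99 97 */
    return 0;
}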

How to shift bytes from char array into int

I would like to make an int variable out of a char array in C.
The char array looks like this:
buffer[0] = 0xcf
buffer[1] = 0x04
buffer[2] = 0x00
buffer[3] = 0x00
The shifting looks like this
x = (buffer[1] << 8 )| (buffer[0] << 0) ;
After that x looks like this:
x = 0xffff04cf
Right now everything would be fine, if the first two bytes weren't ff.
If I try this line
x = (buffer[3] << 24 )| (buffer[2] << 16)| (buffer[1] << 8)| (buffer[0] << 0) ;
it still looks like this:
x = 0xffff04cf
Even when I try to shift in the zeros before or after shifting in 04cf, it still looks the same.
Is this the right idea, or what am I doing wrong?
The issue is that you declared buffer with a signed type, probably (signed) char. When operator << is applied, integral promotions are performed, and since the value 0xcf in an 8-bit signed type represents a negative value (i.e. -49), it remains negative after promotion (now represented by more bits, i.e. 0xffffffcf). Note that -1 is represented as 0xFFFFFFFF and vice versa.
To overcome this issue, simply define buffer as
unsigned char buffer[4]
And if you weren't allowed to change the data type of buffer, you could write...
unsigned x = ((unsigned char)buffer[1] << 8) | (unsigned char)buffer[0];
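Extending the same idea to all four bytes (a sketch, assuming buffer is declared as in the question):
#include <stdint.h>

uint32_t x = ((uint32_t)(unsigned char)buffer[3] << 24) |
             ((uint32_t)(unsigned char)buffer[2] << 16) |
             ((uint32_t)(unsigned char)buffer[1] << 8)  |
              (uint32_t)(unsigned char)buffer[0];       /* 0x000004cf */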
For tasks like this I like using unions, for example:
union tag_int_chars {
    char buffer[sizeof(int32_t)];
    int32_t value;
} int_chars;

int_chars.value = 0x01234567;
int_chars.buffer[0] = 0xff;
This will automate the memory overlay without the need to shift. Set the value of the int and voila the chars have changed, change a char value and voila the int has changed.
The example will leave the int value = 0x012345ff on a little endian machine.
Another easy way is to use memcpy():
#include <string.h>
#include <stdint.h>

char buffer[sizeof(int32_t)];
int32_t value;

memcpy(&value, buffer, sizeof(int32_t)); // chars to int
memcpy(buffer, &value, sizeof(int32_t)); // int to chars
