Convert char array to int in C - c

Is this a safe way to convert array to number?
// 23 FD 15 94 -> 603788692
char number[4] = {0x94, 0x15, 0xFD, 0x23};
uint32_t* n = (uint32_t*)number;
printf("number is %lu", *n);
MORE INFO
I'm using this in an embedded device with an LSB-first (little-endian) architecture; it does not need to be portable.
I'm currently using shifting, but if this code is safe I'd prefer it.

No. You're only allowed to access something as an integer if it is an integer.
But here's how you can manipulate the binary representation of an object by simply turning the logic around:
uint32_t n;
unsigned char * p = (unsigned char *)&n;
assert(sizeof n == 4); // assumes CHAR_BIT == 8
p[0] = 0x94; p[1] = 0x15; p[2] = 0xFD; p[3] = 0x23;
The moral: You can treat every object as a sequence of bytes, but you can't treat an arbitrary sequence of bytes as any particular object.
Moreover, the binary representation of a type is very much platform dependent, so there's no telling what actual integer value you get out from this. If you just want to synthesize an integral value from its base-256 digits, use normal maths:
uint32_t n = 0x94 + (0x15 * 0x100) + (0xFD * 0x10000) + (0x23 * 0x1000000);
This is completely platform-independent and expresses what you want purely in terms of values, not representations. Leave it to your compiler to produce a machine representation of the code.

No, it is not safe.
This violates the C aliasing rules, which say that an object can only be accessed through its own type, its signed / unsigned variant, or a character type. It can also invoke undefined behavior by breaking alignment.
A safe solution to get a uint32_t value from the array is to use bitwise operators (<< and |) on the char values to form a uint32_t.

You're better off with something like this (more portable):
uint32_t n = ((uint32_t)c[3]<<24)|((uint32_t)c[2]<<16)|((uint32_t)c[1]<<8)|c[0];
where c is an unsigned char array.
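A minimal sketch of the shift-based approach, assuming the bytes are stored least significant first as in the question. The helper name load_le32 is made up for illustration. Each byte is widened to uint32_t before shifting so that no shift is performed in a signed int.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from 4 bytes stored
   least significant byte first. The result does not depend on the
   byte order of the host. */
static uint32_t load_le32(const unsigned char *c)
{
    return  (uint32_t)c[0]
         | ((uint32_t)c[1] << 8)
         | ((uint32_t)c[2] << 16)
         | ((uint32_t)c[3] << 24);
}
```

With the array from the question, {0x94, 0x15, 0xFD, 0x23}, load_le32 returns 603788692 (0x23FD1594).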

Related

How to convert to integer a char[4] of "hexadecimal" numbers [C/Linux]

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian , and look like this : "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into a vector of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17408 instead of 68.
UPDATE: Never mind, I solved it. I mixed up the indexes when shifting. It should have been value[1] << 8 ... | value[0]
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need any other.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(const char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
size_t num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
Converting the data
It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). If you do need to swap endianness, reverse the bytes explicitly. Note that ntohl() and htonl() do not help here: they convert between big-endian network order and host order, so on a big-endian host they leave the value unchanged.
if (!host_is_little_endian) {
data = (data >> 24) | ((data >> 8) & 0x0000FF00u) | ((data << 8) & 0x00FF0000u) | (data << 24);
}
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, but you are extremely unlikely ever to see one of them.)
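An explicit byte reversal that does not rely on any library byte-order function might look like the sketch below. The names host_is_little_endian, swap32 and le32_to_host are illustrative, and the code assumes that little- and big-endian are the only host byte orders of interest.

```c
#include <stdint.h>

/* Returns 1 when the least significant byte of a uint32_t is stored first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverse the order of the four bytes of v. */
static uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         |  (v << 24);
}

/* Hypothetical helper: take a value whose bytes were read raw from a
   little-endian file and return it in the host's own byte order. */
static uint32_t le32_to_host(uint32_t raw)
{
    return host_is_little_endian() ? raw : swap32(raw);
}
```

For the file bytes "2A 00 00 00" read straight into a uint32_t, le32_to_host yields 42 on either kind of host.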
Signed integers
If you need a signed instead of unsigned integer, then you can do the same, but use a union:
union {
uint32_t u;
int32_t s;
} data;
In all of the preceding, use data.u in place of plain data, and at the end, read out the signed result from data.s. (The members cannot be named unsigned and signed, because those are C keywords.)
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
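You just said that p is now a pointer to an int, de-referenced that pointer and assigned the result to i. Be aware that this cast breaks the strict aliasing rule and p may be misaligned for an int, so it is undefined behavior in standard C even though it works on many platforms.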
However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
for (i=3; i>=0; i--) ibuf[i]= *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give:
unsigned char *p = &buf[20];
int i, j;
unsigned char *pi = (unsigned char *)&i; // view the bytes of i
for (j=3; j>=0; j--) *pi++ = *p++;
// and the other endian:
int i, j;
unsigned char *pi = ((unsigned char *)&i) + 3; // start at the last byte of i
for (j=3; j>=0; j--) *pi-- = *p++;
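The same byte-wise copy can be written with memcpy, which has no alignment or aliasing requirements on the source. This is only a sketch: read_u32_native is a hypothetical name, and the result is in the host's byte order, so it matches the file only when both use the same endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 4 bytes starting at buf[offset] into a
   uint32_t. buf + offset may be unaligned; memcpy does not care. */
static uint32_t read_u32_native(const unsigned char *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

A call such as read_u32_native(buf, 20) corresponds to the &buf[20] example above.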

Correct way of reading bytes from IEEE754 floating point format

I have a requirement where I need to read the 4 raw bytes of the single precision IEEE754 floating point representation as to send on the serial port as it is without any modification. I just wanted to ask what is the correct way of extracting the bytes among the following:
1.) creating a union such as:
typedef union {
float f;
uint8_t bytes[4];
struct {
uint32_t mantissa : 23;
uint32_t exponent : 8;
uint32_t sign : 1;
};
} FloatingPointIEEE754_t ;
and then just reading the bytes[] array after writing to the float variable f?
2.) Or, extracting bytes by a function in which a uint32_t type pointer is made to point to the float variable and then the bytes are extracted via masking
uint32_t extractBitsFloat(float numToExtFrom, uint8_t numOfBits, uint8_t bitPosStartLSB){
uint32_t *p = &numToExtFrom;
/* validate the inputs */
if ((numOfBits > 32) || (bitPosStartLSB > 31)) return NULL;
/* build the mask */
uint32_t mask = ((1 << numOfBits) - 1) << bitPosStartLSB;
return ((*p & mask) >> bitPosStartLSB);
}
where calling will be made like:
valF = -4.235;
byte0 = extractBitsFloat(valF, 8, 0);
byte1 = extractBitsFloat(valF, 8, 8);
byte2 = extractBitsFloat(valF, 8, 16);
byte3 = extractBitsFloat(valF, 8, 24);
Please suggest me the correct way if you think both the above-mentioned methods are wrong!
First of all, I assume you're coding specifically for a platform where float actually is represented as an IEEE754 single. You can't take this for granted in general, so your code won't be portable to all platforms.
Then, the union approach is the correct one. But don't add this bitfield member! There's no guarantee how the bits will be arranged, so you might access the wrong bits. Just do this:
typedef union {
float f;
uint8_t bytes[4];
} FloatingPointIEEE754;
Also, don't add a _t suffix to your own types. On POSIX systems, this is reserved to the implementation, so it's best to always avoid it.
Instead of using a union, accessing the bytes through a char pointer is fine as well:
unsigned char *rep = (unsigned char *)&f;
// access rep[0] to rep[3]
Note in both cases, you are accessing the representation in memory, this means you have to pay attention to the endianness of your machine.
Your second option isn't correct: it violates the strict aliasing rule. In short, you're not allowed to access an object through a pointer that doesn't have a compatible type; a char pointer is an explicit exception for accessing the representation. The exact rules are written in 6.5 p7 of N1570, the latest draft of the C11 standard.
You can do:
unsigned char *p = (unsigned char *)&the_float;
and then read 4 bytes from where p is pointing (e.g. p[0], p[1], etc.). The exact best code to "read 4 bytes" depends on what form the serial port function accepts data in.
If you do not care about endianness, just alias a character pointer to the address of the float. The standard explicitly allows using a character pointer to access the bytes of the representation of any type. If you need a specific endianness to send the bytes on the serial port, you can test for it before sending.
Simple way, just use native endianness:
float f;
...
char * bytes = (char *) &f; // bytes points to the beginning of a char array of size sizeof(f)
Automatically testing for endianness and using big endian (AKA network order). The struct is just a trick to return an array and have thread-safe code.
struct float_bytes {
char bytes[sizeof(float)];
};
struct float_bytes float_to_big_endian(float f) { // the function name is illustrative
float end = 1.;
struct float_bytes resul;
char *src = (char *) &f;
if (*(char *) &end == 0) { // first byte of 1.0f is 0 on a little endian platform, else 0x3f
int i = sizeof(f); // little endian: reverse the bytes
while (i > 0) {
resul.bytes[--i] = *src++;
}
}
else { // already in big endian order, just memcpy
memcpy(&(resul.bytes), &f, sizeof(f));
}
return resul;
}
Beware: the test for endianness will only make sense if floating point is IEEE754 single.
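An alternative sketch that avoids the run-time endianness test: copy the float's representation into a uint32_t and extract the bytes with shifts. The name float_to_be_bytes is hypothetical, and the code assumes that float is a 32-bit IEEE754 single and that floats and 32-bit integers share the same byte order on the host.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: write the 4 bytes of f into out[], most
   significant byte first (network order). */
static void float_to_be_bytes(float f, unsigned char out[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the representation, not the value */
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)bits;
}
```

Because the shifts operate on the integer value, the order of the output bytes is the same whether the host is little or big endian.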

declaring string using pointer to int

I am trying to initialize a string using pointer to int
#include <stdio.h>
int main()
{
int *ptr = "AAAA";
printf("%d\n",ptr[0]);
return 0;
}
the result of this code is 1094795585
Could anybody explain this behavior and why the code gave this answer?
I am trying to initialize a string using pointer to int
The string literal "AAAA" is of type char[5], that is array of five elements of type char.
When you assign:
int *ptr = "AAAA";
you actually must use explicit cast (as types don't match):
int *ptr = (int *) "AAAA";
But, still it's potentially invalid, as int and char objects may have different alignment requirements. In other words:
alignof(char) != alignof(int)
may hold. Also, in this line:
printf("%d\n", ptr[0]);
you are invoking undefined behavior (so it might print "Hello from Mars" if compiler likes so), as ptr[0] dereferences ptr, thus violating strict aliasing rule.
Note that it is valid to convert an int * to a char * and read the object through the char *, but not the opposite.
the result of this code is 1094795585
The result makes sense, but for that, you need to rewrite your program in valid form. It might look as:
#include <stdio.h>
#include <string.h>
union StringInt {
char s[sizeof("AAAA")];
int n[1];
};
int main(void)
{
union StringInt si;
strcpy(si.s, "AAAA");
printf("%d\n", si.n[0]);
return 0;
}
To decipher it, you need to make some assumptions, depending on your implementation. For instance, if
int type takes four bytes (i.e. sizeof(int) == 4)
CPU has little-endian byte ordering (though it doesn't really matter here, since every letter is the same)
default character set is ASCII (the letter 'A' is represented as 0x41, that is 65 in decimal)
implementation uses two's complement representation of signed integers
then, you may deduce, that si.n[0] holds in memory:
0x41 0x41 0x41 0x41
that is in binary:
01000001 ...
The sign (most-significant) bit is unset, hence it is just equal to:
65 * 2^24 + 65 * 2^16 + 65 * 2^8 + 65 =
65 * (2^24 + 2^16 + 2^8 + 1) = 65 * 16843009 = 1094795585
1094795585 is correct.
'A' has the ASCII value 65, i.e. 0x41 in hexadecimal.
Four of them makes 0x41414141 which is equal to 1094795585 in decimal.
You got the value 65656565 by doing 65*100^0 + 65*100^1 + 65*100^2 + 65*100^3 but that's wrong since a byte¹ can contain 256 different values, not 100.
So the correct calculation would be 65*256^0 + 65*256^1 + 65*256^2 + 65*256^3, which gives 1094795585.
It's easier to think of memory in hexadecimal because one hexadecimal digit directly corresponds to half a byte¹, so two hex digits is one full byte¹ (cf. 0x41). Whereas in decimal, 255 fits in a single byte¹, but 256 does not.
¹ assuming CHAR_BIT == 8
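The base-256 arithmetic can be checked against the bytes actually stored in memory. This is a sketch with hypothetical helper names, assuming an ASCII character set and CHAR_BIT == 8. Since all four bytes are identical, the result does not depend on endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the first 4 bytes of s into a uint32_t.
   The caller must make sure s holds at least 4 bytes. */
static uint32_t first_four_bytes(const char *s)
{
    uint32_t value;
    memcpy(&value, s, sizeof value);
    return value;
}

/* The same number computed from values: 65 * (256^3 + 256^2 + 256 + 1). */
static uint32_t four_times_a(void)
{
    const uint32_t a = 65; /* 'A' in ASCII */
    return a * 16777216u + a * 65536u + a * 256u + a;
}
```

Both first_four_bytes("AAAA") and four_times_a() give 1094795585, that is 0x41414141.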
65656565 is the wrong representation of the value of "AAAA": you are representing each character separately, whereas "AAAA" is stored as an array of bytes. It comes out as 1094795585 because the %d conversion specifier prints the int's value in decimal. Run this in gdb with the following commands:
x/8xb (pointer) //this will show you the memory hex value
x/d (pointer) //this will show you the converted decimal value
@zenith gave you the answer you expected, but your code invokes UB. Anyway, you could demonstrate the same in an almost correct way:
#include <stdio.h>
int main()
{
int i, val;
char *pt = (char *) &val; // cast a pointer to any to a pointer to char : valid
for (i=0; i<sizeof(int); i++) pt[i] = 'A'; // assigning bytes of int : UB in general case
printf("%d 0x%x\n",val, val);
return 0;
}
Assigning bytes of an int is UB in the general case because the C standard says that "[for] signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit", and a remark adds "Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit."
But in common architectures, there are no padding bits and all bit patterns correspond to valid numbers, so the operation is valid (but implementation dependent) on all common systems. It is still implementation dependent because the size of int is not fixed by the standard, nor is endianness.
So: on a 32-bit system using no padding bits, the above code will produce
1094795585 0x41414141
independently of endianness.

How to get float in bytes?

I am using the HIDAPI to send some data to a USB device. This data can be sent only as byte array and I need to send some float numbers inside this data array. I know floats have 4 bytes. So I thought this might work:
float f = 0.6;
char data[4];
data[0] = (int) f >> 24;
data[1] = (int) f >> 16;
data[2] = (int) f >> 8;
data[3] = (int) f;
And later all I had to do is:
g = (float)((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]) );
But testing this shows me that the lines like data[0] = (int) f >> 24; returns always 0. What is wrong with my code and how may I do this correctly (i.e. break a float inner data in 4 char bytes and rebuild the same float later)?
EDIT:
I was able to accomplish this with the following codes:
float f = 0.1;
unsigned char *pc;
pc = (unsigned char*)&f;
// 0.6 in float
pc[0] = 0x9A;
pc[1] = 0x99;
pc[2] = 0x19;
pc[3] = 0x3F;
std::cout << f << std::endl; // will print 0.6
and
*(unsigned int*)&f = (0x3F << 24) | (0x19 << 16) | (0x99 << 8) | (0x9A << 0);
I know memcpy() is a "cleaner" way of doing it, but this way I think the performance is somewhat better.
You can do it like this:
char data[sizeof(float)];
float f = 0.6f;
memcpy(data, &f, sizeof f); // send data
float g;
memcpy(&g, data, sizeof g); // receive data
In order for this to work, both machines need to use the same floating point representations.
As was rightly pointed out in the comments, you don't necessarily need to do the extra memcpy; instead, you can treat f directly as an array of characters (of any signedness). You still have to do memcpy on the receiving side, though, since you may not treat an arbitrary array of characters as a float! Example:
unsigned char const * const p = (unsigned char const *)&f;
for (size_t i = 0; i != sizeof f; ++i)
{
printf("Byte %zu is %02X\n", i, p[i]);
send_over_network(p[i]);
}
Standard C guarantees that any object can be accessed as an array of bytes.
A straight way to do this is, of course, by using unions:
#include <stdio.h>
int main(void)
{
float x = 0x1.0p-3; /* 2^(-3) in hexa */
union float_bytes {
float val;
unsigned char bytes[sizeof(float)];
} data;
data.val = x;
for (int i = 0; i < sizeof(float); i++)
printf("Byte %d: %.2x\n", i, data.bytes[i]);
data.val *= 2; /* Doing something with the float value */
x = data.val; /* Retrieving the float value */
printf("%.4f\n", data.val);
getchar();
}
As you can see, it is not necessary at all to use memcpy or pointers...
The union approach is easy to understand, standard and fast.
EDIT.
I will explain why this approach is valid in C (C99).
[5.2.4.2.1(1)] A byte has CHAR_BIT bits (an integer constant >= 8; in almost all cases it is 8).
[6.2.6.1(3)] The unsigned char type uses all its bits to represent the value of the object, which is a nonnegative integer, in a pure binary representation. This means that there are no padding bits or bits used for any other purpose. (The same is not guaranteed for the signed char or char types.)
[6.2.6.1(2)] Every non-bitfield type is represented in memory as a contiguous sequence of bytes.
[6.2.6.1(4)] (Cited) "Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); [...]"
[6.7.2.1(14)] A pointer to a structure object (in particular, unions), suitably converted, points to its initial member. (Thus, there is no padding bytes at the beginning of a union).
[6.5(7)] The content of an object can be accessed by a character type:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type
More information:
A discussion in google groups
Type-punning
EDIT 2
Another detail of the standard C99:
[6.5.2.3(3) footnote 82] Type-punning is allowed:
If the member used to access the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called "type
punning"). This might be a trap representation.
The C language guarantees that any value of any type¹ can be accessed as an array of bytes. The type of bytes is unsigned char. Here's a low-level way of copying a float to an array of bytes. sizeof(f) is the number of bytes used to store the value of the variable f; you can also use sizeof(float) (you can either pass sizeof a variable or more complex expression, or its type).
float f = 0.6;
unsigned char data[sizeof(float)];
size_t i;
for (i = 0; i < sizeof(float); i++) {
data[i] = ((unsigned char*)&f)[i];
}
The functions memcpy or memmove do exactly that (or an optimized version thereof).
float f = 0.6;
unsigned char data[sizeof(float)];
memcpy(data, &f, sizeof(f));
You don't even need to make this copy, though. You can directly pass a pointer to the float to your write-to-USB function, and tell it how many bytes to copy (sizeof(f)). You'll need an explicit cast if the function takes a pointer argument other than void*.
int write_to_usb(unsigned char *ptr, size_t size);
result = write_to_usb((unsigned char*)&f, sizeof(f));
Note that this will work only if the device uses the same representation of floating point numbers, which is common but not universal. Most machines use the IEEE floating point formats, but you may need to switch endianness.
As for what is wrong with your attempt: the >> operator operates on integers. In the expression (int) f >> 24, f is first converted to an int (the cast is required; f >> 24 on its own would not compile, because >> does not accept a floating-point operand). Converting a floating point value to an integer discards the fractional part, i.e. it truncates towards 0. 0.6 truncated to an integer is 0, so data[0] and all the others are 0.
You need to act on the bytes of the float object, not on its value.
¹ Excluding functions which can't really be manipulated in C, but including function pointers which functions decay to automatically.
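The difference between converting the value and copying the representation can be sketched with two small helpers. The function names are made up for illustration, and float_bits assumes that float is a 32-bit IEEE754 single.

```c
#include <stdint.h>
#include <string.h>

/* Converts the VALUE: the fractional part is discarded. */
static int float_value_as_int(float f)
{
    return (int)f;
}

/* Copies the REPRESENTATION: the 32 bits stored in the float object.
   Assumes sizeof(float) == 4. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On such a platform float_value_as_int(0.6f) is 0, while float_bits(0.6f) is 0x3F19999A, the same four bytes (9A 99 19 3F in little-endian order) that the question's edit writes by hand.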
Assuming that both devices have the same notion of how floats are represented then why not just do a memcpy. i.e
unsigned char payload[4];
memcpy(payload, &f, 4);
the safest way to do this, if you control both sides is to send some sort of standardized representation... this isn't the most efficient, but it isn't too bad for small numbers.
hostPort writes char * "34.56\0" byte by byte
client reads char * "34.56\0"
then converts to float with library function atof or atof_l.
of course that isn't the most optimized, but it sure will be easy to debug.
if you wanted to get more optimized and creative, first byte is length then the exponent, then each byte represents 2 decimal places... so
34.56 becomes char array[] = {4,-2,34,56}; something like that would be portable... I would just try not to pass binary float representations around... because it can get messy fast.
It might be safer to union the float and char array. Put in the float member, pull out the 4 (or whatever the length is) bytes.

Copying a 4 element character array into an integer in C

A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.
char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);
Output is
6513249 1633837824 6513249
Which one is correct? What is going wrong?
It's an endianness issue. When you interpret the char* as an int* the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86 which is little endian), while with the manual conversion the first byte becomes the most significant.
To put this into pictures, this is the source array:
a b c \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 | <---- bytes in memory
+------+------+------+------+
When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
Of course treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is however a way to achieve the same result, which is called type punning:
union {
char str[4];
unsigned int ui;
} u;
strcpy(u.str, "abc");
printf("%u\n", u.ui);
Neither of the first two is correct.
The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:
unsigned int a; memcpy(&a, &str, sizeof a);
(Presuming the size of an unsigned int and the size of str are the same.)
The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:
unsigned int b = (unsigned int) str[0] << 24 | …;
This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
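Both interpretations can be written with shifts on unsigned values, which avoids the overflow and also avoids sign extension when char is signed. The helper names below are hypothetical; this is a sketch, not the only way to do it.

```c
#include <stdint.h>

/* Most significant byte first: str[0] ends up in the top 8 bits. */
static uint32_t bytes_to_u32_be(const char *str)
{
    const unsigned char *b = (const unsigned char *)str;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Least significant byte first: the value memcpy gives on a
   little-endian host. */
static uint32_t bytes_to_u32_le(const char *str)
{
    const unsigned char *b = (const unsigned char *)str;
    return  (uint32_t)b[0]        | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For char str[4] = "abc", bytes_to_u32_be gives 1633837824 and bytes_to_u32_le gives 6513249, the two values printed in the question.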
unsigned int a = *(unsigned int*)str;
This initialization is not correct and invokes undefined behavior. It violates C aliasing rules and potentially violates processor alignment.
You said you want to copy byte-by-byte.
That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).
It just needs some tweaking:
const char * str = "abc"; /* 4 bytes including the terminating '\0' */
size_t i;
unsigned a;
char * c = (char *)&a;
for(i = 0; i < sizeof(unsigned); i++){ /* assumes sizeof(unsigned) == 4 */
c[i] = str[i];
}
printf("%u\n", a);
Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:
void
changeEndian32(void * data)
{
uint8_t * cp = (uint8_t *) data;
union
{
uint32_t word;
uint8_t bytes[4];
}temp;
temp.bytes[0] = cp[3];
temp.bytes[1] = cp[2];
temp.bytes[2] = cp[1];
temp.bytes[3] = cp[0];
memcpy(data, temp.bytes, sizeof temp.bytes); /* copy back bytewise (needs <string.h>); avoids aliasing data as uint32_t */
}
Both are correct in a way:
Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.
Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.
What is correct depends on how the original data (array of char) is meant to be interpreted.
E.g. Java class files always use big endian byte order (no matter what the CPU is using). So if you want to read ints from a Java class file you have to use the second way. In other cases you might want to use the CPU dependent way (I think Matlab writes ints in native byte order into files, cf. this question).
If you're using the CVI (National Instruments) compiler you can use the function Scan to do this:
unsigned int a;
For big endian:
Scan(str,"%1i[b4uzi1o3210]>%i",&a);
For little endian:
Scan(str,"%1i[b4uzi1o0123]>%i",&a);
The o modifier specifies the byte order.
i inside the square brackets indicates where to start in the str array.
