Cast char subarray to integer - c

unsigned int b;
unsigned char a[] =
{0x00,0x00,0x00,0x12,0x00,0x00,0x81,0x03,0x00,0x00,0x00,0x00,0x01,0x91,0x01,0x01,0xb1,0x04,0x47,0x86,0x8f,0xf8,0x00};
I'm a newbie in C programming.
I need to take the 4-byte subarray starting at a[18], which is 0x47868ff8,
and convert it to the corresponding decimal integer: 1200001016.
I tried memcpy(&b, a+18, 4), but it does not seem to work.
Could anyone give me some hints on how to do this?
Also, if I want to read a message through a char pointer and convert it, four bytes at a time and in order, into an integer array,
what is the best way to do that? Thanks.

Copying like that has implementation-defined behavior: memcpy copies the bytes in memory order, so the resulting value depends on the endianness of the CPU. On a little-endian machine you get 0xf88f8647 rather than 0x47868ff8.
To do it portably you can use bitwise operations.
b = (unsigned int)a[18] << 24 | (unsigned int)a[19] << 16 | (unsigned int)a[20] << 8 | a[21];
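To make the shift-and-OR approach reusable, and to cover the second half of the question (converting a whole message four bytes at a time), here is a minimal sketch. The names read_be32 and unpack_be32 are hypothetical helpers, not standard functions. The sketch assumes 8-bit bytes and that the message stores each value most significant byte first, and it uses uint32_t rather than unsigned int so the width is fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 4 bytes at p as a big-endian 32-bit value. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: convert a message into 32-bit values, 4 bytes at a
   time. Returns the number of values written; trailing bytes that do not
   fill a whole group of 4 are ignored. */
size_t unpack_be32(const unsigned char *msg, size_t len,
                   uint32_t *out, size_t out_cap)
{
    size_t n = len / 4;
    if (n > out_cap)
        n = out_cap;
    for (size_t i = 0; i < n; i++)
        out[i] = read_be32(msg + 4 * i);
    return n;
}
```

With the array from the question, read_be32(a + 18) evaluates to 0x47868ff8, which is 1200001016 in decimal, regardless of the CPU's byte order.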

Related

Combined 3 unsigned chars into an unsigned int

I have a requirement to have an unsigned 3 byte type in C. I am looking for a way to pack them into a single unsigned int.
Is this safe or does this need to be stored inside an array/structure for the 24 bit size?
unsigned int pack_3Byte(unsigned char b1, unsigned char b2, unsigned char b3)
{
    return (b1 << 16) | (b2 << 8) | (b3);
}
Your code is correct but like Olaf says you should use the types uint8_t and uint32_t to ensure that your types are really the width you expect them to be.
This may not be a problem right now, but you should also be aware that the bytes in an integer are stored in different order on different processors. This is called endianness.
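As a sketch of the fixed-width version suggested above: pack_3byte_u32 and unpack_3byte_u32 are hypothetical names. The explicit casts matter on platforms where int is only 16 bits wide, because there b1 << 16 would shift by the full width of the promoted type. Endianness only comes into play once the packed value is written to memory or a file as raw bytes; the arithmetic itself is the same everywhere.

```c
#include <stdint.h>

/* Pack three bytes into the low 24 bits of a uint32_t, b1 most significant.
   The casts make the shifts happen in 32-bit unsigned arithmetic. */
uint32_t pack_3byte_u32(uint8_t b1, uint8_t b2, uint8_t b3)
{
    return ((uint32_t)b1 << 16) | ((uint32_t)b2 << 8) | (uint32_t)b3;
}

/* Hypothetical inverse: split a 24-bit value back into its three bytes. */
void unpack_3byte_u32(uint32_t v, uint8_t out[3])
{
    out[0] = (uint8_t)(v >> 16);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)v;
}
```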

Extract 2nd and 3rd value from char array and cast it as short (2 bytes). In C

Say I have an unsigned char (or byte) array. I want to take array[1] and array[2] from memory and cast it as short int (2 bytes). Something similar to how a union works, but not starting from the first byte.
Example:
#include <stdio.h>
void main()
{
    unsigned char a[28];
    unsigned short t;
    a[0]=12;
    a[1]=10;
    a[2]=55;
    t=(short) *(a+1);
    printf("%i", t);
}
What I want is the value 14090 in decimal. Or 370Ah.
Thank you.
EDIT: I forgot to say, but most of you understood from my example, I am working on a little-endian machine. An 8bit Atmel microcontroller.
It's very simple:
unsigned short t = (a[2] << 8) | a[1];
Note, this assumes unsigned char is 8 bits, which is most likely the case.
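One caveat worth sketching: where int is 16 bits wide, as is common on toolchains for 8-bit microcontrollers, a[2] << 8 is evaluated as a signed int and can overflow once the high byte is 0x80 or more. A hedged variant that shifts in unsigned arithmetic (read_le16 is a hypothetical helper name, and 8-bit bytes are assumed):

```c
/* Hypothetical helper: combine two bytes, low byte first (little-endian).
   The cast keeps the shift in unsigned arithmetic, so a high byte of 0x80
   or above cannot overflow a 16-bit signed int. */
unsigned short read_le16(const unsigned char *p)
{
    return (unsigned short)(((unsigned int)p[1] << 8) | p[0]);
}
```

For the array in the question, read_le16(a + 1) combines a[1] = 10 and a[2] = 55 into 0x370A, i.e. 14090.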
The memory access *(short *)(a+1) is not safe.
If a+1 is not aligned for short (i.e., the address a+1 is not a multiple of the alignment that short requires, usually sizeof(short)), then the result of this operation depends on the compiler and CPU at hand.
Platforms that support unaligned load/store operations can resolve it correctly, while others will "round it down" to the nearest address which is aligned for short.
In general, this operation yields undefined behavior.
On top of all that, even if you know for sure that a+1 is aligned for short, this operation will still give you different results on big-endian and little-endian architectures.
Here is a safe way to work around both issues:
short x = 0x1234;
switch (*(char *)&x)
{
    case 0x12: // Big-endian
        t = (a[1] << 8) | a[2]; // Simulate t = *(short *)(a+1) on BE
        break;
    case 0x34: // Little-endian
        t = (a[2] << 8) | a[1]; // Simulate t = *(short *)(a+1) on LE
        break;
}
Please note that the code above assumes the following:
CHAR_BIT == 8
sizeof(short) == 2
This is not necessarily true on every platform (although it is mostly the case).
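If the goal really is "whatever the host would read at that address", a memcpy into a local variable gives the same native-order result without the alignment problem, and the two assumptions can be checked at compile time. This is a minimal sketch assuming a C11 compiler (for _Static_assert); load_u16_native is a hypothetical name.

```c
#include <limits.h>
#include <string.h>

/* Fail the build instead of silently computing wrong values (C11). */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
_Static_assert(sizeof(short) == 2, "this code assumes a 2-byte short");

/* Hypothetical helper: read a short in the host's own byte order from any
   address. memcpy has no alignment requirement, unlike *(short *)p. */
unsigned short load_u16_native(const unsigned char *p)
{
    unsigned short v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

The result still differs between big-endian and little-endian hosts; only the alignment hazard is removed.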
t= *(short *)(a+1);
You cast the pointer to the second element to a pointer-to-short, and then dereference it.
Note that this is not very portable: the result depends on the machine's byte order, and it can fail outright if the machine requires aligned accesses. A better way would be:
t = (a[2] << CHAR_BIT) | a[1];
For full portability, you should check your endianness and see which byte to shift, and which one not to. See here how to check a machine's endianness

Copying a 4 element character array into an integer in C

A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.
char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);
Output is
6513249 1633837824 6513249
Which one is correct? What is going wrong?
It's an endianness issue. When you interpret the char* as an int* the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86 which is little endian), while with the manual conversion the first byte becomes the most significant.
To put this into pictures, this is the source array:
   a      b      c     \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 | <---- bytes in memory
+------+------+------+------+
When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
Of course treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is however a way to achieve the same result, which is called type punning:
union {
    char str[4];
    unsigned int ui;
} u;
strcpy(u.str, "abc");
printf("%u\n", u.ui);
Neither of the first two is correct.
The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:
unsigned int a; memcpy(&a, &str, sizeof a);
(Presuming the size of an unsigned int and the size of str are the same.)
The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:
unsigned int b = (unsigned int) str[0] << 24 | …;
This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
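The three interpretations discussed here can be put side by side. This is a sketch under the assumptions that bytes are 8 bits wide and that a 32-bit result is wanted; the function names are hypothetical. Going through an unsigned char pointer avoids sign extension when plain char is signed.

```c
#include <stdint.h>
#include <string.h>

/* Host byte order: copy the object representation. */
uint32_t u32_from_chars_native(const char *s)
{
    uint32_t v;
    memcpy(&v, s, sizeof v);
    return v;
}

/* First byte is most significant, on every host. */
uint32_t u32_from_chars_be(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* First byte is least significant, on every host. */
uint32_t u32_from_chars_le(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For char str[4] = "abc", the big-endian reading is 1633837824 and the little-endian reading is 6513249; the native reading matches whichever of the two corresponds to the host.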
unsigned int a = *(unsigned int*)str;
This initialization is not correct and invokes undefined behavior. It violates the C aliasing rules and potentially violates the processor's alignment requirements.
You said you want to copy byte-by-byte.
That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).
It just needs some tweaking:
const char *str = "abc";
size_t i;
unsigned a;
char *c = (char *)&a;
for (i = 0; i < sizeof(unsigned); i++) {
    c[i] = str[i]; /* copy one byte at a time into the object representation of a */
}
printf("%u\n", a);
Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:
void
changeEndian32(void * data)
{
    uint8_t * cp = (uint8_t *) data;
    union
    {
        uint32_t word;
        uint8_t bytes[4];
    } temp;
    temp.bytes[0] = cp[3];
    temp.bytes[1] = cp[2];
    temp.bytes[2] = cp[1];
    temp.bytes[3] = cp[0];
    *((uint32_t *)data) = temp.word;
}
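An alternative sketch that swaps the bytes of a value rather than of a memory location, so no pointer casts or unions are needed. swap_bytes32 is a hypothetical name, not a standard function.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value using only
   shifts and masks on the value itself. */
uint32_t swap_bytes32(uint32_t v)
{
    return  (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
            (v << 24);
}
```

Applying it twice returns the original value, which makes it usable in both directions (file order to host order and back).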
Both are correct in a way:
Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.
Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.
What is correct depends on how the original data (array of char) is meant to be interpreted.
E.g. Java class files always use big endian byte order (no matter what the CPU is using). So if you want to read ints from a Java class file you have to use the second way. In other cases you might want to use the CPU dependent way (I think Matlab writes ints in native byte order into files, cf. this question).
If you're using the CVI (National Instruments) compiler you can use the function Scan to do this:
unsigned int a;
For big endian:
Scan(str,"%1i[b4uzi1o3210]>%i",&a);
For little endian:
Scan(str,"%1i[b4uzi1o0123]>%i",&a);
The o modifier specifies the byte order.
i inside the square brackets indicates where to start in the str array.

Casting a short from a char array

I've run into a small issue here. I have an unsigned char array, and I am trying to access bytes 2-3 (0xFF and 0xFF) and get their value as a short.
Code:
unsigned char Temp[512] = {0x00,0xFF,0xFF,0x00};
short val = (short)*((unsigned char*)Temp+1)
While I would expect val to contain 0xFFFF it actually contains 0x00FF. What am I doing wrong?
There's no guarantee that you can access a short when the data is improperly aligned.
On some machines, especially RISC machines, you'd get a bus error and core dump for misaligned access. On other machines, the misaligned access would involve a trap into the kernel to fix up the error — which is only a little quicker than the core dump.
To get the result reliably, you'd be best off doing shifting and or:
val = *(Temp+1) << 8 | *(Temp+2);
or:
val = *(Temp+2) << 8 | *(Temp+1);
Note that this explicitly offers big-endian (first option) or little-endian (second) interpretation of the data.
Also note the careful use of << and |; if you use + instead of |, you have to parenthesize the shift expression or use multiplication instead of shift:
val = (*(Temp+1) << 8) + *(Temp+2);
val = *(Temp+1) * 256 + *(Temp+2);
Be logical and use either logic or arithmetic and not a mixture.
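The precedence point can be checked with a small sketch. The function names are hypothetical, and hi and lo are assumed to be byte values in the range 0 to 255. The third function spells out how the compiler groups the unparenthesized expression hi << 8 + lo.

```c
/* Shift binds tighter than |, so no parentheses are needed here. */
unsigned int join_or(unsigned int hi, unsigned int lo)
{
    return hi << 8 | lo;
}

/* + binds tighter than <<, so the shift must be parenthesized. */
unsigned int join_add(unsigned int hi, unsigned int lo)
{
    return (hi << 8) + lo;
}

/* What "hi << 8 + lo" actually means: the shift count becomes 8 + lo. */
unsigned int join_add_as_parsed(unsigned int hi, unsigned int lo)
{
    return hi << (8 + lo);
}
```

With hi = 1 and lo = 2, the first two functions return 0x0102, while the third returns 1 << 10, i.e. 1024.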
Well, you're dereferencing an unsigned char* when you should be dereferencing a short*.
I think this should work:
short val = *((short*)(Temp+1));
Your problem is that you are only accessing one byte of the array:
*((unsigned char*)Temp+1) will dereference the pointer Temp+1 giving you 0xFF
(short)*((unsigned char*)Temp+1) will cast the result of the dereference to short. Casting unsigned char 0xFF to short obviously gives you 0x00FF
So what you are trying to do is *((short*)(Temp+1))
It should however be noted that what you are doing is a horrible hack. First of all, when the two chars differ, the result will depend on the endianness of the machine.
Second, there is no guarantee that the accessed data is correctly aligned to be accessed as a short.
So it might be a better idea to do something like short val= *(Temp+1)<<8 | *(Temp+2) or short val= *(Temp+2)<<8 | *(Temp+1), depending on the byte order you want.
I do not recommend this approach because it is architecture-specific.
Consider the following definition of Temp:
unsigned char Temp[512] = {0x00,0xFF,0x88,0x00};
Depending on the endianness of the system, you will get different results casting Temp + 1 to a short *; on a little endian system, the result would be the value 0x88FF, but on a Big endian system, the result would be 0xFF88.
Also, I believe that this is an undefined cast because of issues with alignment.
What you could use is:
short val = (((short)Temp[1]) << 8) | Temp[2];
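One detail the shift-based answers leave open is the signed result type: with both bytes equal to 0xFF the combined pattern 0xFFFF does not fit in a 16-bit signed short, and converting an out-of-range value to a signed type is implementation-defined. A hedged sketch that builds the value unsigned and then maps it to the two's-complement meaning arithmetically (read_be16_signed is a hypothetical name; 8-bit bytes assumed):

```c
#include <stdint.h>

/* Hypothetical helper: two bytes, most significant first, interpreted as a
   signed 16-bit two's-complement value. */
int16_t read_be16_signed(const unsigned char *p)
{
    uint16_t u = (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
    /* Map 0x8000..0xFFFF to -32768..-1 arithmetically rather than relying
       on an out-of-range conversion to a signed type. */
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}
```

If the unsigned bit pattern is what you need (0xFFFF rather than -1), store the result in an unsigned short or uint16_t instead.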

How can I cast a char to an unsigned int?

I have a char array that is really used as a byte array and not for storing text. In the array, there are two specific bytes that represent a numeric value that I need to store into an unsigned int value. The code below explains the setup.
char bytes[2];
bytes[0] = 0x0C; // For the sake of this example, I'm
bytes[1] = 0x88; // assigning random values to the char array.
unsigned int val = ???; // This needs to be the actual numeric
// value of the two bytes in the char array.
// In other words, the value should equal 0x0C88;
I can not figure out how to do this. I would assume it would involve some casting and recasting of the pointers, but I can not get this to work. How can I accomplish my end goal?
UPDATE
Thank you Martin B for the quick response, however this doesn't work. Specifically, in my case the two bytes are 0x00 and 0xbc. Obviously what I want is 0x000000bc. But what I'm getting in my unsigned int is 0xffffffbc.
The code that was posted by Martin was my actual, original code and works fine so long as all of the bytes are less than 128 (i.e. positive signed char values).
unsigned int val = (unsigned char)bytes[0] << CHAR_BIT | (unsigned char)bytes[1];
This works if sizeof(unsigned int) >= 2 * sizeof(unsigned char) (which is not guaranteed by the C standard).
Now... The interesting thing here is operator precedence (after many years I can still remember only +, -, * and /... Shame on me :-), so I always add as many brackets as I can). [] is king. Second is the (cast). Third is << and fourth is | (if you use + instead of |, remember that + binds more tightly than <<, so you'll need brackets).
We don't need to cast the two unsigned char values up to unsigned int ourselves, because the integer promotions widen each operand before the shift and the OR are evaluated.
I'll add that if you want fewer headaches:
unsigned int val = (unsigned char)bytes[0] << CHAR_BIT;
val |= (unsigned char)bytes[1];
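The 0xffffffbc from the update is a sign-extension effect: where plain char is signed, the byte 0xbc holds a negative value, which is widened to a negative int before the OR. The sketch below wraps the fix in a function; join_bytes is a hypothetical name and 8-bit bytes are assumed.

```c
/* Hypothetical helper: combine two char-typed bytes, high byte first. */
unsigned int join_bytes(const char *bytes)
{
    /* Convert each char to unsigned char first, so a byte such as 0xbc is
       treated as 188 rather than as a negative value that sign-extends. */
    unsigned int hi = (unsigned char)bytes[0];
    unsigned int lo = (unsigned char)bytes[1];
    return (hi << 8) | lo;
}
```

For the bytes 0x00 and 0xbc this gives 0xbc, and for 0x0C and 0x88 it gives 0x0C88.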
unsigned int val = (unsigned char) bytes[0]<<8 | (unsigned char) bytes[1];
The byte ordering depends on the endianness of your processor. You can do this, which will work on both big- and little-endian machines (without ntohs it works only on big-endian ones):
unsigned int val = ntohs(*(uint16_t*)bytes);
unsigned int val = ((unsigned char)bytes[0] << 8) + (unsigned char)bytes[1];
I think this is a better way to go about it than relying on pointer aliasing, although the value you read back depends on the host's byte order and on the size of unsigned:
union {unsigned asInt; char asChars[2];} conversion;
conversion.asInt = 0;
conversion.asChars[0] = 0x0C;
conversion.asChars[1] = 0x88;
unsigned val = conversion.asInt;

Resources