C int memory storage. Least significant vs most significant bits? - c

I'd expect the following combination of two uint8_t values (0x00 and 0x01) into one uint16_t to give me a value of 0x0001 when I place them consecutively in memory. Instead I obtain 0x0100 = 256, which surprises me.
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t u1 = 0x00, u2 = 0x01;
    uint8_t ut[2] = {u1, u2};
    uint16_t *mem16 = (uint16_t*) ut;
    printf("mem16 = %d\n", *mem16);
    return 0;
}
Could anyone explain what I've missed in my current understanding of C memory?
Thank you! :-)

It is called endianness.
Most systems nowadays use little endian. In this scheme the least significant byte is stored first, so 0x0100 is stored (assuming a 2-byte representation) as {0x00, 0x01}, exactly as in your case.

ut[0] ends up in the LSB of mem16, and ut[1] in the MSB.
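If the goal is for ut[0] to land in the most significant byte (so the result reads 0x0001 as expected), a minimal sketch, not part of the original answers, is to combine the bytes arithmetically instead of reinterpreting memory; this gives the same result on any endianness:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t ut[2] = {0x00, 0x01};
    /* ut[0] becomes the high byte, ut[1] the low byte, on any machine. */
    uint16_t v = (uint16_t)((ut[0] << 8) | ut[1]);
    printf("v = 0x%04x\n", (unsigned)v); /* prints v = 0x0001 */
    return 0;
}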


bit programming in C [duplicate]

This question already has answers here:
How do I split up a long value (32 bits) into four char variables (8 bits) using C? (6 answers)
Closed 8 months ago.
I am new to bit programming in C and find it difficult to understand how ipv4_to_bit_string() in the code below works.
Can anyone explain what happens when I pass the integer 1234 to this function? Why is the integer right-shifted by 24, 16, 8 and 4 places?
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct BIT_STRING_s {
    uint8_t *buf;     /* BIT STRING body */
    size_t size;      /* Size of the above buffer */
    int bits_unused;  /* Unused trailing bits in the last octet (0..7) */
} BIT_STRING_t;

BIT_STRING_t tnlAddress;

void ipv4_to_bit_string(int i, BIT_STRING_t *p)
{
    do {
        (p)->buf = calloc(4, sizeof(uint8_t));
        (p)->buf[0] = (i) >> 24 & 0xFF;
        (p)->buf[1] = (i) >> 16 & 0xFF;
        (p)->buf[2] = (i) >> 8 & 0xFF;
        (p)->buf[3] = (i) >> 4 & 0xFF;
        (p)->size = 4;
        (p)->bits_unused = 0;
    } while(0);
}

int main()
{
    BIT_STRING_t *p = (BIT_STRING_t*)calloc(1, sizeof(BIT_STRING_t));
    ipv4_to_bit_string(1234, p);
}
int main()
{
BIT_STRING_t *p = (BIT_STRING_t*)calloc(1, sizeof(BIT_STRING_t));
ipv4_to_bit_string(1234, p);
}
An IPv4 address is four eight-bit pieces that have been put together into one 32-bit piece. To take the 32-bit piece apart into the four eight-bit pieces, you extract each eight bits separately. To extract one eight-bit piece, you shift right by 0, 8, 16, or 24 bits, according to which piece you want at the moment, and then mask with 0xFF to take only the low eight bits after the shift.
The shift by 4 instead of 0 appears to be an error.
The use of an int for the 32-bit piece appears to be an error, primarily because the high bit may be set, which indicates the int value is negative, and then the right-shift is not fully defined by the C standard; it is implementation-defined. An unsigned type should be used. Additionally, int is not necessarily 32 bits; it is preferable to use uint32_t, which is defined in the <stdint.h> header.
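For reference, a corrected sketch along those lines; it reuses the question's BIT_STRING_t type, shifts by 0 instead of 4 for the last byte, and takes an unsigned 32-bit argument so the shifts are well defined. This is a reconstruction, not code from the question's source project:

#include <stdint.h>
#include <stdlib.h>

void ipv4_to_bit_string(uint32_t i, BIT_STRING_t *p)
{
    p->buf = calloc(4, sizeof(uint8_t)); /* note: not checked for NULL here, as in the original */
    p->buf[0] = (i >> 24) & 0xFF;
    p->buf[1] = (i >> 16) & 0xFF;
    p->buf[2] = (i >> 8)  & 0xFF;
    p->buf[3] = i & 0xFF;
    p->size = 4;
    p->bits_unused = 0;
}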

What's the easiest way to convert from binary file to a C integer?

I have a binary file which contains 4 bytes representing an integer number. I want to read this value and store it into an int variable in C. Is it possible?
The number I have is 176, which is written into the binary file as ...° ; in hexadecimal that is 0x00 0x00 0x00 0xb0, and that is fine.
But I need to get the 176 into an int variable in C. I hope that I made myself clear.
Thank you!
UPDATE:
I read the file into a uint8_t array.
So now I have uint8_t* my_value containing the 4 bytes that I read.
If I write
int hvalue = my_value[3];
I get 176 and it works fine, but it doesn't look like a good thing to do (this is a lucky case where 176 fits in one byte; a bigger number that needs 2 or 3 bytes would not work anymore).
I'm sorry but I have troubles explaining myself...
Yes, that's a little hacky because it's not going to work for a value like
0x08, 0x09, 0x0a, 0x0b
Your problem is basically that of big-endian to native-endian (little or big) conversion.
Under POSIX, the ntohl (Network To Host, Long) function from <arpa/inet.h> handles that for 32-bit numbers.
A solution based on ntohl would be to memcpy/read the bytes into a uint32_t and then apply ntohl to it:
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

int32_t getint32(unsigned char *bytes)
{
    int32_t r;
    memcpy(&r, bytes, sizeof(r));
    return (int32_t)ntohl(r);
}
In more basic terms, you'd read into a uint32_t/int32_t and then reverse the bytes if (and only if) your machine's endianness isn't big-endian. This is what the ntoh* functions (typically inline functions) do; they will usually do the byte reversal (when it's needed) with a bswap intrinsic.
You can also compose the number manually from the octets:
#include <stdint.h>

int32_t getint32_(unsigned char *bytes)
{
    uint32_t r =
        ((uint32_t)bytes[0] << 24) |
        ((uint32_t)bytes[1] << 16) |
        ((uint32_t)bytes[2] << 8)  |
        ((uint32_t)bytes[3] << 0);
    return (int32_t)r;
}
GCC >= 5 and Clang >= 5 (but not older versions) are capable of pattern-matching the above into a bswap instruction, so you don't really need htonl (or an explicit __builtin_bswap) to get compact/efficient endian-converting code on those (https://gcc.godbolt.org/z/wBYzuS).
Basically you just need to do this:
#include <stdint.h>
...
int32_t thevalue; // declare a 32-bit int; plain int may have a different size
                  // from 4 on your platform

FILE *file = fopen("thefile", "rb"); // open the file in binary mode
if (file != NULL)
{
    fread(&thevalue, sizeof(thevalue), 1, file);
    fclose(file);
}
You might need to handle endianness though.
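If the file is known to store the value big-endian, as in the question's 0x00 0x00 0x00 0xb0 example, one portable way (a sketch, not part of the answer; the file name "thefile" is just a placeholder) is to read the raw bytes and assemble them explicitly:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned char bytes[4];
    uint32_t value = 0;
    FILE *file = fopen("thefile", "rb");   /* placeholder file name */
    if (file != NULL) {
        if (fread(bytes, 1, 4, file) == 4) {
            /* assemble the four big-endian bytes, independent of host order */
            value = ((uint32_t)bytes[0] << 24) |
                    ((uint32_t)bytes[1] << 16) |
                    ((uint32_t)bytes[2] << 8)  |
                     (uint32_t)bytes[3];
        }
        fclose(file);
    }
    printf("%u\n", (unsigned)value);       /* prints 176 for the example file */
    return 0;
}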

Is it possible to cast from a preprocessor define to an array?

I am trying to cast a preprocessor define to an array, but I am not sure if it is possible at all.
For example, I have defined the number:
0x44332211
Code below:
#include <stdio.h>
#include <stdint.h>

#define number 0x44332211

int main()
{
    uint8_t array[4] = {(uint8_t)number, (uint8_t)number << 8, (uint8_t)(number << 16), (uint8_t)(number << 24)};
    printf("array[%x] \n\r", array[0]); // 0x44
    printf("array[%x] \n\r", array[1]); // 0x33
    printf("array[%x] \n\r", array[2]); // 0x22
    printf("array[%x] \n\r", array[3]); // 0x11
    return 0;
}
I want to cast it to a uint8_t array[4] where array[0] = 0x44, array[1] = 0x33, array[2] = 0x22, array[3] = 0x11.
Is it possible?
My output:
array[11]
array[0]
array[0]
array[0]
A couple of realizations are needed:
Casting to uint8_t keeps only the least significant byte of the data, meaning you have to right-shift data down into the least significant byte, not left-shift data away from it.
0x44332211 is an integer constant, not a "preprocessor". It is of type int and therefore signed. You shouldn't use bitwise operators on signed types. This is easily solved by changing it to 0x44332211u with the unsigned suffix.
Bug here: (uint8_t)number << 8. You should shift, then cast; casts have higher precedence than shifts.
#include <stdio.h>
#include <stdint.h>

#define number 0x44332211u

int main()
{
    uint8_t array[4] =
    {
        (uint8_t)(number >> 24),
        (uint8_t)(number >> 16),
        (uint8_t)(number >> 8),
        (uint8_t) number
    };
    printf("array[%x] \n\r", array[0]); // 0x44
    printf("array[%x] \n\r", array[1]); // 0x33
    printf("array[%x] \n\r", array[2]); // 0x22
    printf("array[%x] \n\r", array[3]); // 0x11
    return 0;
}
This is not really a cast in any way. You have defined a constant and are computing the values of the array from that constant. Keep in mind that in this case the preprocessor simply does a search and replace, nothing clever.
Also, your shift is in the wrong direction. You keep the last (rightmost) 8 bits when casting int to uint8_t, not the first (leftmost) ones.
Yes, you are casting an int to a uint8_t. The only problem is that when you make the shifts, the result won't fit in the type you are casting to and information is lost.
Your uint8_t casts are just taking the least significant byte. That's why you get 11 in the first case and 0 in the others: your shifts to the left leave 0 in the rightmost positions.

Casting uint8_t array into uint16_t value in C

I'm trying to convert a 2-byte array into a single 16-bit value. For some reason, when I cast the array as a 16-bit pointer and then dereference it, the byte ordering of the value gets swapped.
For example,
#include <stdint.h>
#include <stdio.h>

int main()
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    printf("%x\n", (unsigned int)b);
    return 0;
}
prints aa15 instead of 15aa (which is what I would expect).
What's the reason behind this, and is there an easy fix?
I'm aware that I can do something like uint16_t b = a[0] << 8 | a[1]; (which does work just fine), but I feel like this problem should be easily solvable with casting and I'm not sure what's causing the issue here.
As mentioned in the comments, this is due to endianness.
Your machine is little-endian, which (among other things) means that multi-byte integer values have the least significant byte first.
If you compiled and ran this code on a big-endian machine (ex. a Sun), you would get the result you expect.
Since your array is set up as big-endian, which also happens to be network byte order, you could get around this by using ntohs and htons. These functions convert a 16-bit value from network byte order (big endian) to the host's byte order and vice versa:
uint16_t b = ntohs(*(uint16_t*)a);
There are similar functions called ntohl and htonl that work on 32-bit values.
This is because of the endianness of your machine.
In order to make your code independent of the machine consider the following function:
#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
    int i = 1;
    char *p = (char *)&i;
    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
So for each case you can choose which operation to apply.
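For instance (a sketch, not part of the original answer; it assumes the endian() helper and macros above are in the same file), the check could drive a byte swap for the 16-bit case like this:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int endian(void); /* the helper defined above */

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b;
    memcpy(&b, a, sizeof b);                 /* avoids the pointer cast */
    if (endian() == LITTLE_ENDIAN)           /* macro defined above */
        b = (uint16_t)((b >> 8) | (b << 8)); /* swap the two bytes */
    printf("%x\n", (unsigned int)b);         /* 15aa on either byte order */
    return 0;
}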
You cannot do anything like *(uint16_t*)a because of the strict aliasing rule. Even if code appears to work for now, it may break later in a different compiler version.
A correct version of the code could be:
b = ((uint16_t)a[0] << CHAR_BIT) + a[1];
The version suggested in your question involving a[0] << 8 is incorrect because on a system with 16-bit int, this may cause signed integer overflow: a[0] promotes to int, and << 8 means * 256.
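For completeness, a small memcpy-based sketch (the helper name load_u16_native is not from the answer) that avoids the aliasing cast; the result still follows the host's byte order, so the endianness handling discussed above is still needed:

#include <stdint.h>
#include <string.h>

/* Reads two bytes into a uint16_t without violating strict aliasing. */
static uint16_t load_u16_native(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}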
This might help to visualize things. When you create the array you have two bytes in order. When you print it you get the human readable hex value which is the opposite of the little endian way it was stored. The value 1 in little endian as a uint16_t type is stored as follows where a0 is a lower address than a1...
a0 a1
|10000000|00000000
Note that the least significant byte comes first in memory, but when we print the value in hex the least significant byte appears on the right, which is what we normally expect on any machine.
This program prints a little endian and big endian 1 in binary starting from least significant byte...
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>

void print_bin(uint64_t num, size_t bytes) {
    int i = 0;
    for (i = bytes * 8; i > 0; i--) {
        (i % 8 == 0) ? printf("|") : 1;
        (num & 1) ? printf("1") : printf("0");
        num >>= 1;
    }
    printf("\n");
}

int main(void) {
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    uint16_t le = 1;
    uint16_t be = htons(le);

    printf("Little Endian 1\n");
    print_bin(le, 2);
    printf("Big Endian 1 on little endian machine\n");
    print_bin(be, 2);
    printf("0xaa15 as little endian\n");
    print_bin(b, 2);
    return 0;
}
This is the output (least significant byte first, bits within each byte also shown least significant first):
Little Endian 1
|10000000|00000000
Big Endian 1 on little endian machine
|00000000|10000000
0xaa15 as little endian
|10101000|01010101

Copy 6 byte array to long long integer variable

I have read a 6-byte unsigned char array from memory.
The endianness is big endian here.
Now I want to assign the value that is stored in the array to an integer variable. I assume this has to be long long since it must contain up to 6 bytes.
At the moment I am assigning it this way:
unsigned char aFoo[6];
long long nBar;
// read values to aFoo[]...
// aFoo[0]: 0x00
// aFoo[1]: 0x00
// aFoo[2]: 0x00
// aFoo[3]: 0x00
// aFoo[4]: 0x26
// aFoo[5]: 0x8e
nBar = (aFoo[0] << 64) + (aFoo[1] << 32) +(aFoo[2] << 24) + (aFoo[3] << 16) + (aFoo[4] << 8) + (aFoo[5]);
A memcpy approach would be neat, but when I do this
memcpy(&nBar, &aFoo, 6);
the 6 bytes are being copied to the long long from the start and thus have padding zeros at the end.
Is there a better way than my assignment with the shifting?
What you want to accomplish is called de-serialisation or de-marshalling.
For values that wide, using a loop is a good idea, unless you really need maximum speed and your compiler does not vectorise loops:
uint8_t array[6];
...
uint64_t value = 0;
uint8_t *p = array;
for ( int i = (sizeof(array) - 1) * 8 ; i >= 0 ; i -= 8 )
    value |= (uint64_t)*p++ << i;

// left-align
value <<= 64 - (sizeof(array) * 8);
Note the use of stdint.h types; sizeof(uint8_t) cannot differ from 1. Only these types are guaranteed to have the expected bit widths. Also use unsigned integers when shifting values: right-shifting certain (negative) values is implementation-defined, while left-shifting them invokes undefined behaviour.
If you need a signed value, just do
int64_t final_value = (int64_t)value;
after the shifting. This is still implementation-defined, but all modern implementations (and likely the older ones) just copy the value without modification. A modern compiler will likely optimise this away, so there is no penalty.
The declarations can be moved, of course. I just put them before where they are used for completeness.
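As a usage illustration (the helper name be48_to_u64 is not from the answer), the same big-endian assembly without the final left-alignment step could be wrapped like this and applied to the question's aFoo bytes:

#include <stdint.h>
#include <stdio.h>

/* Assembles big-endian bytes into the low bits of a uint64_t. */
static uint64_t be48_to_u64(const uint8_t *bytes, size_t len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len; i++)
        value = (value << 8) | bytes[i];
    return value;
}

int main(void)
{
    uint8_t aFoo[6] = {0x00, 0x00, 0x00, 0x00, 0x26, 0x8e};
    printf("0x%llx\n", (unsigned long long)be48_to_u64(aFoo, sizeof aFoo)); /* prints 0x268e */
    return 0;
}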
You might try
nBar = 0;
memcpy((unsigned char*)&nBar + 2, aFoo, 6);
No & is needed before an array name because the array name already decays to a pointer to its first element.
The correct way to do what you need is to use a union:
#include <stdio.h>

typedef union {
    struct {
        char padding[2];
        char aFoo[6];
    } chars;
    long long nBar;
} Combined;

int main ()
{
    Combined x;

    // reset the content of "x"
    x.nBar = 0; // or memset(&x, 0, sizeof(x));

    // put values directly in x.chars.aFoo[]...
    x.chars.aFoo[0] = 0x00;
    x.chars.aFoo[1] = 0x00;
    x.chars.aFoo[2] = 0x00;
    x.chars.aFoo[3] = 0x00;
    x.chars.aFoo[4] = 0x26;
    x.chars.aFoo[5] = 0x8e;

    printf("nBar: %llx\n", x.nBar);
    return 0;
}
The advantage: the code is more clear, there is no need to juggle with bits, shifts, masks etc.
However, you have to be aware that, for speed optimization and hardware reasons, the compiler might squeeze padding bytes into the struct, leading to aFoo not overlapping the desired bytes of nBar. This minor disadvantage can be solved by telling the compiler to pack the members of the union at byte boundaries (as opposed to the default alignment at word boundaries, the word being 32-bit or 64-bit depending on the hardware architecture).
This used to be achieved using a #pragma directive and its exact syntax depends on the compiler you use.
Since C11/C++11, the alignas() specifier became the standard way to specify the alignment of struct/union members (given your compiler already supports it).
