C bit manipulation (endianness) - c

Could someone please explain this code to me? I have received some bytecode from an assembler and now I have to use it in my virtual machine. This code is used there, but I don't know how it works or what it is used for.
static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
uint32_t result = (uint32_t)bytes[0] << 24 |
(uint32_t)bytes[1] << 16 |
(uint32_t)bytes[2] << 8 |
(uint32_t)bytes[3] << 0 ;
return (int32_t)result;
}

It builds up a 32 bit word from 4 bytes.
For example if the bytes are : 1st: 0x12 , 2nd: 0x34, 3rd: 0x56, 4th: 0x78
Then:
static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
uint32_t result = (uint32_t)bytes[0] << 24 | // -> 0x12000000
(uint32_t)bytes[1] << 16 | // -> 0x00340000
(uint32_t)bytes[2] << 8 | // -> 0x00005600
(uint32_t)bytes[3] << 0 ; // -> 0x00000078
return (int32_t)result; // the four values ORed together -> 0x12345678
}

This function attempts to combine the four bytes in a uint8_t[4] into a single uint32_t with big-endian byte order, cast the result into a signed int32_t, and return that.
So, if you pass a pointer to the array { 0xAA, 0xBB, 0xCC, 0xDD } to the function, it will combine them into a 32-bit integer with the most significant bytes of the integer coming from the lowest addresses in the array, giving you 0xAABBCCDD or -1430532899.
However, if the array pointed to by the argument bytes is not at least four bytes long, it has undefined behavior.
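The two worked examples above can be checked with a minimal sketch. It restates the function from the question (without `static`, so it can be called from anywhere) and adds `int32_to_bytecode`, a hypothetical inverse helper that is not part of the original code. The conversion of an out-of-range `uint32_t` to `int32_t` is implementation-defined; the sketch assumes the usual two's-complement wraparound that GCC and Clang provide.

```c
#include <stdint.h>

/* Same logic as the function in the question:
 * four big-endian bytes -> one signed 32-bit value. */
int32_t bytecode_to_int32(const uint8_t *bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] <<  8 |
                      (uint32_t)bytes[3];
    return (int32_t)result;
}

/* Hypothetical inverse (not in the original code):
 * store a 32-bit value as four big-endian bytes. */
void int32_to_bytecode(int32_t value, uint8_t *bytes)
{
    uint32_t u = (uint32_t)value;
    bytes[0] = (uint8_t)(u >> 24); /* most significant byte first */
    bytes[1] = (uint8_t)(u >> 16);
    bytes[2] = (uint8_t)(u >> 8);
    bytes[3] = (uint8_t)u;         /* least significant byte last */
}
```

Both functions require the caller to pass a buffer of at least four bytes.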

Related

Dereference 4 bytes pointer to 1 byte array

I have a program that looks like this
#include <stdio.h>
#include <stdint.h>
int main()
{
uint8_t arr[10] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09};
uint32_t *p = (uint32_t *)&arr[4];
printf("Value of p: %02Xh\n", *p);
return 0;
}
The output being printed is 7060504h which is
*p = (arr[7] << 24) | (arr[6] << 16) | (arr[5] << 8) | arr[4]; // (1)
I have the following questions:
Why is arr[7] MSB but not arr[4]?
Why is the output calculated following (1) not (2)?
*p = (arr[1] << 24) | (arr[2] << 16) | (arr[3] << 8) | arr[4]; // (2)
Why is arr[7] MSB but not arr[4]?
The C standard says, in C 2018 6.2.6.1: “Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number,
order, and encoding of which are either explicitly specified or implementation-defined.”
This means a C implementation may put the bytes of an object in any order it chooses and documents. The implementation you are using puts the least significant byte at the lowest address and the most significant byte at the highest address. This is called little endian. (The opposite order is big endian.)
Why is the output calculated following (1) not (2)?
This question is unclear. Why would you expect an expression using elements 4, 5, 6, and 7 of the array to produce a value from elements 1, 2, 3, and 4?
uint32_t *p = (uint32_t *)&arr[4];
The behavior of using *p defined this way is not defined by the C standard. It results in an array of uint8_t being accessed as if it were a uint32_t. C 2018 6.5 paragraph 7 says that is not defined. There can also be errors because the alignment requirement of a uint32_t may be stricter than that of a uint8_t.
A defined way to set the bytes of uint32_t is:
uint32_t p;
memcpy(&p, arr, sizeof p); // Include <string.h> to declare memcpy.
Then, since this p is not a pointer, print it with printf("Value of p: %08Xh\n", p);.
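As a rough sketch of the two defined alternatives: `load_u32_native` wraps the memcpy approach, so its result depends on the host byte order, while `load_u32_le` assembles the value with shifts and gives the same result on every host. Both function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Copy four bytes into a uint32_t. The numeric result depends on the
 * byte order of the machine running the code. */
uint32_t load_u32_native(const uint8_t *bytes)
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Build the value explicitly as little endian: bytes[0] is the least
 * significant byte. The result is the same on every machine. */
uint32_t load_u32_le(const uint8_t *bytes)
{
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}
```

Called with `&arr[4]` from the question, `load_u32_le` yields 0x07060504, matching the output seen on the little-endian machine.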

Problem of converting byte order for unsigned 64-bit number in C

I am playing with little endian/big endian conversion and found something that is a bit confusing but also interesting.
In the first example, there is no problem using bit shifts to convert the byte order of a uint32_t. It basically casts a pointer to the uint32_t integer to a pointer to uint8_t, then accesses and shifts each byte.
Example #1:
uint32_t htonl(uint32_t x)
{
uint8_t *s = (uint8_t*)&x;
return (uint32_t)(s[0] << 24 | s[1] << 16 | s[2] << 8 | s[3]);
}
However, if I try to do something similar on a uint64_t, the compiler throws a warning along the lines of "s[0] width is less than 56 bits", as in Example #2 below.
Example #2:
uint64_t htonl(uint64_t x)
{
uint8_t *s = (uint8_t*)&x;
return (uint64_t)(s[0] << 56 ......);
}
To make it work, I have to fetch each byte into a uint64_t so I can do bit shift without any errors as in Example #3 below.
Example #3:
uint64_t htonll2(uint64_t x)
{
uint64_t byte1 = x & 0xff00000000000000;
uint64_t byte2 = x & 0x00ff000000000000;
uint64_t byte3 = x & 0x0000ff0000000000;
uint64_t byte4 = x & 0x000000ff00000000;
uint64_t byte5 = x & 0x00000000ff000000;
uint64_t byte6 = x & 0x0000000000ff0000;
uint64_t byte7 = x & 0x000000000000ff00;
uint64_t byte8 = x & 0x00000000000000ff;
return (uint64_t)(byte1 >> 56 | byte2 >> 40 | byte3 >> 24 | byte4 >> 8 |
byte5 << 8 | byte6 << 24 | byte7 << 40 | byte8 << 56);
}
I am a little bit confused by Example #1 and Example #2. As far as I understand, s[i] has the size of a uint8_t in both, yet shifting by 32 bits or less causes no problem at all, while shifting by 56 bits does. I am running this program on Ubuntu with GCC 8.3.0.
Does the compiler implicitly convert s[i] into 32-bit numbers in this case? sizeof(s[0]) is 1 when I added debug messages to that.
Values with a type smaller than int are promoted to int when used in an expression. Assuming an int is 32 bit on your platform this works in most cases when converting a 32 bit value. The time it won't work is if you shift a 1 bit into the sign bit.
In the 64 bit case this means you're attempting to shift a value more than its bit length which is undefined behavior.
You need to cast each byte to a uint64_t in both cases to allow the shifts to work properly.
The s[0] expression has an 8-bit wide integer type, which is promoted to int (a signed 32-bit type on this platform) when operated on by the shift operator, so s[0] << 24 in the first example usually works, as the shift by 24 does not exceed the width of int.
On the other hand, the shift by 56 bits exceeds the width of int, so the data would be moved outside the result entirely and the behavior is undefined, hence the warning.
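A minimal sketch of the fix described above: each byte is widened to uint64_t before it is shifted, so no shift ever exceeds the width of its operand. The names `load_u64_be` and `hton64` are hypothetical; `hton64` copies the bytes out with memcpy instead of casting the pointer, which is a substitution made here for clarity.

```c
#include <stdint.h>
#include <string.h>

/* Interpret eight bytes as a big-endian 64-bit value. Every byte is
 * converted to uint64_t before it takes part in a shift. */
uint64_t load_u64_be(const uint8_t *s)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = (result << 8) | (uint64_t)s[i];
    }
    return result;
}

/* Hypothetical 64-bit counterpart of Example #1: returns a value whose
 * in-memory bytes are the big-endian representation of x. */
uint64_t hton64(uint64_t x)
{
    uint8_t bytes[8];
    memcpy(bytes, &x, sizeof bytes); /* host-order bytes of x */
    return load_u64_be(bytes);
}
```

On a little-endian host `hton64` swaps the bytes; on a big-endian host it returns x unchanged.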

Best way to move 8 bits into 8 individual bytes [duplicate]

I have a status register with 8 bits. I would like to move each individual bit to a byte for further processing. Seems like it should be easy, but every solution I come up with is convoluted. I was thinking about iterating through the bits with a for loop and dumping them into an array, but my solution is way too messy.
Here's basically what you're trying to do. It uses bitwise operators and a uint8_t array to make each bit an individual byte:
void bits_to_bytes(uint8_t status, uint8_t bits[8])
{
int ctr;
for( ctr = 0; ctr < 8; ctr++ )
{
bits[ctr] = (status >> ctr) & 1;
}
}
OK, so a little more in-depth:
This code loops through the bits in a byte and assigns the bit_number-th bit of status to bits[bit_number].
If you want to reverse the order the bits are stored in, simply change bits[ctr] to bits[(8-1)-ctr].
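Both orderings can be sketched side by side. `bits_to_bytes` is the function from this answer; `bits_to_bytes_msb_first` is a hypothetical name for the reversed variant that uses the `(8-1)-ctr` index.

```c
#include <stdint.h>

/* LSB first: bits[0] receives bit 0 of status, bits[7] receives bit 7. */
void bits_to_bytes(uint8_t status, uint8_t bits[8])
{
    for (int ctr = 0; ctr < 8; ctr++) {
        bits[ctr] = (status >> ctr) & 1;
    }
}

/* MSB first: bits[0] receives bit 7 of status, bits[7] receives bit 0. */
void bits_to_bytes_msb_first(uint8_t status, uint8_t bits[8])
{
    for (int ctr = 0; ctr < 8; ctr++) {
        bits[(8 - 1) - ctr] = (status >> ctr) & 1;
    }
}
```

For status 0x05 (binary 0000 0101), the first function sets elements 0 and 2, and the second sets elements 7 and 5.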
For a start, you should be using uint8_t for eight-bit bit collections since char is fundamentally non-portable unless you add a lot of extra code for checking its size and signedness.
Something like this should suffice for your needs:
void BitsToBytes(uint8_t bits, uint8_t *bytes) {
for (int i = 0; i < 8; ++i) { // The type has exactly eight bits.
*bytes++ = (bits > 127); // 1 if high bit set, else 0.
bits = (bits & 0x7f) << 1; // Shift left to get next bit.
}
}
:
// Call with:
uint8_t inputBits = 0x42;
uint8_t outputBytes[8];
BitsToBytes(inputBits, outputBytes);
This takes a type with eight bits and a buffer of eight bytes, then places the individual bits into each byte of the array:
              MSB  LSB
             +--------+
  inputBits: |abcdefgh|
             +--------+
             +---+---+---+---+---+---+---+---+
outputBytes: | a | b | c | d | e | f | g | h |
             +---+---+---+---+---+---+---+---+
If you want it to go the other way (where the LSB of the input is in element 0 of the array), you can simply change the body of the loop to:
*bytes++ = bits & 1; // 1 if low bit set, else 0.
bits = bits >> 1; // Shift right to get next bit.
You can use a double invocation of the ! operator to squash a zero/non-zero value to zero/one. Using this, the extracted value of bit n in status is !!(status & (1 << n)).
If you only have eight flags you might just create constants for the 8 values of 1 << n (0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80). This works particularly well if your flags all have individual names rather than numbers, so that you might have in a header file somewhere:
#define FLAG_FROB 0x01
#define FLAG_FOO 0x02
#define FLAG_BAR 0x04
#define FLAG_BAZ 0x08
#define FLAG_QUUX 0x10
Then in the code you'd just extract them as
flag_frob = !!(status & FLAG_FROB);
flag_foo = !!(status & FLAG_FOO);
flag_bar = !!(status & FLAG_BAR);
flag_baz = !!(status & FLAG_BAZ);
flag_quux = !!(status & FLAG_QUUX);
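The double-negation idiom can be wrapped in two small helpers. This is only a sketch: `bit_is_set` and `flag_is_set` are hypothetical names, and the FLAG_ constants are repeated from above so the block stands on its own.

```c
#include <stdint.h>

#define FLAG_FROB 0x01
#define FLAG_FOO  0x02
#define FLAG_BAR  0x04
#define FLAG_BAZ  0x08
#define FLAG_QUUX 0x10

/* Return 1 if bit n (0..7) of status is set, else 0. */
int bit_is_set(uint8_t status, unsigned n)
{
    return !!(status & (1u << n));
}

/* Return 1 if any bit of mask is set in status, else 0. */
int flag_is_set(uint8_t status, uint8_t mask)
{
    return !!(status & mask);
}
```

With status 0x12, `flag_is_set(status, FLAG_FOO)` and `flag_is_set(status, FLAG_QUUX)` both give 1, and every other flag gives 0.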

ADC raw data forming

I would like to ask you for an explanation about this part of my code. I am not sure what it really does. This is example code and I would like to understand it. The purpose of the original code should be acquiring the data from ADC in the streaming mode. This should be about forming the raw data. Thank you.
#define CH_DATA_SIZE 6
uint8_t read_buf[CH_DATA_SIZE];
uint32_t adc_data;
TI_ADS1293_SPIStreamReadReg(read_buf, count);
adc_data = ((uint32_t) read_buf[0] << 16) | ((uint16_t) read_buf[1] << 8)
| read_buf[2];
I will skip the variable declaration, because I will refer to it in the rest of the description.
The code begins at this line:
TI_ADS1293_SPIStreamReadReg(read_buf, count);
From a Google search, I assume you have this function from this file. If it is this function, it will read three registers from this module (see 8.6 Register Maps: the data registers DATA_CHx_ECG are three bytes long, which is what should be in the count variable).
Once this function has executed, you have the ECG data in the first three bytes of the read_buf variable, but you want a single 24-bit value since the quantized value is a 24-bit value.
Since we don't have uint24_t in C (nor in any other language I know of), we take the next possible size, which is uint32_t, to declare the adc_data variable.
Now the following code does rebuild a single 24-bit value from the 3 bytes we read from the ADC:
adc_data = ((uint32_t) read_buf[0] << 16) | ((uint16_t) read_buf[1] << 8)
| read_buf[2];
From the datasheet and the TI_ADS1293_SPIStreamReadReg function, we know that the function reads the values in the order of their addresses, in this case high byte, middle byte and low byte (respectively in read_buf[0], read_buf[1] and read_buf[2]).
To rebuild the 24-bit value, the code shifts the value with the appropriate offset: read_buf[0] goes from bits 23 to 16 thus shifted 16 bits, read_buf[1] from bits 15 to 8 thus shifted 8 bits and read_buf[2] from 7 to 0 thus shifted 0 bits (this shift is not represented). We will represent them as such (the 0xAA, 0xBB and 0xCC are example values to show what happens):
read_buf[0] = 0xAA => read_buf[0] << 16 = 0xAA0000
read_buf[1] = 0xBB => read_buf[1] << 8 = 0x00BB00
read_buf[2] = 0xCC => read_buf[2] << 0 = 0x0000CC
To combine the three shifted values, the code uses a bitwise-or | which results in this:
0xAA0000 | 0x00BB00 | 0x0000CC = 0xAABBCC
And you now have the 24-bit value of your ADC reading.
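The combining step can be sketched on its own, separate from the SPI read. `adc_bytes_to_u24` is a hypothetical helper name; it assumes the buffer holds the high, middle and low byte in that order, as described above.

```c
#include <stdint.h>

/* Combine three bytes (high, middle, low) into one 24-bit value held
 * in a uint32_t. The top eight bits of the result are always zero. */
uint32_t adc_bytes_to_u24(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 16)  /* bits 23..16 */
         | ((uint32_t)buf[1] << 8)   /* bits 15..8  */
         |  (uint32_t)buf[2];        /* bits 7..0   */
}
```

Passing a buffer that starts with 0xAA, 0xBB, 0xCC returns 0xAABBCC, as in the worked example.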

Split up two byte char into two single byte chars

I have a char of value say 0xB3, and I need to split this into two separate char's. So X = 0xB and Y = 0x3. I've tried the following code:
int main ()
{
char addr = 0xB3;
char *p = &addr;
printf ("%c, %c\n", p[0], p[1]); //This prints ?, Y
printf ("%X, %X\n", p[0], p[1]); //This prints FFFFFFB3, 59
return 0;
}
Just to clarify, I need to take any 2 byte char of value 00 to FF and split the first and second byte into separate char's. Thanks.
Straight from the Wikipedia:
#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)
So HI_NIBBLE(addr) would be 0xB. However, 0x00 through 0xFF are not "double bytes". They're single-byte values. A single hex digit can take on 16 values, while a byte can take on 256 = 16² of them, so you need two hex digits to represent arbitrary byte values.
There's quite a few problems here, let's take a look at your code:
int main ()
{
char addr = 0xB3; <-- you're assigning 0xB3 in hex (which is 179 in decimal) to addr
char *p = &addr; <-- you're assigning a pointer to point to addr
If addr were unsigned, it would now be set to 179, the extended ASCII of │ ( Box drawing character )
A char value can typically be -128 to +127 if it's signed, or 0 to 255 if it's unsigned. Here (according to your output) it's signed, so 179 does not fit and the assignment converts it to a negative value (the exact result is implementation-defined).
printf ("%c, %c\n", p[0], p[1]); <-- print the char value of what p is pointing to
also, do some UB
printf ("%X, %X\n", p[0], p[1]); <-- print the hex value of what p is pointing to
also, do some UB
So the second part of your code here prints the char value of your out-of-range addr variable, which happens to print '?' for you. The hex value of addr is FFFFFFB3, indicating you have a negative value (the uppermost bit is the sign bit).
This: p[0] is really an "add and dereference" operator. Meaning that we're going to take the address stored in p, add 0 to it, then dereference and look at the result:
p ---------+
V
------------------------------------------
| ptr(0xB3) | ? | ? | ... |
-------------------------------------------
0xbfd56c2b 0xbfd56c2C 0xbfd56c2d ...
When you do p[1] this goes one char or one byte past ptr and gives you that result. What's there? Don't know. That's out of your scope:
p+1 -------------------+
V
------------------------------------------
| ptr(0xB3) | ? | ? | ... |
-------------------------------------------
0xbfd56c2b 0xbfd56c2C 0xbfd56c2d ...
Y's ASCII value (in hex) is 0x59, so the byte after addr in memory happened to hold a Y. But it could have been anything; what it was going to do was undefined. A correct way to do this would be:
int main ()
{
unsigned char addr = 0xB3;
char low = addr & 0x0F;
char high = (addr >> 4) & 0x0F;
printf("%#x becomes %#x and %#x\n", addr, high, low);
return 0;
}
This works via:
0xB3 => 1011 0011 0xB3 >> 4 = 0000 1011
& 0000 1111 & 0000 1111
------------ -------------
0000 0011 => 3 low 0000 1011 => B high
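The mask-and-shift steps can also be written as small functions. This is a sketch with hypothetical names; `join_nibbles` is an added inverse that is not part of the answer, included only to show that the split loses no information.

```c
#include <stdint.h>

/* Upper four bits of b, moved down to bits 3..0. */
uint8_t high_nibble(uint8_t b)
{
    return (uint8_t)((b >> 4) & 0x0F);
}

/* Lower four bits of b. */
uint8_t low_nibble(uint8_t b)
{
    return (uint8_t)(b & 0x0F);
}

/* Hypothetical inverse: rebuild the byte from its two nibbles. */
uint8_t join_nibbles(uint8_t high, uint8_t low)
{
    return (uint8_t)(((high & 0x0F) << 4) | (low & 0x0F));
}
```

For 0xB3 the helpers return 0xB and 0x3, and joining them gives 0xB3 again.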
Why do you need to go through a pointer? Just take the 4 relevant bits and shift the most significant ones when needed:
char lower = value & 0x0F;
char higher = (value >> 4) & 0x0F;
Then 0xB3 is a single byte, not two bytes. Since a hex digit can have 16 values two digits can store 16*16 = 256 values, which is how much you can store in a byte.
OK, so you're trying to split 0xB3 into 0xB and 0x3. Just for future reference, don't say 'byte chars'. The 2 parts of a byte are commonly known as 'nibbles': a byte is made up of 2 nibbles (each of which is made up of 4 bits).
If you didn't know, char refers to 1 byte.
So here are the problems with your code:
char addr = 0xB3; <---- Creates single byte with value 0xB3 - Good
char *p = &addr; <---- Creates pointer pointing to 0xB3 - Good
printf ("%c, %c\n", p[0], p[1]); <---- p[0], p[1] - Bad
printf ("%X, %X\n", p[0], p[1]); <---- p[0], p[1] - Bad
OK, so when you refer to p[0] and p[1], you're telling your system that the pointer p is pointing to an array of chars (p[0] would refer to 0xB3, but p[1] would be going to the next byte in memory).
Example : This is something your system memory would look like (But with 8 byte pointers)
Integer Values Area Pointers Area
0x01 0x02 0x03 0x04 0x05 0x06 0x12 0x13 0x14 0x15 0x16
----------------------------- ------------------------
.... .... 0xB3 0x59 .... .... .... .... 0x03 .... ....
----------------------------- ------------------------
^ ^ ^
addr | p (example pointer pointing to example address 0x03)
Random number (Pointers are normally 8 Bytes but)
showing up in p[1] (But In this example I used single bytes)
So when you tell your system to get p[0] or *p (these would do the same thing)
it will go to the address (eg. 0x03) and get one byte (because its a char)
in this case 0xB3.
But when you try p[1] or *(p+1) That will go to the address (eg. 0x03) skip the first char and get the next one giving us 0x59 which would be there for some other variable.
OK, so we've got that out of the way. So how do you get the nibbles?
A problem with getting a nibble is that you generally can't put just half a byte in a variable; there's no type that holds only 4 bits.
When you print with %x/%X it drops leading zeros and only shows the digits from the first non-zero one, e.g. 0x00230242 would only show 230242. To keep the zeros, use a zero-padded field width:
%04X would show 2 full bytes (including the zeros)
%08X would show 4 full bytes (including the zeros)
So it is pretty pointless trying to get individual nibbles, but if you want a way to do something like that, then do:
char addr = 0x3B;
char addr1 = ((addr >> 4) & 0x0F);
char addr2 = ((addr >> 0) & 0x0F);
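The effect of the printf field width on leading zeros can be checked with a minimal sketch. It uses snprintf so the formatted text can be inspected; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format value as upper-case hex with no padding: leading zeros are dropped. */
int format_hex_plain(char *out, size_t size, unsigned value)
{
    return snprintf(out, size, "%X", value);
}

/* Format value as upper-case hex, zero-padded to 8 digits (4 full bytes). */
int format_hex_padded8(char *out, size_t size, unsigned value)
{
    return snprintf(out, size, "%08X", value);
}
```

For the value 0x00230242, the first function writes six digits and the second writes eight, with the two leading zeros kept.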
