What's the more recommended or advisable way to convert an array of uint8_t at offset i to uint64_t, and why?
uint8_t * bytes = ...
uint64_t const v = ((uint64_t *)(bytes + i))[0];
or
uint64_t const v = ((uint64_t)(bytes[i+7]) << 56)
                 | ((uint64_t)(bytes[i+6]) << 48)
                 | ((uint64_t)(bytes[i+5]) << 40)
                 | ((uint64_t)(bytes[i+4]) << 32)
                 | ((uint64_t)(bytes[i+3]) << 24)
                 | ((uint64_t)(bytes[i+2]) << 16)
                 | ((uint64_t)(bytes[i+1]) << 8)
                 | ((uint64_t)(bytes[i]));
There are two primary differences.
One, the behavior of ((uint64_t *)(bytes + i))[0] is not defined by the C standard (unless certain prerequisites about what bytes point to are met). Generally, an array of bytes should not be accessed using a uint64_t type.
When memory defined as one type is accessed with another type, it is called aliasing, and the C standard only defines certain combinations of aliasing. Some compilers may support some aliasing beyond what the standard requires, but using it is not portable. Additionally, if bytes + i is not suitably aligned for a uint64_t, the access may cause an exception or otherwise malfunction.
Two, loading the bytes through aliasing, if it is defined (by the standard or by compiler extension), interprets the bytes using the memory ordering for the C implementation. Some C implementations store the bytes representing integers in memory from low address to high address for low-position-value bytes to high-position-value bytes, and some store them from high address to low address. (And they can be stored in non-consecutive orders too, although this is rare.) So loading the bytes this way will produce different values from the same bytes in memory based on what order the C implementation uses.
But loading the bytes and using shifts to combine them will always produce the same value from the same bytes in memory regardless of what order the C implementation uses.
The first method should be avoided, because there is no need for it. If one desires to interpret the bytes using the C implementation’s ordering, this can be done with:
uint64_t t;
memcpy(&t, bytes+i, sizeof t);
const uint64_t v = t;
Using memcpy provides a portable way of aliasing the uint64_t to store bytes into it. Good compilers recognize this idiom and will optimize the memcpy to a load from memory, if suitable for the target architecture (and if optimization is enabled).
If one desires to interpret the bytes using little-endian ordering, as shown in the code in the question, then the second method may be used. (Sometimes platforms will have routines that may provide more efficient code for this.)
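If one desires big-endian (network) ordering instead, the same shift technique works with the byte indices reversed. A minimal sketch, assuming the same bytes and i as in the question:

uint64_t const v = ((uint64_t)(bytes[i])   << 56)
                 | ((uint64_t)(bytes[i+1]) << 48)
                 | ((uint64_t)(bytes[i+2]) << 40)
                 | ((uint64_t)(bytes[i+3]) << 32)
                 | ((uint64_t)(bytes[i+4]) << 24)
                 | ((uint64_t)(bytes[i+5]) << 16)
                 | ((uint64_t)(bytes[i+6]) << 8)
                 | ((uint64_t)(bytes[i+7]));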
You can also use memcpy:
uint64_t n;
memcpy(&n, bytes + i, sizeof(uint64_t));
const uint64_t v = n;
The first option has two big problems that qualify as undefined behavior (anything can happen):
A uint8_t* or array of uint8_t is not necessarily aligned the same way as required by a larger type like uint64_t. Simply casting to uint64_t* leads to misaligned access. This can cause hardware exceptions, program crashes, slower code etc, all depending on the alignment requirements of the specific target.
It violates the internal type system of C, where each object in memory known by the compiler has an "effective type" that the compiler keeps track of. Based on this, the compiler is allowed to make certain assumptions during optimization about whether a certain memory region has been accessed or not. If your code violates these type rules, as it would in this case, wrong machine code could get generated.
This is most commonly referred to as the strict aliasing rule and your cast followed by dereferencing would be a so-called "strict aliasing violation".
The second option is sound code, because:
When doing shifts or other forms of bitwise arithmetic, a large integer type should be used. That is, unsigned int or larger - depending on system. Using signed types or small integer types can lead to undefined behavior or unexpected results. See Implicit type promotion rules regarding problems with small integer types implicitly changing signedness in some expressions.
If not for the cast to uint64_t, then the bytes[i+7] << 56 shift would involve an implicit promotion of the left operand from uint8_t to int, which would be a bug. Because if the most significant bit (MSB) of the byte is set and we shift into/beyond the sign bit, we invoke undefined behavior - again, anything can happen.
And naturally we need to use a 64 bit type in this specific case or otherwise we wouldn't be able to shift as far as 56 bits. Shifting beyond the range of the type of the left operand is also undefined behavior.
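A minimal sketch of the promotion hazard described above, assuming a 32-bit int:

uint8_t b = 0x80;
uint32_t bad  = b << 24;           /* b promotes to signed int; shifting 0x80 into bit 31 is undefined behavior */
uint32_t good = (uint32_t)b << 24; /* the cast keeps the shift in unsigned arithmetic, which is well-defined */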
Note that whether to pick the order of bytes[i+7] << 56 versus the alternative bytes[i+0] << 56 depends on the endianness of the data in the array, not on the CPU. Bit shifts are nice since the actual shift is oblivious to whether the CPU uses big or little endian. But you must know in advance which byte in the source array is meant to be the most significant. The code you have here will work if the array was built using little-endian formatting, since the last byte of the array is shifted into the most significant position.
As for the uint64_t const v = , the const qualifier is a bit strange to have at local scope like that. It's harmless but confusing and doesn't really add anything of value inside a local scope. I would just drop it.
When writing to a GPIO output register (BCM2711 chip on the Raspberry Pi 4), the location of my clear_register function call changes the result completely. The code is below.
void clear_register(long reg) {
    *(unsigned int*)reg = (unsigned int)0;
}
int set_pin(unsigned int pin_number) {
    long reg = SET_BASE; //0xFE200001c
    clear_register(SET_BASE);
    unsigned int curval = *(unsigned int*)reg;
    unsigned int value = (1 << pin_number);
    value |= curval;
    gpio_write(reg, value);
    return 1;
}
When written like this, the register gets cleared. However, if I call clear_register(SET_BASE) before long reg = SET_BASE, the registers don't get cleared. I can't tell why this would make a difference; any help is appreciated.
I can't reproduce it since I don't have the target hardware which allows writing to that address but you have several bugs and dangerous style issues:
0xFE200001c is a 36-bit value, so the constant gets a 64-bit signed type on most implementations; it does not necessarily fit inside a long (possibly 32 bits) and certainly does not fit inside an unsigned int (16 or 32 bits).
Even if that was a typo and you meant to write 0xFE20001c (32 bits), your program still suffers from "sloppy typing", which is when the programmer just types out long, int etc. without giving it any deeper thought. The outcome is strange, subtle and intermittent bugs. I think everyone can relate to the situation "just cast it to unsigned int and it works", but you have no idea why. Bugs like that are always caused by "sloppy typing".
You should be using the stdint.h types instead. uint32_t and so on.
Also related to "sloppy typing": there exist very few cases where you actually want to use signed numbers in embedded systems - certainly not when dealing with addresses and bits. Negative addresses in Linux are a thing - kernel space - but accidentally changing an address to a negative value is a bug.
Signed types will only cause problems when doing bitwise arithmetic, and they come with overflow issues. So you need to nuke the presence of every signed variable in your code unless it explicitly needs to be signed.
This means no sloppy long but the correct type, which is either uint32_t or uintptr_t if it should hold an address, as is the case in your example. It also means no sloppy hex constants; they must always have a u suffix: 0xFE20001Cu, or they easily boil down to the wrong type.
For these reasons 1 << is always a bug in a C program, because 1 is of type signed int and you may end up shifting into the sign bit of that signed int, which is undefined behavior. There exists no reason why you shouldn't write 1u << , always.
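For example, assuming a 32-bit int:

unsigned int bad  = 1  << 31; /* undefined behavior: shifts into the sign bit of (signed) int */
unsigned int good = 1u << 31; /* well-defined: the whole operation happens on unsigned int    */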
Whenever you are dealing with hardware registers, you must always use volatile qualified pointers or you might end up with very strange bugs caused by the optimizer. No exceptions here either.
(unsigned int)0 is a pointless cast. If you wish to be explicit you could write 0u, but that doesn't change anything in case of assignment, since an "lvalue conversion" to the type of the left operand always happens upon assignment. 0u does make some static analysers happy though, MISRA C checkers etc.
With all bug fixes, your clear function should for example look something like this:
void clear_register (uintptr_t reg) {
    *(volatile uint32_t*)reg = 0;
}
Or better yet, don't write functions for such very basic things! *(volatile uint32_t*)SET_BASE = 0; is perfectly clear, readable and self-documented code. While clear_register(SET_BASE) suggests that something more advanced than just a simple write is taking place.
However, you could have placed the cast inside the SET_BASE macro. For details and examples see How to access a hardware register from firmware?
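A sketch of that last idea, using the corrected 32-bit address from above (verify the real register address against the datasheet before using it):

#define SET_BASE (*(volatile uint32_t*)0xFE20001Cu)

/* usage: */
SET_BASE = 0;                /* clear the register */
SET_BASE = 1u << pin_number; /* set a pin          */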
I am new to ARM LPC2148 micro controller and also new to StackOverflow. I just saw one piece of code in one of the evaluation boards. I am pasting as it is below.
Port pins P0.19 to P0.22 are mapped to D4 to D7 of LCD. The function below is used to send commands to LCD operated in 4 bit mode:
void LCD_Command(unsigned int data) // This function is used to send LCD commands
{
    unsigned int temp=0;
    EN_LOW();       // Set EN pin of LCD to Low
    COMMAND_PORT();
    WRITE_DATA();
    temp=data;
    IO0PIN&=0xFF87FFFF;
    IO0PIN|=(temp & 0xF0) << 15;
    EN_HI();        // Give strobe by enabling and disabling EN pin of LCD
    EN_LOW();
    temp=data & 0x0F;
    IO0PIN&=0xFF87FFFF;
    IO0PIN|=(temp) << 19;
    EN_HI();
    EN_LOW();
    while(Busy_Wait());
    Delay(10);
}
My questions are:
The variable "data" is already 32 bit wide. Is it efficient to shift the data in this way? Coder could have passed 32 bit data and then masked (&)/ORed (|). Or are there any other impacts?
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int? Since registers are 32 bit wide, I am not sure whether internally any segmentation is done to save memory.
Is there any way we can easily map 8 bit data to one of the 8 bit portions of 32 bit data? In the above code, shifting is done by hard coding (<<15 or <<19 etc). Can we avoid this hard coding and use some #defines to map the bits?
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int?
Only when storing them into RAM, which this small function will not do once the optimizer is on. Note that using char types may cause additional code to be generated to handle overflows correctly.
[...] Can we avoid this hard coding and use some #defines to map the bits?
Easy:
#define LCD_SHIFT_BITS 19
void LCD_Command(unsigned int data) // This function is used to send LCD commands
{
    unsigned int temp=0;
    EN_LOW();       // Set EN pin of LCD to Low
    COMMAND_PORT();
    WRITE_DATA();
    temp=data;
    IO0CLR = 0x0F << LCD_SHIFT_BITS;
    IO0SET = (temp & 0xF0) << (LCD_SHIFT_BITS - 4);
    EN_HI();        // Give strobe by enabling and disabling EN pin of LCD
    EN_LOW();
    temp=data & 0x0F;
    IO0CLR = 0x0F << LCD_SHIFT_BITS;
    IO0SET = temp << LCD_SHIFT_BITS;
    EN_HI();
    EN_LOW();
    while(Busy_Wait());
    Delay(10);
}
I also changed pin set and clear to be atomic.
The variable "data" is already 32 bit wide. Is it efficient to shift the data in this way? Coder could have passed 32 bit data and then masked (&)/ORed (|). Or are there any other impacts?
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int? Since registers are 32 bit wide, I am not sure whether internally any segmentation is done to save memory.
Since you are using a 32 bit MCU, reducing the variable sizes will not make the code any faster. It could possibly make it slower, even though you might also save a few bytes of RAM that way.
However, these are micro-optimizations that you shouldn't concern yourself with. Enable optimization and leave them to the compiler. If for some unknown reason you must micro-optimize your code, you could use uint_fast8_t instead. It is a type that is at least 8 bits wide, and the compiler will pick the fastest possible type.
It is generally a sound idea to use 32 bit integers as much as possible on a 32 bit CPU, to avoid the numerous subtle bugs caused by the various complicated implicit type promotion rules in the C language. In embedded systems in particular, integer promotion and type balancing are notorious for causing many subtle bugs. (A MISRA-C checker can help protecting against that.)
Is there any way we can easily map 8 bit data to one of the 8 bit portions of 32 bit data? In the above code, shifting is done by hard coding (<<15 or <<19 etc). Can we avoid this hard coding and use some #defines to map the bits?
Generally you should avoid "magic numbers" and such. Not for performance reasons, but for readability.
The easiest way to do this is to use the pre-made register map for the processor, if you got one with the compiler. If not, you'll have to #define the register manually:
#define REGISTER (*(volatile uint32_t*)0x12345678)
#define REGISTER_SOMETHING 0x00FF0000 // some part of the register
Then either define all the possible values such as
#define REGISTER_SOMETHING_X 0x00010000
#define REGISTER_SOMETHING_Y 0x00020000
...
REGISTER = REGISTER_SOMETHING & REGISTER_SOMETHING_X;
// or just:
REGISTER |= REGISTER_SOMETHING_X;
REGISTER = REGISTER_SOMETHING_X | REGISTER_SOMETHING_Y;
// and so on
Alternatively, if part of the register is variable:
#define REGISTER_SOMETHING_VAL(val) \
    ( REGISTER_SOMETHING & ((uint32_t)(val) << 16) )
...
REGISTER = REGISTER_SOMETHING_VAL(5);
There are many ways you could write such macros and the code using them. Focus on turning the calling code readable and without "magic numbers". For more complex stuff, consider using inline functions instead of function-like macros.
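For instance, a sketch of an inline-function equivalent of the hypothetical REGISTER_SOMETHING_VAL macro above, which gains type checking on the argument:

static inline void register_write_something(uint32_t val)
{
    /* Same effect as REGISTER = REGISTER_SOMETHING_VAL(val); */
    REGISTER = REGISTER_SOMETHING & (val << 16);
}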
Also for embedded systems, consider whether it makes any difference if all register parts are written with one single access or not. In some cases, you might get critical bugs if you don't, depending on the nature of the specific register. You need to be particularly careful when clearing interrupt masks etc. It is good practice to always disassemble such code and see what machine code you ended up with.
General advice:
Always consider endianness, alignment and portability. You might think that your code will never get ported, but portability can simply mean re-using your own code in other projects.
If you use structs/unions for any form of hardware or data transmission protocol mapping, you must use static_assert to ensure that there is no padding or other alignment tricks. Do not use struct bit-fields under any circumstances! They are bad for numerous reasons and cannot be used reliably in any form of program, least of all in an embedded microcontroller application.
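A minimal sketch of such a check, using C11 static_assert and a hypothetical protocol struct:

#include <assert.h>
#include <stdint.h>

struct proto_header {
    uint32_t id;
    uint16_t length;
    uint16_t flags;
};

/* Compilation fails if the compiler inserted any padding. */
static_assert(sizeof(struct proto_header) == 8, "unexpected padding in proto_header");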
Three questions, many many programming styles.
This code is definitely bad code. No atomic access... Do yourself a favor and don't use it as a reference.
The variable "data" is already 32 bit wide. Is it efficient ...
There is no other impact. The programmer just used an extra 4-byte local variable inside the function.
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int?
In general you can save memory only in RAM. Most linker scripts align data on 4- or 8-byte boundaries. Of course you can use structs to bypass this, both for RAM and Flash. For example, consider:
// ...
struct lala {
    unsigned int a :12;
    unsigned int b :20;
    long c;
    unsigned char d;
};
const struct lala l1; // l1 is const so it lives in Flash.
                      // Also l1.d ends up at byte offset 8 ;)
// ...
This last one brings us to question 3.
Is there any way we can easily map 8 bit data to one of the 8 bit portions of 32 bit data? ...
NXP's LPC2000 is a little-endian CPU; see here for details. That means you can create structures in a way that the members fit the memory locations you want to access. To accomplish that, you have to place the low memory address first. For example:
// file.h
// ...
#include <stdint.h>
typedef volatile union {
    struct {
        uint8_t p0 :1;
        uint8_t p1 :1;
        uint8_t p2 :1;
        uint8_t p3 :1;
        ...
        uint8_t p30 :1;
        uint8_t p31 :1;
    } pin;
    uint32_t port;
} port_io0clr_t;
// You have to check this to make sure it matches your compiler's bit-field layout.

// Now we can "put" it in memory.
#define REG_IO0CLR ((port_io0clr_t *) 0xE002800C)
//!< This is the memory address of IO0CLR in the address space of LPC21xx
Now we can use the REG_IO0CLR pointer. For example:
// file.c
// ...
int main (void) {
    // ...
    REG_IO0CLR->port = 0x0080;  // Clear pin P0.7
    // or even better
    REG_IO0CLR->pin.p4 = 1;     // Clear pin P0.4
    // ...
    return 0;
}
On the question 'why do we need to use bit-fields?', searching on Google I found that bit fields are used for flags.
Now I am curious,
Is it the only way bit-fields are used practically?
Do we need to use bit fields to save space?
A way of defining a bit field from the book:
struct {
    unsigned int is_keyword : 1;
    unsigned int is_extern  : 1;
    unsigned int is_static  : 1;
} flags;
Why do we use int?
How much space is occupied?
I am confused why we are using int, but not short or something smaller than an int.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
A quite good resource is Bit Fields in C.
The basic reason is to reduce the used size. For example, if you write:
struct {
    unsigned int is_keyword;
    unsigned int is_extern;
    unsigned int is_static;
} flags;
You will use at least 3 * sizeof(unsigned int) or 12 bytes to represent three small flags, that should only need three bits.
So if you write:
struct {
    unsigned int is_keyword : 1;
    unsigned int is_extern  : 1;
    unsigned int is_static  : 1;
} flags;
This uses up the same space as one unsigned int, so 4 bytes. You can throw 32 one-bit fields into the struct before it needs more space.
This is sort of equivalent to the classical home brew bit field:
#define IS_KEYWORD 0x01
#define IS_EXTERN 0x02
#define IS_STATIC 0x04
unsigned int flags;
But the bit field syntax is cleaner. Compare:
if (flags.is_keyword)
against:
if (flags & IS_KEYWORD)
And it is obviously less error-prone.
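A runnable sketch combining both points (the sizeof result assumes the common 4-byte unsigned int):

#include <stdio.h>

struct {
    unsigned int is_keyword : 1;
    unsigned int is_extern  : 1;
    unsigned int is_static  : 1;
} flags = { .is_keyword = 1 };

int main(void)
{
    printf("%zu\n", sizeof flags); /* typically prints 4 */
    if (flags.is_keyword)
        puts("keyword");
    return 0;
}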
Now I am curious, [are flags] the only way bitfields are used practically?
No, flags are not the only way bitfields are used. They can also be used to store values larger than one bit, although flags are more common. For instance:
typedef enum {
    NORTH = 0,
    EAST = 1,
    SOUTH = 2,
    WEST = 3
} directionValues;

struct {
    unsigned int alice_dir : 2;
    unsigned int bob_dir   : 2;
} directions;
Do we need to use bitfields to save space?
Bitfields do save space. They also allow an easier way to set values that aren't byte-aligned. Rather than bit-shifting and using bitwise operations, we can use the same syntax as setting fields in a struct. This improves readability. With a bitfield, you could write
directions.alice_dir = WEST;
directions.bob_dir = SOUTH;
However, to store multiple independent values in the space of one int (or other type) without bitfields, you would need to write something like:
#define ALICE_OFFSET 0
#define BOB_OFFSET 2
directions &= ~(3<<ALICE_OFFSET); // clear Alice's bits
directions |= WEST<<ALICE_OFFSET; // set Alice's bits to WEST
directions &= ~(3<<BOB_OFFSET); // clear Bob's bits
directions |= SOUTH<<BOB_OFFSET; // set Bob's bits to SOUTH
The improved readability of bitfields is arguably more important than saving a few bytes here and there.
Why do we use int? How much space is occupied?
The space of an entire int is occupied. We use int because in many cases, it doesn't really matter. If, for a single value, you use 4 bytes instead of 1 or 2, your user probably won't notice. For some platforms, size does matter more, and you can use other data types which take up less space (char, short, uint8_t, etc.).
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No, that is not correct. The entire unsigned int will exist, even if you're only using 8 of its bits.
Another place where bitfields are common is hardware registers. If you have a 32 bit register where each bit has a certain meaning, you can elegantly describe it with a bitfield.
Such a bitfield is inherently platform-specific. Portability does not matter in this case.
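A sketch of that idea, with a hypothetical status register at a made-up address. Since the bit allocation order of bit-fields is implementation-defined, a layout like this must be verified for the specific compiler and target:

#include <stdint.h>

typedef union {
    struct {
        uint32_t ready    : 1;
        uint32_t error    : 1;
        uint32_t irq_mask : 4;
        uint32_t reserved : 26;
    } bits;
    uint32_t raw;
} status_reg_t;

#define STATUS_REG (*(volatile status_reg_t*)0x40000000u)

/* usage: */
if (STATUS_REG.bits.ready) { /* ... */ }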
We use bit fields mostly (though not exclusively) for flag structures - bytes or words (or possibly larger things) in which we try to pack tiny (often 2-state) pieces of (often related) information.
In these scenarios, bit fields are used because they correctly model the problem we're solving: what we're dealing with is not really an 8-bit (or 16-bit or 24-bit or 32-bit) number, but rather a collection of 8 (or 16 or 24 or 32) related, but distinct pieces of information.
The problems we solve using bit fields are problems where "packing" the information tightly has measurable benefits and/or "unpacking" the information doesn't have a penalty. For example, if you're exposing 1 byte through 8 pins and the bits from each pin go through their own bus that's already printed on the board so that it leads exactly where it's supposed to, then a bit field is ideal. The benefit in "packing" the data is that it can be sent in one go (which is useful if the frequency of the bus is limited and our operation relies on frequency of its execution), and the penalty of "unpacking" the data is non-existent (or existent but worth it).
On the other hand, we don't use bit fields for booleans in other cases like normal program flow control, because of the way computer architectures usually work. Most common CPUs don't like fetching one bit from memory - they like to fetch bytes or integers. They also don't like to process bits - their instructions often operate on larger things like integers, words, memory addresses, etc.
So, when you try to operate on bits, it's up to you or the compiler (depending on what language you're writing in) to write out additional operations that perform bit masking and strip the structure of everything but the information you actually want to operate on. If there are no benefits in "packing" the information (and in most cases, there aren't), then using bit fields for booleans would only introduce overhead and noise in your code.
To answer the original question »When to use bit-fields in C?« … according to the book "Write Portable Code" by Brian Hook (ISBN 1-59327-056-9, I read the German edition ISBN 3-937514-19-8) and to personal experience:
Never use the bitfield idiom of the C language, but do it by yourself.
A lot of implementation details are compiler-specific, especially in combination with unions and things are not guaranteed over different compilers and different endianness. If there's only a tiny chance your code has to be portable and will be compiled for different architectures and/or with different compilers, don't use it.
We had this case when porting code from a little-endian microcontroller with some proprietary compiler to another big-endian microcontroller with GCC, and it was not fun. :-/
This is how I have used flags (host byte order ;-) ) since then:
# define SOME_FLAG (1 << 0)
# define SOME_OTHER_FLAG (1 << 1)
# define AND_ANOTHER_FLAG (1 << 2)
/* test flag */
if ( someint & SOME_FLAG ) {
    /* do this */
}
/* set flag */
someint |= SOME_FLAG;
/* clear flag */
someint &= ~SOME_FLAG;
No need for a union with the int type and some bitfield struct then. If you read lots of embedded code those test, set, and clear patterns will become common, and you spot them easily in your code.
Why do we need to use bit-fields?
When you want to store data that fits in less than one byte, that kind of data can be grouped into a structure using bit fields.
In the embedded world, when one 32-bit word of a register has different meanings for different bits, you can also use bit fields to make the code more readable.
I found that bit fields are used for flags. Now I am curious, is it the only way bit-fields are used practically?
No, this is not the only way. You can use them in other ways too.
Do we need to use bit fields to save space?
Yes.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No. Memory can only be occupied in multiples of bytes.
Bit fields can be used for saving memory space (but using bit fields for this purpose is rare). It is used where there is a memory constraint, e.g., while programming in embedded systems.
But this should be used only if extremely required because we cannot have the address of a bit field, so address operator & cannot be used with them.
A good usage would be to implement a chunk to translate to—and from—Base64 or any unaligned data structure.
struct base64enc {
    unsigned int e1:6;
    unsigned int e2:6;
    unsigned int e3:6;
    unsigned int e4:6;
}; // I don't know if declaring a 4-byte array will have the same effect.

struct base64dec {
    unsigned char d1;
    unsigned char d2;
    unsigned char d3;
};

union base64chunk {
    struct base64enc enc;
    struct base64dec dec;
};

union base64chunk b64c;
// You can assign three characters to b64c.dec, and get four 0-63 codes from b64c.enc instantly.
This example is a bit naive, since Base64 must also consider null-termination (i.e. a string whose length l is not a multiple of 3). But it works as a sample of accessing unaligned data structures.
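A usage sketch, with the same caveat that the bit-field allocation order is implementation-defined and must be verified for your compiler:

union base64chunk b64c;
b64c.dec.d1 = 'M';
b64c.dec.d2 = 'a';
b64c.dec.d3 = 'n';
/* The four 6-bit groups of the three bytes are now readable as
   b64c.enc.e1 ... b64c.enc.e4; which group lands in which field
   depends on the compiler's bit-field layout. */
unsigned codes[4] = { b64c.enc.e1, b64c.enc.e2, b64c.enc.e3, b64c.enc.e4 };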
Another example: using this feature to break a TCP packet header into its components (or the header of whatever other network protocol you want to discuss), although it is a more advanced and less end-user-facing example. In general, this is useful for PC internals, OSes, drivers, and encoding systems.
Another example: analyzing a float number.
struct _FP32 {
    unsigned int sign:1;
    unsigned int exponent:8;
    unsigned int mantissa:23;
}; // Whether sign or mantissa lands in the low-order bits is implementation-defined.

union FP32_t {
    struct _FP32 parts;
    float number;
};
(Disclaimer: I don't know the file name / type name where this is applied, but in C this would be declared in a header; I don't know how this can be done for 64-bit floating-point numbers, since the mantissa must have 52 bits and—on a 32-bit target—ints have 32 bits.)
Conclusion: As the concept and these examples show, this is a rarely used feature because it's mostly for internal purposes, and not for day-to-day software.
To answer the parts of the question no one else answered:
Ints, not Shorts
The reason to use ints rather than shorts, etc. is that in most cases no space will be saved by doing so.
Modern computers have a 32 or 64 bit architecture and that 32 or 64 bits will be needed even if you use a smaller storage type such as a short.
The smaller types are only useful for saving memory if you can pack them together (for example a short array may use less memory than an int array as the shorts can be packed together tighter in the array). For most cases, when using bitfields, this is not the case.
Other uses
Bitfields are most commonly used for flags, but there are other things they are used for. For example, one way to represent a chess board used in a lot of chess algorithms is to use a 64 bit integer to represent the board (8*8 squares) and set flags in that integer to give the position of all the white pawns. Another integer shows all the black pawns, etc.
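A minimal sketch of that bitboard idea, with hypothetical helper functions:

#include <stdint.h>

typedef uint64_t bitboard; /* one bit per square: a1 = bit 0 ... h8 = bit 63 */

static inline bitboard set_square(bitboard b, unsigned square)
{
    return b | (1ULL << square);
}

static inline int has_square(bitboard b, unsigned square)
{
    return (b >> square) & 1u;
}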
You can use them to expand the set of unsigned types that wrap around. Ordinarily you would only have the power-of-two widths 8, 16, 32, 64..., but with bit-fields you can have any width.
#include <stdio.h>

struct a
{
    unsigned int b : 3; /* 3-bit field: wraps from 7 back to 0 */
};

int main(void)
{
    struct a w = { 0 };
    while (1)
    {
        printf("%u\n", w.b++);
        getchar();
    }
}
To utilize memory space better, we can use bit fields.
As far as I know, in real-world programming, if we can afford the space, we can simply use booleans instead of declaring integers and then carving out bit fields.
If they are also values we use often, not only do we save space, we can also gain performance since we do not need to pollute the caches.
However, caching is also the danger in using bit fields since concurrent reads and writes to different bits will cause a data race and updates to completely separate bits might overwrite new values with old values...
Bitfields are much more compact and that is an advantage.
But don't forget packed structures are slower than normal structures. They are also more difficult to construct since the programmer must define the number of bits to use for each field. This is a disadvantage.
Why do we use int? How much space is occupied?
One answer to this question that I haven't seen mentioned in any of the other answers, is that the C standard guarantees support for int. Specifically:
A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.
It is common for compilers to allow additional bit-field types, but not required. If you're really concerned about portability, int is the best choice.
Nowadays, microcontrollers (MCUs) have peripherals, such as I/O ports, ADCs, DACs, onboard the chip along with the processor.
Before MCUs became available with the needed peripherals, we would access some of our hardware by connecting to the buffered address and data buses of the microprocessor. A pointer would be set to the memory address of the device and if the device saw its address along with the R/W signal and maybe a chip select, it would be accessed.
Oftentimes we would want to access individual or small groups of bits on the device.
In our project, we used this to extract a page table entry and page directory entry from a given memory address:
union VADDRESS {
    struct {
        ULONG64 BlockOffset : 16;
        ULONG64 PteIndex : 14;
        ULONG64 PdeIndex : 14;
        ULONG64 ReservedMBZ : (64 - (16 + 14 + 14));
    };
    ULONG64 AsULONG64;
};
Now suppose we have an address:
union VADDRESS tempAddress;
tempAddress.AsULONG64 = 0x1234567887654321;
Now we can access PTE and PDE from this address:
printf("%llu\n", (unsigned long long)tempAddress.PteIndex);
I am trying to read a sequence of bits received from the network (in a pre-defined format) and was wondering whether we have to take care of endianness.
For example, the pre-defined format says that starting from most significant bit the data received would look like this
|R|| 11 bits data||20 bits data||16 bits data| where R is reserved and ignored.
My question is: while extracting, do I have to take care of endianness, or can I just do
u16 first_11_bits = (*(u16 *)data & 0x7FF0) >> 4;
u32 bits_20_data = *(u32 *)data & 0x000FFFFF;
What kind of network? IP is defined in terms of bytes so whatever order the bit stream happens to be in the underlying layers has been abstracted away from you and you receive the bits in the order that your CPU understands them. Which means that the abstraction that C provides you to access those bits is portable. Think in terms of shifting left or right in C. Whatever the endianness is in the CPU the direction and semantics of shifting in C doesn't change.
So the question is: how is the data encoded into a byte stream by the other end? However the other end encodes the data should be the way you decode it. If they just shove bits into one byte and send that byte over the network, then you don't need to care. If they put bits into one int16 and then send it in network byte order, then you need to worry about the endianness of that int16. If they put the bits into an int32 and send that, then you need to worry about the endianness of that int32.
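For example, if the other end packs those 48 bits into 6 bytes and sends them in network (big-endian) order, a decode that is independent of host endianness could look like this sketch (decode48 is just an illustrative name):

#include <stdint.h>

void decode48(const uint8_t *p, uint16_t *first_11, uint32_t *mid_20, uint16_t *last_16)
{
    uint64_t v = 0;
    for (int i = 0; i < 6; i++)
        v = (v << 8) | p[i];                     /* assemble, most significant byte first */

    *last_16  = (uint16_t)(v & 0xFFFF);          /* lowest 16 bits                 */
    *mid_20   = (uint32_t)((v >> 16) & 0xFFFFF); /* next 20 bits                   */
    *first_11 = (uint16_t)((v >> 36) & 0x7FF);   /* 11 bits below the reserved R bit */
}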
Yes, you will always need to worry about endianness when reading/writing to an external resource (file, network, ...). But it has nothing to do with the bit operations.
Casting directly, as in (u16 *)data, is not a portable way to do things.
I recommend having functions to convert data to native types, before doing bit operations.
u16 first_11_bits = (*(u16 *)data & 0x7FF0) >> 4;
u32 bits_20_data = *(u32 *)data & 0x000FFFFF;
This is UB. Either data points to a u16 or it points to a u32. It can't point to both. This isn't an endianess issue, it's just an invalid access. You must access for read through the same pointer type you accessed for write. Whichever way you wrote it, that's how you read it. Then you can do bit operations on the value you read back, which will be the same as the value you wrote.
One way this can go wrong is that the compiler is free to assume that a write through a u32 * won't affect a value read through a u16 *. So the write may be to memory but the read may be from a value cached in a register. This has broken real-world code.
For example:
u16 i = * (u16 *) data;
* (u32 *) data = 0;
u16 j = * (u16 *) data;
The compiler is free to treat the last line as u16 j = i;. It produced i by reading through a u16 * the very same way, and it is free to assume that a write through a u32 * can't affect the result of a read through a u16 *.