When to use bit-fields in C

On the question 'why do we need to use bit-fields?', searching on Google I found that bit fields are used for flags.
Now I am curious,
Is it the only way bit-fields are used practically?
Do we need to use bit fields to save space?
A way of defining bit field from the book:
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
Why do we use int?
How much space is occupied?
I am confused why we are using int, but not short or something smaller than an int.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?

A quite good resource is Bit Fields in C.
The basic reason is to reduce the used size. For example, if you write:
struct {
unsigned int is_keyword;
unsigned int is_extern;
unsigned int is_static;
} flags;
You will use at least 3 * sizeof(unsigned int) bytes (12 bytes where unsigned int is 4 bytes wide) to represent three small flags that should only need three bits.
So if you write:
struct {
unsigned int is_keyword : 1;
unsigned int is_extern : 1;
unsigned int is_static : 1;
} flags;
This typically uses the same space as one unsigned int, so 4 bytes on most current platforms. You can put 32 one-bit fields into the struct before it needs more space.
This is sort of equivalent to the classical home brew bit field:
#define IS_KEYWORD 0x01
#define IS_EXTERN 0x02
#define IS_STATIC 0x04
unsigned int flags;
But the bit field syntax is cleaner. Compare:
if (flags.is_keyword)
against:
if (flags & IS_KEYWORD)
And it is obviously less error-prone.
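The size difference described above can be checked directly with sizeof. This is a minimal sketch; the struct tags plain_flags and packed_flags and the helper bytes_saved are names made up for the illustration, and the exact sizes are implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Three flags stored as full unsigned ints. */
struct plain_flags {
    unsigned int is_keyword;
    unsigned int is_extern;
    unsigned int is_static;
};

/* The same three flags stored as 1-bit bit-fields. */
struct packed_flags {
    unsigned int is_keyword : 1;
    unsigned int is_extern  : 1;
    unsigned int is_static  : 1;
};

/* Returns how many bytes the bit-field version saves on this platform. */
size_t bytes_saved(void)
{
    assert(sizeof(struct packed_flags) <= sizeof(struct plain_flags));
    return sizeof(struct plain_flags) - sizeof(struct packed_flags);
}
```

On a typical platform with a 4-byte unsigned int the two structs are 12 and 4 bytes, but the standard does not promise those numbers.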

Now I am curious, [are flags] the only way bitfields are used practically?
No, flags are not the only way bitfields are used. They can also be used to store values larger than one bit, although flags are more common. For instance:
typedef enum {
NORTH = 0,
EAST = 1,
SOUTH = 2,
WEST = 3
} directionValues;
struct {
unsigned int alice_dir : 2;
unsigned int bob_dir : 2;
} directions;
Do we need to use bitfields to save space?
Bitfields do save space. They also allow an easier way to set values that aren't byte-aligned. Rather than bit-shifting and using bitwise operations, we can use the same syntax as setting fields in a struct. This improves readability. With a bitfield, you could write
directions.alice_dir = WEST;
directions.bob_dir = SOUTH;
However, to store multiple independent values in the space of one int (or other type) without bitfields, you would need to write something like:
#define ALICE_OFFSET 0
#define BOB_OFFSET 2
directions &= ~(3<<ALICE_OFFSET); // clear Alice's bits
directions |= WEST<<ALICE_OFFSET; // set Alice's bits to WEST
directions &= ~(3<<BOB_OFFSET); // clear Bob's bits
directions |= SOUTH<<BOB_OFFSET; // set Bob's bits to SOUTH
The improved readability of bitfields is arguably more important than saving a few bytes here and there.
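If the shift-and-mask route is taken anyway, the clear-then-set pair can be wrapped in small helpers so the call sites stay readable. A rough sketch, assuming two bits per direction as above; enum direction, DIR_MASK, set_dir and get_dir are hypothetical names, not part of the original example.

```c
#include <assert.h>

enum direction { NORTH = 0, EAST = 1, SOUTH = 2, WEST = 3 };

#define ALICE_OFFSET 0u
#define BOB_OFFSET   2u
#define DIR_MASK     3u   /* two bits per direction */

/* Return `word` with the 2-bit direction at `offset` replaced by `dir`. */
unsigned set_dir(unsigned word, unsigned offset, enum direction dir)
{
    assert((unsigned)dir <= DIR_MASK);
    word &= ~(DIR_MASK << offset);      /* clear the old two bits */
    word |= (unsigned)dir << offset;    /* write the new direction */
    return word;
}

/* Read back the 2-bit direction stored at `offset`. */
enum direction get_dir(unsigned word, unsigned offset)
{
    return (enum direction)((word >> offset) & DIR_MASK);
}
```

A call such as directions = set_dir(directions, BOB_OFFSET, SOUTH); then reads almost as clearly as the bit-field assignment.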
Why do we use int? How much space is occupied?
The space of an entire int is occupied. We use int because in many cases, it doesn't really matter. If, for a single value, you use 4 bytes instead of 1 or 2, your user probably won't notice. For some platforms, size does matter more, and you can use other data types which take up less space (char, short, uint8_t, etc.).
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No, that is not correct. The entire unsigned int will exist, even if you're only using 3 of its bits.

Another place where bitfields are common is hardware registers. If you have a 32-bit register where each bit has a certain meaning, you can elegantly describe it with a bitfield.
Such a bitfield is inherently platform-specific. Portability does not matter in this case.

We use bit fields mostly (though not exclusively) for flag structures - bytes or words (or possibly larger things) in which we try to pack tiny (often 2-state) pieces of (often related) information.
In these scenarios, bit fields are used because they correctly model the problem we're solving: what we're dealing with is not really an 8-bit (or 16-bit or 24-bit or 32-bit) number, but rather a collection of 8 (or 16 or 24 or 32) related, but distinct pieces of information.
The problems we solve using bit fields are problems where "packing" the information tightly has measurable benefits and/or "unpacking" the information doesn't have a penalty. For example, if you're exposing 1 byte through 8 pins and the bits from each pin go through their own bus that's already printed on the board so that it leads exactly where it's supposed to, then a bit field is ideal. The benefit in "packing" the data is that it can be sent in one go (which is useful if the frequency of the bus is limited and our operation relies on frequency of its execution), and the penalty of "unpacking" the data is non-existent (or existent but worth it).
On the other hand, we don't use bit fields for booleans in other cases like normal program flow control, because of the way computer architectures usually work. Most common CPUs don't like fetching one bit from memory - they like to fetch bytes or integers. They also don't like to process bits - their instructions often operate on larger things like integers, words, memory addresses, etc.
So, when you try to operate on bits, it's up to you or the compiler (depending on what language you're writing in) to write out additional operations that perform bit masking and strip the structure of everything but the information you actually want to operate on. If there are no benefits in "packing" the information (and in most cases, there aren't), then using bit fields for booleans would only introduce overhead and noise in your code.

To answer the original question »When to use bit-fields in C?« … according to the book "Write Portable Code" by Brian Hook (ISBN 1-59327-056-9, I read the German edition ISBN 3-937514-19-8) and to personal experience:
Never use the bitfield idiom of the C language, but do it by yourself.
A lot of implementation details are compiler-specific, especially in combination with unions and things are not guaranteed over different compilers and different endianness. If there's only a tiny chance your code has to be portable and will be compiled for different architectures and/or with different compilers, don't use it.
We had this case when porting code from a little-endian microcontroller with some proprietary compiler to another big-endian microcontroller with GCC, and it was not fun. :-/
This is how I have used flags (host byte order ;-) ) since then:
# define SOME_FLAG (1 << 0)
# define SOME_OTHER_FLAG (1 << 1)
# define AND_ANOTHER_FLAG (1 << 2)
/* test flag */
if ( someint & SOME_FLAG ) {
/* do this */
}
/* set flag */
someint |= SOME_FLAG;
/* clear flag */
someint &= ~SOME_FLAG;
No need for a union with the int type and some bitfield struct then. If you read lots of embedded code those test, set, and clear patterns will become common, and you spot them easily in your code.

Why do we need to use bit-fields?
When you want to store data that fits in less than one byte, that kind of data can be grouped in a structure using bit fields.
In the embedded world, when the bits of one 32-bit register word have different meanings, you can also use bit fields to make them more readable.
I found that bit fields are used for flags. Now I am curious, is it the only way bit-fields are used practically?
No, this is not the only way. You can use them in other ways too.
Do we need to use bit fields to save space?
Yes.
As I understand only 1 bit is occupied in memory, but not the whole unsigned int value. Is it correct?
No. Memory can only be occupied in multiples of bytes.

Bit fields can be used for saving memory space (but using bit fields for this purpose is rare). It is used where there is a memory constraint, e.g., while programming in embedded systems.
But they should be used only when really necessary, because we cannot take the address of a bit field, so the address-of operator & cannot be applied to one.
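The address restriction mentioned above is usually worked around with a temporary. A small sketch, assuming a struct with two 4-bit counters; struct counters, increment and bump_retries are names invented for the illustration.

```c
#include <assert.h>

struct counters {
    unsigned int retries : 4;   /* holds 0..15 */
    unsigned int errors  : 4;   /* holds 0..15 */
};

/* An ordinary helper that needs a real address to work on. */
static void increment(unsigned int *value)
{
    ++*value;
}

/* increment(&c.retries) would be rejected by the compiler, so the
   bit-field is copied to an addressable temporary and written back. */
struct counters bump_retries(struct counters c)
{
    unsigned int tmp = c.retries;
    increment(&tmp);
    assert(tmp <= 16u);          /* started in 0..15, incremented once */
    c.retries = tmp & 0xFu;      /* keep the low 4 bits: 15 wraps to 0 */
    return c;
}
```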

A good usage would be to implement a chunk to translate to—and from—Base64 or any unaligned data structure.
struct base64enc {
unsigned int e1:6;
unsigned int e2:6;
unsigned int e3:6;
unsigned int e4:6;
}; // I don't know if declaring a 4-byte array will have the same effect.
struct base64dec {
unsigned char d1;
unsigned char d2;
unsigned char d3;
};
union base64chunk {
struct base64enc enc;
struct base64dec dec;
};
union base64chunk b64c;
// You can assign three characters to b64c.dec and read four 0-63 codes from b64c.enc, although the order in which the compiler allocates the bit-fields is implementation-defined.
This example is a bit naive, since Base64 must also handle padding (i.e. input whose length l does not satisfy l % 3 == 0). But it works as a sample of accessing unaligned data structures.
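Because the bit-field layout inside a union is implementation-defined, a shift-based version gives the same four codes on any compiler. This is a sketch of that alternative technique, not the union approach shown above; base64_split and base64_code are hypothetical helper names, and padding is still not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Split three input bytes into four 6-bit codes (0..63), most significant first. */
void base64_split(const uint8_t in[3], uint8_t out[4])
{
    /* Gather the 24 bits into one word so the shifts are easy to read. */
    uint32_t chunk = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];

    out[0] = (uint8_t)((chunk >> 18) & 0x3Fu);
    out[1] = (uint8_t)((chunk >> 12) & 0x3Fu);
    out[2] = (uint8_t)((chunk >> 6) & 0x3Fu);
    out[3] = (uint8_t)(chunk & 0x3Fu);
}

/* Convenience wrapper: the code at position `index` (0..3) for three characters. */
unsigned base64_code(const char *text, unsigned index)
{
    uint8_t in[3] = { (uint8_t)text[0], (uint8_t)text[1], (uint8_t)text[2] };
    uint8_t out[4];

    assert(index < 4);
    base64_split(in, out);
    return out[index];
}
```

For the input "Man" the four codes are 19, 22, 5 and 46, which map to the familiar Base64 text "TWFu".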
Another example: using this feature to break a TCP packet header into its components (or any other network protocol packet header you want to discuss), although it is a more advanced and less end-user example. In general, this is useful for PC internals, OS code, drivers, and encoding systems.
Another example: analyzing a float number.
struct _FP32 {
unsigned int sign:1;
unsigned int exponent:8;
unsigned int mantissa:23;
};
union FP32_t {
struct _FP32 parts;
float number;
};
(Disclaimer: Don't know the file name / type name where this is applied, but in C this is declared in a header; Don't know how can this be done for 64-bit floating-point numbers since the mantissa must have 52 bits and—in a 32 bit target—ints have 32 bits).
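A version that does not depend on how the compiler orders bit-fields copies the float into an integer and shifts. This is a sketch assuming float is an IEEE 754 binary32 value, which is common but not guaranteed by the C standard; struct fp32_parts and fp32_split are names made up here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct fp32_parts {
    uint32_t sign;      /* 1 bit  */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 bits */
};

/* Split a float into its three fields (assumes IEEE 754 binary32). */
struct fp32_parts fp32_split(float number)
{
    uint32_t bits;
    struct fp32_parts p;

    assert(sizeof bits == sizeof number);
    memcpy(&bits, &number, sizeof bits);   /* copy the object representation */

    p.sign     = bits >> 31;
    p.exponent = (bits >> 23) & 0xFFu;
    p.mantissa = bits & 0x7FFFFFu;
    return p;
}
```

The same pattern extends to a 64-bit double by using uint64_t and field widths of 1, 11 and 52 bits, which sidesteps the 32-bit int limit raised in the disclaimer.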
Conclusion: As the concept and these examples show, this is a rarely used feature because it's mostly for internal purposes, and not for day-by-day software.

To answer the parts of the question no one else answered:
Ints, not Shorts
The reason to use ints rather than shorts, etc. is that in most cases no space will be saved by doing so.
Modern computers have a 32 or 64 bit architecture and that 32 or 64 bits will be needed even if you use a smaller storage type such as a short.
The smaller types are only useful for saving memory if you can pack them together (for example a short array may use less memory than an int array as the shorts can be packed together tighter in the array). For most cases, when using bitfields, this is not the case.
Other uses
Bitfields are most commonly used for flags, but there are other things they are used for. For example, one way to represent a chess board used in a lot of chess algorithms is to use a 64-bit integer to represent the board (8*8 squares) and set flags in that integer to give the position of all the white pawns. Another integer shows all the black pawns, etc.
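The chess board idea can be sketched with a plain 64-bit integer and masks. The square numbering used here (bit index = rank * 8 + file) is one common convention and an assumption of this sketch, as are the names bitboard, place_piece, has_piece and white_pawns_start.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitboard;   /* one bit per square */

/* Map file (0 = a .. 7 = h) and rank (0 = rank 1 .. 7 = rank 8) to a bit index. */
unsigned square_index(unsigned file, unsigned rank)
{
    assert(file < 8 && rank < 8);
    return rank * 8u + file;
}

/* Return `board` with the given square marked as occupied. */
bitboard place_piece(bitboard board, unsigned file, unsigned rank)
{
    return board | ((bitboard)1 << square_index(file, rank));
}

/* Return 1 if the given square is occupied, 0 otherwise. */
int has_piece(bitboard board, unsigned file, unsigned rank)
{
    return (int)((board >> square_index(file, rank)) & 1u);
}

/* All eight white pawns on their starting rank (rank 2). */
bitboard white_pawns_start(void)
{
    bitboard board = 0;
    for (unsigned file = 0; file < 8; ++file)
        board = place_piece(board, file, 1);
    return board;
}
```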

You can use them to expand the number of unsigned types that wrap. Ordinarily you would only have widths of 8, 16, 32, 64... bits, but with bit-fields you can have any width.
#include <stdio.h>
struct a
{
unsigned int b : 3 ;
} ;
int main( void )
{
struct a w = { 0 } ;
while( 1 )
{
printf("%u\n" , (unsigned int) w.b++ ) ; // prints 0..7, then wraps back to 0
getchar() ;
}
}

To utilize the memory space, we can use bit fields.
As far as I know, in real-world programming we can simply use a Boolean type where needed, instead of declaring integers and turning them into bit fields.

If they are also values we use often, not only do we save space, we can also gain performance since we do not need to pollute the caches.
However, caching is also the danger in using bit fields since concurrent reads and writes to different bits will cause a data race and updates to completely separate bits might overwrite new values with old values...

Bitfields are much more compact and that is an advantage.
But don't forget packed structures are slower than normal structures. They are also more difficult to construct since the programmer must define the number of bits to use for each field. This is a disadvantage.

Why do we use int? How much space is occupied?
One answer to this question that I haven't seen mentioned in any of the other answers, is that the C standard guarantees support for int. Specifically:
A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation defined type.
It is common for compilers to allow additional bit-field types, but not required. If you're really concerned about portability, int is the best choice.

Nowadays, microcontrollers (MCUs) have peripherals, such as I/O ports, ADCs, DACs, onboard the chip along with the processor.
Before MCUs became available with the needed peripherals, we would access some of our hardware by connecting to the buffered address and data buses of the microprocessor. A pointer would be set to the memory address of the device and if the device saw its address along with the R/W signal and maybe a chip select, it would be accessed.
Oftentimes we would want to access individual or small groups of bits on the device.

In our project, we used this to extract a page table entry and page directory entry from a given memory address:
union VADDRESS {
struct {
ULONG64 BlockOffset : 16;
ULONG64 PteIndex : 14;
ULONG64 PdeIndex : 14;
ULONG64 ReservedMBZ : (64 - (16 + 14 + 14));
};
ULONG64 AsULONG64;
};
Now suppose, we have an address:
union VADDRESS tempAddress;
tempAddress.AsULONG64 = 0x1234567887654321;
Now we can access PTE and PDE from this address:
cout << tempAddress.PteIndex;
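The same fields can be pulled out with shifts and masks, which does not rely on the compiler placing the first bit-field in the least significant bits. This sketch assumes that least-significant-first layout (BlockOffset in bits 0-15, PteIndex in bits 16-29, PdeIndex in bits 30-43); the helper names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_OFFSET_BITS 16u
#define PTE_INDEX_BITS    14u
#define PDE_INDEX_BITS    14u

/* Extract `width` bits of `value`, starting at bit `shift`. */
static uint64_t extract(uint64_t value, unsigned shift, unsigned width)
{
    assert(width > 0 && width < 64 && shift < 64);
    return (value >> shift) & ((UINT64_C(1) << width) - 1);
}

uint64_t block_offset(uint64_t va)
{
    return extract(va, 0, BLOCK_OFFSET_BITS);
}

uint64_t pte_index(uint64_t va)
{
    return extract(va, BLOCK_OFFSET_BITS, PTE_INDEX_BITS);
}

uint64_t pde_index(uint64_t va)
{
    return extract(va, BLOCK_OFFSET_BITS + PTE_INDEX_BITS, PDE_INDEX_BITS);
}
```

For the address 0x1234567887654321 used above, this yields a block offset of 0x4321, a PTE index of 0x0765 and a PDE index of 0x19E2.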

Related

Replacing Bitfields with Bitshifting in an Embedded Register Struct

I'm trying to get a little fancier with how I write my drivers for peripherals in embedded applications.
Naturally, reading and writing to predefined memory mapped areas is a common task, so I try to wrap as much stuff up in a struct as I can.
Sometimes, I want to write to the whole register, and sometimes I want to manipulate a subset of bits in this register. Lately, I've read some stuff that suggests making a union that contains a single uintX type that's big enough to hold the whole register (usually 8 or 16 bits), as well as a struct that has a collection of bitfields in it to represent the specific bits of that register.
After reading a few comments on some of these posts that have this strategy outlined for managing multiple control/status registers for a peripheral, I concluded that most people with experience in this level of embedded development dislike bitfields, largely due to lack of portability and endianness issues between different compilers. Not to mention that debugging can be confounded by bitfields as well.
The alternative that most people seem to recommend is to use bit shifting to ensure that the driver will be portable between platforms, compilers and environments, but I have had a hard time seeing this in action.
My question is:
How do I take something like this:
typedef union data_port
{
uint16_t CCR1;
struct
{
unsigned int data1 : 5;
unsigned int data2 : 3;
unsigned int data3 : 4;
unsigned int data4 : 4;
};
} data_port;
And get rid of the bitfields and convert to a bit-shifting scheme in a sane way?
Part 3 of this guy's post describes what I'm talking about in general. Notice at the end, he puts all the registers (wrapped up as unions) in a struct and then suggests doing the following:
define a pointer to refer to the can base address and cast it as a pointer to the (CAN) register file like the following.
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
What the hell is this cute little move all about? CAN0 is a pointer to a pointer to a function of a...number that's #defined as CAN_BASE_ADDRESS? I don't know...He lost me on that one.
The C standard does not specify how much memory a sequence of bit-fields occupies or what order the bit-fields are in. In your example, some compilers might decide to use 32 bits for the bit-fields, even though you clearly expect it to cover 16 bits. So using bit-fields locks you down to a specific compiler and specific compilation flags.
Using types larger than unsigned char also has implementation-defined effects, but in practice it is a lot more portable. In the real world, there are just two choices for an uintNN_t: big-endian or little-endian, and usually for a given CPU everybody uses the same order because that's the order that the CPU uses natively. (Some architectures such as mips and arm support both endiannesses, but usually people stick to one endianness across a large range of CPU models.) If you're accessing a CPU's own registers, its endianness may be part of the CPU anyway. On the other hand, if you're accessing a peripheral, you need to take care.
The documentation of the device that you're accessing will tell you how big a memory unit to address at once (apparently 2 bytes in your example) and how the bits are arranged. For example, it might state that the register is a 16-bit register accessed with 16-bit load/store instructions whatever the CPU's endianness is, that data1 encompasses the 5 low-order bits, data2 encompasses the next 3, data3 the next 4 and data4 the next 4. In this case, you would declare the register as a uint16_t.
typedef volatile uint16_t data_port_t;
data_port_t *port = GET_DATA_PORT_ADDRESS();
Memory addresses in devices almost always need to be declared volatile, because it matters that the compiler reads and writes to them at the right time.
To access the parts of the register, use bit-shift and bit-mask operators. For example:
#define DATA2_WIDTH 3
#define DATA2_OFFSET 5
#define DATA2_MAX (((uint16_t)1 << DATA2_WIDTH) - 1) // in binary: 0000000000000111
#define DATA2_MASK (DATA2_MAX << DATA2_OFFSET) // in binary: 0000000011100000
void set_data2(data_port_t *port, unsigned new_field_value)
{
assert(new_field_value <= DATA2_MAX);
uint16_t old_register_value = *port;
// First, mask out the data2 bits from the current register value.
uint16_t new_register_value = (old_register_value & ~DATA2_MASK);
// Then mask in the new value for data2.
new_register_value |= (new_field_value << DATA2_OFFSET);
*port = new_register_value;
}
Obviously you can make the code a lot shorter. I separated it out into individual tiny steps so that the logic should be easy to follow. I include a shorter version below. Any compiler worth its salt should compile to the same code except in non-optimizing mode. Note that above, I used an intermediate variable instead of doing two assignments to *port because doing two assignments to *port would change the behavior: it would cause the device to see the intermediate value (and another read, since |= is both a read and a write). Here's the shorter version, and a read function:
void set_data2(data_port_t *port, unsigned new_field_value)
{
assert(new_field_value <= DATA2_MAX);
*port = (*port & ~DATA2_MASK)
| (new_field_value << DATA2_OFFSET);
}
unsigned get_data2(data_port_t *port)
{
return (*port >> DATA2_OFFSET) & DATA2_MAX;
}
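The same mask-and-shift pattern generalizes to all four fields of the register. Below is a sketch that works on plain uint16_t values so it can be tried on a PC; it assumes the layout described above (data1 in bits 0-4, data2 in bits 5-7, data3 in bits 8-11, data4 in bits 12-15), and field_mask, field_insert, field_extract and pack_ccr1 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `width` bits starting at bit `offset` of a 16-bit register. */
static uint16_t field_mask(unsigned offset, unsigned width)
{
    return (uint16_t)((((uint32_t)1 << width) - 1u) << offset);
}

/* Return `reg` with the given field replaced by `value`. */
uint16_t field_insert(uint16_t reg, unsigned offset, unsigned width, unsigned value)
{
    uint16_t mask = field_mask(offset, width);

    assert(value <= (unsigned)(mask >> offset));
    return (uint16_t)((reg & ~mask) | (value << offset));
}

/* Return the value of the given field of `reg`. */
unsigned field_extract(uint16_t reg, unsigned offset, unsigned width)
{
    return (unsigned)(reg & field_mask(offset, width)) >> offset;
}

/* Build a whole register value from the four fields (assumed layout). */
uint16_t pack_ccr1(unsigned data1, unsigned data2, unsigned data3, unsigned data4)
{
    uint16_t reg = 0;

    reg = field_insert(reg, 0, 5, data1);
    reg = field_insert(reg, 5, 3, data2);
    reg = field_insert(reg, 8, 4, data3);
    reg = field_insert(reg, 12, 4, data4);
    return reg;
}
```

Computing the full value first and then storing it with a single write to the volatile register keeps the device from seeing intermediate states.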
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
There is no function here. A function declaration would have a return type followed by an argument list in parentheses. This takes the value CAN_BASE_ADDRESS, which is presumably a pointer of some type, then casts the pointer to a pointer to CAN_REG_FILE, and finally dereferences the pointer. In other words, it accesses the CAN register file at the address given by CAN_BASE_ADDRESS. For example, there may be declarations like
void *CAN_BASE_ADDRESS = (void*)0x12345678;
typedef struct {
const volatile uint32_t status;
volatile uint16_t foo;
volatile uint16_t bar;
} CAN_REG_FILE;
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
and then you can do things like
CAN0.foo = 42;
printf("CAN0 status: %d\n", (int)CAN0.status);
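To try the macro without real hardware, an ordinary object can stand in for the register file. This is only a host-side sketch: fake_can_registers and write_then_read_foo are invented for the illustration, and on a real target CAN_BASE_ADDRESS would be the fixed address from the datasheet.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const volatile uint32_t status;
    volatile uint16_t foo;
    volatile uint16_t bar;
} CAN_REG_FILE;

/* Stand-in for the memory-mapped registers so the sketch runs on a PC. */
static CAN_REG_FILE fake_can_registers;
#define CAN_BASE_ADDRESS ((void *)&fake_can_registers)

/* Cast the address to a pointer to the register file, then dereference it. */
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)

/* Write one register through the macro and read it back. */
uint16_t write_then_read_foo(uint16_t value)
{
    assert(&CAN0 == &fake_can_registers);   /* CAN0 is the object itself, not a pointer */
    CAN0.foo = value;
    return CAN0.foo;
}
```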
1.
The problem when getting rid of bitfields is that you can no longer use simple assignment statements; instead you must shift the value to write, create a mask, AND to wipe out the previous bits, and OR to write the new bits. Reading is similar, in reverse. For example, let's take an 8-bit register defined like this:
val2.val1
0000.0000
val1 is the lower 4 bits, and val2 is the upper 4. The whole register is named REG.
To read val1 into tmp, one should issue:
tmp = REG & 0x0F;
and to read val2:
tmp = (REG >> 4) & 0xF; // AND redundant in this particular case
or
tmp = (REG & 0xF0) >> 4;
But to write tmp to val2, for example, you need to do:
REG = (REG & 0x0F) | (tmp << 4);
Of course some macro can be used to facilitate this, but the problem, for me, is that reading and writing require two different macros.
I think that bitfields are the best way, and a serious compiler should have options to define the endianness and bit ordering of such bitfields. Anyway, this is the future, even if, for now, maybe not every compiler has full support.
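The pair of macros mentioned above might look like the following sketch. Each field is described by a mask and a shift; FIELD_GET, FIELD_SET, the VAL1_/VAL2_ constants and write_val2 are hypothetical names, and REG is an ordinary variable standing in for the real register.

```c
#include <assert.h>
#include <stdint.h>

/* val1 is the lower 4 bits, val2 is the upper 4 bits. */
#define VAL1_MASK  0x0Fu
#define VAL1_SHIFT 0
#define VAL2_MASK  0xF0u
#define VAL2_SHIFT 4

/* One macro for reading a field ... */
#define FIELD_GET(reg, NAME) (((reg) & NAME##_MASK) >> NAME##_SHIFT)

/* ... and a different one for writing it. */
#define FIELD_SET(reg, NAME, val) \
    ((reg) = (uint8_t)(((reg) & ~NAME##_MASK) | (((val) << NAME##_SHIFT) & NAME##_MASK)))

static uint8_t REG;   /* stand-in for the real 8-bit register */

/* Load REG with `initial`, write `value` into val2 and return the result. */
uint8_t write_val2(uint8_t initial, uint8_t value)
{
    assert(value <= 0x0Fu);
    REG = initial;
    FIELD_SET(REG, VAL2, value);
    return REG;
}
```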
2.
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
This macro defines CAN0 as a dereferenced pointer to the base address of the CAN register(s), no function declaration is involved. Suppose you have an 8-bit register at address 0x800. You could do:
#define REG_BASE 0x800 // address of the register
#define REG (*(volatile uint8_t *) REG_BASE)
REG = 0; // becomes *REG_BASE = 0
tmp = REG; // tmp=*REG_BASE
Instead of uint8_t you can use a struct type, and all the bits, and probably all the bytes or words, go magically to their correct place, with the right semantics. Using a good compiler of course - but who doesn't want to deploy a good compiler?
Some compilers have/had extensions to assign a given address to a variable; for example old turbo pascal had the ABSOLUTE keyword:
var CAN: byte absolute 0x800:0000; // seg:ofs...!
The semantics are the same as before, only more straightforward because no pointer appears in the source; the compiler handles the address automatically, much as the macro does above.

Is bit masking comparable to "accessing an array" in bits?

For all the definitions I've seen of bit masking, they all just dive right into how to bit mask, use bitwise, etc. without explaining a use case for any of it. Is the purpose of updating all the bits you want to keep and all the bits you want to clear to "access an array" in bits?
Is the purpose of updating all the bits you want to keep and all the bits you want to clear to "access an array" in bits?
I will say the answer is no.
When you access an array of int you'll do:
int_array[index] = 42; // Write access
int x = int_array[42]; // Read access
If you want to write similar functions to read/write a specific bit in e.g. an unsigned int in an "array like fashion" it could look like:
unsigned a = 0;
a = set_bit(a, 4); // Set bit number 4
unsigned x = get_bit(a, 4); // Get bit number 4
The implementation of set_bit and get_bit will require (among other things) some bitwise mask operation.
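One possible implementation is sketched below. It assumes set_bit returns the updated word, so the caller writes a = set_bit(a, 4); the helper clear_bit and the UNSIGNED_BITS macro are additions for the illustration.

```c
#include <assert.h>
#include <limits.h>

#define UNSIGNED_BITS (sizeof(unsigned) * CHAR_BIT)

/* Return `word` with bit number `n` set to 1. */
unsigned set_bit(unsigned word, unsigned n)
{
    assert(n < UNSIGNED_BITS);
    return word | (1u << n);      /* OR with a one-bit mask */
}

/* Return `word` with bit number `n` cleared to 0. */
unsigned clear_bit(unsigned word, unsigned n)
{
    assert(n < UNSIGNED_BITS);
    return word & ~(1u << n);     /* AND with the inverted mask */
}

/* Return the value (0 or 1) of bit number `n`. */
unsigned get_bit(unsigned word, unsigned n)
{
    assert(n < UNSIGNED_BITS);
    return (word >> n) & 1u;      /* shift down, then mask the lowest bit */
}
```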
So yes - to access bits in an "array like fashion" you'll need masking but...
There are many other uses of bit level masking.
Example:
int buffer[64];
unsigned index = 0;
void add_to_cyclic_buffer(int n)
{
buffer[index] = n;
++index;
index &= 0x3f; // Masking by 0x3f ensures index is always in the range 0..63
}
Example:
unsigned a = some_func();
a |= 1; // Make sure a is odd
a &= ~1; // Make sure a is even
Example:
unsigned a = some_func();
a &= ~0xf; // Make sure a is a multiple of 16
These are just a few examples of using "masking" that have nothing to do with accessing bits as an array. Many other examples can be made.
So to conclude:
Masking can be used to write functions that access bits in an array like fashion but masking is used for many other things as well.
So there are 3 (or 4) main uses.
One, as you say, is where you use the word as a set of true/false flags, where each flag is just indexed in a symmetric manner. I use 'word' here to be the piece of discrete memory that you are accessing in a single operation. So a byte holds 8 bit values, and a 'long long' holds 64 bits. With a bit more effort an array of words can be used as an array of more packed flags.
A second is where you are doing some manipulation of the value, but still consider the word to hold one value. There are many tricks like setting or clearing bottom bits to ensure alignment, or clearing top bits to get a modulus, shifting to divide or multiply by powers of 2.
A third use is where you want to pack lots of smaller-ranged values into a word. Each of the values is a particular meaning in context. This may either be because you need to communicate with a device that has defined this as the protocol, or because you need to create so many objects that the saving in space in each object outweighs the increase in code size and code speed cost (though that might be contrasted with the increased cache misses causing slowdown if the object were bigger).
As a distinction the fourth case is where these fields are distinct 1-bit flags that have specific meanings in the context of the code. Data objects tend to collect a number of such flags, and it is simply more convenient sometimes to store them as bits in a single location, than to use separate bytes for each flag. Generally testing a particular fixed indexed bit, or a fixed masked bit is no more expensive in code size or speed than testing the whole byte, though writing can be more complex. The storage savings are clear, so often programmers will declare an enumeration of bit masks by default when faced with creating a number of flags in a structure, or when writing a function.
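The first use, extended to an array of words as mentioned above, can be sketched as follows. FLAG_COUNT, flag_words and the three helpers are names made up for the example; the word index is i / WORD_BITS and the bit within that word is i % WORD_BITS.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS  (sizeof(unsigned) * CHAR_BIT)
#define FLAG_COUNT 200
#define WORD_COUNT ((FLAG_COUNT + WORD_BITS - 1) / WORD_BITS)

/* 200 true/false flags packed into as few words as possible. */
static unsigned flag_words[WORD_COUNT];

void flag_set(size_t i)
{
    assert(i < FLAG_COUNT);
    flag_words[i / WORD_BITS] |= 1u << (i % WORD_BITS);
}

void flag_clear(size_t i)
{
    assert(i < FLAG_COUNT);
    flag_words[i / WORD_BITS] &= ~(1u << (i % WORD_BITS));
}

int flag_test(size_t i)
{
    assert(i < FLAG_COUNT);
    return (int)((flag_words[i / WORD_BITS] >> (i % WORD_BITS)) & 1u);
}
```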

Working with 32 bit data types and 8 bit data type in ARM

I am new to ARM LPC2148 micro controller and also new to StackOverflow. I just saw one piece of code in one of the evaluation boards. I am pasting as it is below.
Port pins P0.19 to P0.22 are mapped to D4 to D7 of LCD. The function below is used to send commands to LCD operated in 4 bit mode:
void LCD_Command(unsigned int data) // This function is used to send LCD commands
{
unsigned int temp=0;
EN_LOW(); // Set EN pin of LCD to Low
COMMAND_PORT();
WRITE_DATA();
temp=data;
IO0PIN&=0xFF87FFFF;
IO0PIN|=(temp & 0xF0) << 15;
EN_HI(); // Give strobe by enabling and disabling En pin of LCD
EN_LOW();
temp=data & 0x0F;
IO0PIN&=0xFF87FFFF;
IO0PIN|=(temp) << 19;
EN_HI();
EN_LOW();
while(Busy_Wait());
Delay(10);
}
My questions are:
The variable "data" is already 32 bit wide. Is it efficient to shift the data in this way? Coder could have passed 32 bit data and then masked (&)/ORed (|). Or are there any other impacts?
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int? Since registers are 32 bit wide, I am not sure whether internally any segmentation is done to save memory.
Is there any way we can easily map 8 bit data to one of the 8 bit portions of 32 bit data? In the above code, shifting is done by hard coding (<<15 or <<19 etc). Can we avoid this hard coding and use some #defines to map the bits?
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int?
Only when storing them into RAM, which this small function will not do once the optimizer is on. Note that using char types may introduce additional code to be generated to handle overflows correctly.
[...] Can we avoid this hard coding and use some #defines to map the bits?
Easy:
#define LCD_SHIFT_BITS 19
void LCD_Command(unsigned int data) // This function is used to send LCD commands
{
unsigned int temp=0;
EN_LOW(); // Set EN pin of LCD to Low
COMMAND_PORT();
WRITE_DATA();
temp=data;
IO0CLR = 0x0F << LCD_SHIFT_BITS;
IO0SET = (temp & 0xF0) << (LCD_SHIFT_BITS - 4);
EN_HI(); // Give strobe by enabling and disabling En pin of LCD
EN_LOW();
temp=data & 0x0F;
IO0CLR = 0x0F << LCD_SHIFT_BITS;
IO0SET = temp << LCD_SHIFT_BITS;
EN_HI();
EN_LOW();
while(Busy_Wait());
Delay(10);
}
I also changed pin set and clear to be atomic.
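The values written to IO0SET can be computed and checked on a PC before touching the hardware. This sketch only calculates the bit patterns; lcd_nibble_bits and its two wrappers are hypothetical helpers, and LCD_DATA_MASK is the same 0x0F << LCD_SHIFT_BITS pattern that is written to IO0CLR above.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_SHIFT_BITS 19u                                 /* D4 is wired to P0.19 */
#define LCD_DATA_MASK  (UINT32_C(0x0F) << LCD_SHIFT_BITS)  /* P0.19 .. P0.22 */

/* Place a 4-bit value on the LCD data pins. */
uint32_t lcd_nibble_bits(unsigned nibble)
{
    assert(nibble <= 0x0Fu);
    return (uint32_t)nibble << LCD_SHIFT_BITS;
}

/* Pattern for the first strobe: the upper four bits of the command. */
uint32_t lcd_high_nibble_bits(uint8_t data)
{
    return lcd_nibble_bits(data >> 4);
}

/* Pattern for the second strobe: the lower four bits of the command. */
uint32_t lcd_low_nibble_bits(uint8_t data)
{
    return lcd_nibble_bits(data & 0x0Fu);
}
```

The high-nibble result matches (temp & 0xF0) << (LCD_SHIFT_BITS - 4) from the function above, just written as shift-down followed by shift-up.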
The variable "data" is already 32 bit wide. Is it efficient to shift the data in this way? Coder could have passed 32 bit data and then masked (&)/ORed (|). Or are there any other impacts?
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int? Since registers are 32 bit wide, I am not sure whether internally any segmentation is done to save memory.
Since you are using a 32-bit MCU, reducing the variable sizes will not make the code any faster. It could possibly make it slower, even though you might also save a few bytes of RAM that way.
However, these are micro-optimizations that you shouldn't concern yourself about. Enable optimization and leave them to the compiler. If you for some reason unknown must micro-optimize your code then you could use uint_fast8_t instead. It is a type which is at least 8 bits and the compiler will pick the fastest possible type.
It is generally a sound idea to use 32 bit integers as much as possible on a 32 bit CPU, to avoid the numerous subtle bugs caused by the various complicated implicit type promotion rules in the C language. In embedded systems in particular, integer promotion and type balancing are notorious for causing many subtle bugs. (A MISRA-C checker can help protecting against that.)
Is there any way we can easily map 8 bit data to one of the 8 bit portions of 32 bit data? In the above code, shifting is done by hard coding (<<15 or <<19 etc). Can we avoid this hard coding and use some #defines to map the bits?
Generally you should avoid "magic numbers" and such. Not for performance reasons, but for readability.
The easiest way to do this is to use the pre-made register map for the processor, if you got one with the compiler. If not, you'll have to #define the register manually:
#define REGISTER (*(volatile uint32_t*)0x12345678)
#define REGISTER_SOMETHING 0x00FF0000 // some part of the register
Then either define all the possible values such as
#define REGISTER_SOMETHING_X 0x00010000
#define REGISTER_SOMETHING_Y 0x00020000
...
REGISTER = REGISTER_SOMETHING & REGISTER_SOMETHING_X;
// or just:
REGISTER |= REGISTER_SOMETHING_X;
REGISTER = REGISTER_SOMETHING_X | REGISTER_SOMETHING_Y;
// and so on
Alternatively, if part of the register is variable:
#define REGISTER_SOMETHING_VAL(val) \
( REGISTER_SOMETHING & ((uint32_t)(val) << 16) )
...
REGISTER = REGISTER_SOMETHING_VAL(5);
There are many ways you could write such macros and the code using them. Focus on turning the calling code readable and without "magic numbers". For more complex stuff, consider using inline functions instead of function-like macros.
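As an illustration of the inline-function suggestion, the value macro could be written as a typed function with a range check. This is a sketch under the same assumed field (bits 16-23); REGISTER_SOMETHING_SHIFT, register_something_val and register_with_something are names introduced here.

```c
#include <assert.h>
#include <stdint.h>

#define REGISTER_SOMETHING       UINT32_C(0x00FF0000)   /* some part of the register */
#define REGISTER_SOMETHING_SHIFT 16u

/* Move `val` into position for the SOMETHING field. */
static inline uint32_t register_something_val(uint32_t val)
{
    assert(val <= (REGISTER_SOMETHING >> REGISTER_SOMETHING_SHIFT));
    return (val << REGISTER_SOMETHING_SHIFT) & REGISTER_SOMETHING;
}

/* Return `reg` with only the SOMETHING field replaced by `val`. */
static inline uint32_t register_with_something(uint32_t reg, uint32_t val)
{
    return (reg & ~REGISTER_SOMETHING) | register_something_val(val);
}
```

Unlike the macro, the function evaluates its argument exactly once and gives the compiler a type to check.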
Also for embedded systems, consider whether it makes any difference if all register parts are written with one single access or not. In some cases, you might get critical bugs if you don't, depending on the nature of the specific register. You need to be particularly careful when clearing interrupt masks etc. It is good practice to always disassemble such code and see what machine code you ended up with.
General advice:
Always consider endianness, alignment and portability. You might think that your code will never get ported, but portability might mean re-using your own code in other projects.
If you use structs/unions for any form of hardware or data transmission protocol mapping, you must use static_assert to ensure that there is no padding or other alignment tricks. Do not use struct bit-fields under any circumstances! They are bad for numerous reasons and cannot be used reliably in any form of program, least of all in an embedded microcontroller application.
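The static_assert advice above might look like this for a small protocol header. The struct wire_header and its fields are invented for the example; the checks use the C11 keyword _Static_assert (also available as static_assert through assert.h) and fail at compile time if the compiler inserts padding.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical 8-byte header that is mapped directly onto received data. */
struct wire_header {
    uint8_t  type;
    uint8_t  flags;
    uint16_t length;
    uint32_t sequence;
};

/* Compile-time layout checks: the build stops if any of these is false. */
_Static_assert(sizeof(struct wire_header) == 8,
               "wire_header must not contain padding");
_Static_assert(offsetof(struct wire_header, length) == 2,
               "length must start at byte 2");
_Static_assert(offsetof(struct wire_header, sequence) == 4,
               "sequence must start at byte 4");
```

These checks cover size and member offsets only; byte order within the multi-byte members still has to be handled separately.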
Three questions, many many programming styles.
This code is definitely bad code. No atomic access... Do yourself a favor and don't use it as a reference.
The variable "data" is already 32 bit wide. Is it efficient ...
There is no other impact. The programmer just used an extra 4-byte local variable inside the function.
Do we save any memory in LPC21xx if we use unsigned char instead of unsigned int?
In general you can save memory only in RAM. Most linker scripts align data to 4 or 8 bytes. Of course you can use structs to bypass this both for RAM and Flash. For example, consider:
// ...
struct lala {
unsigned int a :12;
unsigned int b :20;
long c;
unsigned char d;
};
const struct lala l1; // l1 is const so it lives in Flash.
// Also l1.d is 8 bits long ;)
// ...
This last one is bringing us to question 3.
Is there any way we can easily map 8 bit data to one of the 8 bit portions of 32 bit data? ...
NXP's LPC2000 is a little-endian CPU. That means that you can create structures in such a way that the members line up with the memory locations you want to access. To accomplish that, you have to place the low memory address first. For example:
// file.h
// ...
#include <stdint.h>
typedef volatile union {
struct {
uint8_t p0 :1;
uint8_t p1 :1;
uint8_t p2 :1;
uint8_t p3 :1;
...
uint8_t p30 :1;
uint8_t p31 :1;
}pin;
uint32_t port;
}port_io0clr_t;
// You have to check the resulting layout to make sure it matches
// Now we can "put" it in memory.
#define REG_IO0CLR ((port_io0clr_t *) 0xE002800C)
//!< This is the memory address of IO0CLR in address space of LPC21xx
Now we can use the REG_IO0CLR pointer. For example:
// file.c
// ...
int main (void) {
// ...
REG_IO0CLR->port = 0x0080; // Clear pin P0.7
// or even better
REG_IO0CLR->pin.p4 = 1; // Clear pin p0.4
// ...
return 0;
}
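For question 3, here is a sketch of a shift-based alternative that does not depend on endianness or on how the compiler lays out bit-fields. The helper names `set_byte` and `get_byte` are hypothetical; byte 0 is taken to be the least significant byte of the 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Replace byte number `index` (0 = least significant) of a 32-bit word. */
uint32_t set_byte(uint32_t word, unsigned index, uint8_t value)
{
    assert(index < 4u);
    unsigned shift = index * 8u;
    word &= ~((uint32_t)0xFF << shift);   /* clear the target byte */
    word |= (uint32_t)value << shift;     /* insert the new byte   */
    return word;
}

/* Extract byte number `index` (0 = least significant) of a 32-bit word. */
uint8_t get_byte(uint32_t word, unsigned index)
{
    assert(index < 4u);
    return (uint8_t)(word >> (index * 8u));
}
```

The result can then be written to the 32-bit register in one access, for example `*reg = set_byte(*reg, 1, 0xAB);` for a suitable `volatile uint32_t *reg`.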

Bit field manipulation disadvantages

I was reading this link:
http://dec.bournemouth.ac.uk/staff/awatson/micro/articles/9907feat2.htm
I could not understand the following statements from it. Please help me understand them.
The programmer just writes some macros that shift or mask the
appropriate bits to get what is desired. However, if the data involves
longer binary encoded records, the C API runs into a problem. I have,
over the years, seen many lengthy, complex binary records described
with the short or long integer bit-field definition facilities. C
limits these bit fields to subfields of integer-defined variables,
which implies two limitations: first of all, that bit fields may be no
wider, in bits, than the underlying variable; and secondly, that no
bit field should overlap the underlying variable boundaries. Complex
records are usually composed of several contiguous long integers
populated with bit-subfield definitions.
ANSI-compliant compilers are free to impose these size and alignment
restrictions and to specify, in an implementation-dependent but
predictable way, how bit fields are packed into the underlying machine
word structure. Structure memory alignment often isn’t portable, but
bit field memory is even less so.
What I have understood from these statements is that macros can be used to mask bits and to shift them left or right. But why use macros? I thought that defining these operations as macros makes the code portable regardless of whether the OS is 16-bit or 32-bit. Is that true? I also could not understand the two limitations mentioned in the statement above: 1. bit fields may be no wider than the underlying variable; 2. no bit field should overlap the underlying variable boundaries;
and the line:
Complex records are usually composed of several contiguous long integers
populated with bit-subfield definitions.
1. Bit fields may be no wider
Let's say you want a bitfield that is 200 bits long.
struct my_struct {
int my_field:200; /* Illegal! No integer type has 200 bits --> compile error! */
} v;
2. No bit field should overlap the underlying variable boundaries
Let's say you want two 30 bit bitfields and that the compiler uses a 32 bit integer as the underlying variable.
struct my_struct {
unsigned int my_field1:30;
unsigned int my_field2:30; /* Without padding this field will overlap a 32-bit boundary */
} v;
Usually, the compiler will add padding automatically, generating a struct with the following layout:
struct my_struct {
unsigned int my_field1:30;
unsigned int :2; /* padding added by the compiler */
unsigned int my_field2:30; /* now starts at the next 32-bit boundary */
unsigned int :2; /* padding added by the compiler */
} v;
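To see what a particular compiler does with the two 30-bit fields, and what the macro approach from the question looks like in comparison, here is a minimal sketch. It assumes `unsigned int` is at least 30 bits wide (otherwise the struct is rejected). The names `my_struct_bits`, `FIELD_MASK`, `FIELD2_SHIFT`, `pack_fields`, `unpack_field1` and `unpack_field2` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct my_struct {
    unsigned int my_field1 : 30;
    unsigned int my_field2 : 30;
};

/* Number of bits the compiler reserves for the 60 bits of payload. */
size_t my_struct_bits(void)
{
    return sizeof(struct my_struct) * CHAR_BIT;
}

/* Hand-written alternative: both fields live in one 64-bit word.
   The layout (field 1 in bits 0..29, field 2 in bits 30..59) is fixed
   by this code rather than chosen by the compiler. */
#define FIELD_MASK   ((uint64_t)0x3FFFFFFF)   /* 30 one-bits */
#define FIELD2_SHIFT 30

uint64_t pack_fields(uint32_t f1, uint32_t f2)
{
    assert(f1 <= FIELD_MASK && f2 <= FIELD_MASK);
    return ((uint64_t)f1 & FIELD_MASK) |
           (((uint64_t)f2 & FIELD_MASK) << FIELD2_SHIFT);
}

uint32_t unpack_field1(uint64_t w)
{
    return (uint32_t)(w & FIELD_MASK);
}

uint32_t unpack_field2(uint64_t w)
{
    return (uint32_t)((w >> FIELD2_SHIFT) & FIELD_MASK);
}
```

With a 32-bit `unsigned int`, `my_struct_bits()` would typically report 64, which matches the padded layout shown above. The shift-and-mask version gives the same bit positions on every compiler, which is the portability argument for macros.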

Does an 8-bit processor have to face the endianness problem?

If I have an int32 type integer in the memory of an 8-bit processor, say, 8051, how can I identify the endianness of that integer? Is it compiler specific? I think this is important when sending multibyte data through serial lines etc.
With an 8 bit microcontroller that has no native support for wider integers, the endianness of integers stored in memory is indeed up to the compiler writer.
The SDCC compiler, which is widely used on 8051, stores integers in little-endian format (the user guide for that compiler claims that it is more efficient on that architecture, due to the presence of an instruction for incrementing a data pointer but not one for decrementing).
If the processor has any operations that act on multi-byte values, or has any multi-byte registers, it can have an endianness.
http://69.41.174.64/forum/printable.phtml?id=14233&thread=14207 suggests that the 8051 mixes different endian-ness in different places.
The endianness is specific to the CPU architecture. Since a compiler needs to target a particular CPU, the compiler has knowledge of the endianness as well. So if you need to send data over a serial connection, network, etc., you may wish to use built-in functions to put data in network byte order, especially if your code needs to support multiple architectures.
For more information, see: http://www.gnu.org/s/libc/manual/html_node/Byte-Order.html
It's not just up to the compiler - '51 has some native 16-bit registers (DPTR, PC in standard, ADC_IN, DAC_OUT and such in variants) of given endianness which the compiler has to obey - but outside of that, the compiler is free to use any endianness it prefers or one you choose in project configuration...
An integer does not have endianness in it. You can't determine just from looking at the bytes whether it's big or little endian. You just have to know: For example if your 8 bit processor is little endian and you're receiving a message that you know to be big endian (because, for example, the field bus system defines big endian), you have to convert values of more than 8 bits. You'll need to either hard-code that or to have some definition on the system on which bytes to swap.
Note that swapping bytes is the easy thing. You may also have to swap bits in bit fields, since the order of bits in bit fields is compiler-specific. Again, you basically have to know this at build time.
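As a sketch of the byte conversion described above: writing and reading the bytes with shifts gives a fixed wire order (big endian here) without having to know the host's byte order at all. The function names `put_u32_be` and `get_u32_be` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as four bytes, most significant byte first. */
void put_u32_be(uint8_t out[4], uint32_t value)
{
    assert(out != NULL);
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Rebuild a 32-bit value from four bytes, most significant byte first. */
uint32_t get_u32_be(const uint8_t in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the shifts operate on values rather than on memory, the same source works unchanged on little-endian and big-endian targets. Bit-field order is a separate matter and is not addressed by this sketch.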
unsigned long int x = 1;
unsigned char *px = (unsigned char *) &x;
const char *order = (*px == 0) ? "big endian" : "little endian";
If x is assigned the value 1, then the value 1 will be in the least significant byte.
If we then cast the address of x to a pointer to bytes, that pointer will point to the lowest memory location of x. If that memory location holds 0, the machine is big endian; otherwise it is little endian.
#include <stdio.h>
union foo {
int as_int;
char as_bytes[sizeof(int)];
};
int main(void) {
union foo data;
size_t i;
for (i = 0; i < sizeof(int); ++i) {
data.as_bytes[i] = (char)(1 + i); /* bytes 1, 2, 3, ... at increasing addresses */
}
printf("%x\n", (unsigned int)data.as_int);
return 0;
}
Interpreting the output is up to you.
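As a sketch of how that interpretation could be done in code rather than by eye, assuming a platform that provides `uint8_t` and `uint32_t`: the bytes 1, 2, 3, 4 are copied into a 32-bit integer and the result is compared with the two common patterns. The type `byte_order_t` and the function `detect_byte_order` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER } byte_order_t;

/* uint8_t only exists when bytes have 8 bits, so uint32_t spans 4 bytes. */
static_assert(sizeof(uint32_t) == 4, "uint32_t is expected to span 4 bytes");

/* Report how this target stores a 32-bit integer in memory. */
byte_order_t detect_byte_order(void)
{
    const uint8_t bytes[4] = { 1, 2, 3, 4 };   /* lowest address first */
    uint32_t value;

    memcpy(&value, bytes, sizeof value);       /* reinterpret without aliasing issues */

    if (value == UINT32_C(0x04030201)) {
        return ORDER_LITTLE;                   /* first byte is least significant */
    }
    if (value == UINT32_C(0x01020304)) {
        return ORDER_BIG;                      /* first byte is most significant */
    }
    return ORDER_OTHER;                        /* mixed or unusual ordering */
}
```

This is a run-time check; as noted in the answers above, for protocol code the byte order usually has to be known at build time anyway.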

Resources