I'm investigating how different compilers handle unaligned access of structure bit-field members, as well as members that cross the primitive types' boundaries, and I think MinGW64 is bugged. My test program is:
#include <stdint.h>
#include <stdio.h>
/* Structure for testing element access
The crux is the ISO C99 6.7.2.1p10 item:
An implementation may allocate any addressable storage unit large enough to hold a bitfield.
If enough space remains, a bit-field that immediately follows another bit-field in a
structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.
*/
typedef struct _my_struct
{
/* word 0 */
uint32_t first :32; /**< A whole word element */
/* word 1 */
uint32_t second :8; /**< bits 7-0 */
uint32_t third :8; /**< bits 15-8 */
uint32_t fourth :8; /**< bits 23-16 */
uint32_t fifth :8; /**< bits 31-24 */
/* word 2 */
uint32_t sixth :16; /**< bits 15-0 */
uint32_t seventh :16; /**< bits 31-16 */
/* word 3 */
uint32_t eigth :24; /**< bits 23-0 */
uint32_t ninth :8; /**< bits 31-24 */
/* word 4 */
uint32_t tenth :8; /**< bits 7-0 */
uint32_t eleventh :24; /**< bits 31-8 */
/* word 5 */
uint32_t twelfth :8; /**< bits 7-0 */
uint32_t thirteeneth :16; /**< bits 23-8 */
uint32_t fourteenth :8; /**< bits 31-24 */
/* words 6 & 7 */
uint32_t fifteenth :16; /**< bits 15-0 */
uint32_t sixteenth :8; /**< bits 23-16 */
uint32_t seventeenth :16; /**< bits 31-24 & 7-0 */
/* word 7 */
uint32_t eighteenth :24; /**< bits 31-8 */
/* word 8 */
uint32_t nineteenth :32; /**< bits 31-0 */
/* words 9 & 10 */
uint32_t twentieth :16; /**< bits 15-0 */
uint32_t twenty_first :32; /**< bits 31-16 & 15-0 */
uint32_t twenty_second :16; /**< bits 31-16 */
/* word 11 */
uint32_t twenty_third :32; /**< bits 31-0 */
} __attribute__((packed)) my_struct;
uint32_t buf[] = {
0x11223344, 0x55667788, 0x99AABBCC, 0x01020304, /* words 0 - 3 */
0x05060708, 0x090A0B0C, 0x0D0E0F10, 0x12131415, /* words 4 - 7 */
0x16171819, 0x20212324, 0x25262728, 0x29303132, /* words 8 - 11 */
0x34353637, 0x35363738, 0x39404142, 0x43454647 /* words 12 - 15 */
};
uint32_t data[64];
int main(void)
{
my_struct *p;
p = (my_struct*) buf;
data[0] = 0;
data[1] = p->first;
data[2] = p->second;
data[3] = p->third;
data[4] = p->fourth;
data[5] = p->fifth;
data[6] = p->sixth;
data[7] = p->seventh;
data[8] = p->eigth;
data[9] = p->ninth;
data[10] = p->tenth;
data[11] = p->eleventh;
data[12] = p->twelfth;
data[13] = p->thirteeneth;
data[14] = p->fourteenth;
data[15] = p->fifteenth;
data[16] = p->sixteenth;
data[17] = p->seventeenth;
data[18] = p->eighteenth;
data[19] = p->nineteenth;
data[20] = p->twentieth;
data[21] = p->twenty_first;
data[22] = p->twenty_second;
data[23] = p->twenty_third;
if( p->fifth == 0x55 )
{
data[0] = 0xCAFECAFE;
}
else
{
data[0] = 0xDEADBEEF;
}
int i;
for (i = 0; i < 24; ++i) {
printf("data[%d] = 0x%x\n", i, data[i]);
}
return data[0];
}
And the results I found are:
| Data Member | Type | GCC Cortex M3 | GCC mingw64 | GCC Linux | GCC Cygwin |
|:------------|:-------:|:---------------|:--------------|:--------------|:--------------|
| data[0] | uint32_t| 0x0 | 0xcafecafe | 0xcafecafe | 0xcafecafe |
| data[1] | uint32_t| 0x11223344 | 0x11223344 | 0x11223344 | 0x11223344 |
| data[2] | uint32_t| 0x88 | 0x88 | 0x88 | 0x88 |
| data[3] | uint32_t| 0x77 | 0x77 | 0x77 | 0x77 |
| data[4] | uint32_t| 0x66 | 0x66 | 0x66 | 0x66 |
| data[5] | uint32_t| 0x55 | 0x55 | 0x55 | 0x55 |
| data[6] | uint32_t| 0xbbcc | 0xbbcc | 0xbbcc | 0xbbcc |
| data[7] | uint32_t| 0x99aa | 0x99aa | 0x99aa | 0x99aa |
| data[8] | uint32_t| 0x20304 | 0x20304 | 0x20304 | 0x20304 |
| data[9] | uint32_t| 0x1 | 0x1 | 0x1 | 0x1 |
| data[10] | uint32_t| 0x8 | 0x8 | 0x8 | 0x8 |
| data[11] | uint32_t| 0x50607 | 0x50607 | 0x50607 | 0x50607 |
| data[12] | uint32_t| 0xc | 0xc | 0xc | 0xc |
| data[13] | uint32_t| 0xa0b | 0xa0b | 0xa0b | 0xa0b |
| data[14] | uint32_t| 0x9 | 0x9 | 0x9 | 0x9 |
| data[15] | uint32_t| 0xf10 | 0xf10 | 0xf10 | 0xf10 |
| data[16] | uint32_t| 0xe | 0xe | 0xe | 0xe |
| data[17] | uint32_t| 0x150d | 0x1415 | 0x150d | 0x150d |
| data[18] | uint32_t| 0x121314 | 0x171819 | 0x121314 | 0x121314 |
| data[19] | uint32_t| 0x16171819 | 0x20212324 | 0x16171819 | 0x16171819 |
| data[20] | uint32_t| 0x2324 | 0x2728 | 0x2324 | 0x2324 |
| data[21] | uint32_t| 0x27282021 | 0x29303132 | 0x27282021 | 0x27282021 |
| data[22] | uint32_t| 0x2526 | 0x3637 | 0x2526 | 0x2526 |
| data[23] | uint32_t| 0x29303132 | 0x35363738 | 0x29303132 | 0x29303132 |
GCC Cortex M3 is
arm-none-eabi-gcc (GNU MCU Eclipse ARM Embedded GCC, 32-bit) 8.2.1 20181213 (release) [gcc-8-branch revision 267074]
GCC Mingw is
gcc.exe (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 8.1.0
GCC Linux is
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
GCC Cygwin is
gcc (GCC) 7.4.0
All GCC versions seem to handle unaligned access correctly (e.g. my_struct.thirteeneth).
The problem is not that members that cross the word boundary (my_struct.seventeenth) differ, since the C99 standard quoted above clearly states that the behaviour is implementation-defined. The problem is that all subsequent accesses (data[17] and on) are clearly incorrect, even for aligned members (my_struct.nineteenth and my_struct.twenty_third). What's going on here: is this a bug, or are these valid values?
It is not bugged; MinGW lays out the bit-fields according to the Windows ABI.
According to gcc docs:
If packed is used on a structure, or if bit-fields are used, it may be that the Microsoft ABI lays out the structure differently than the way GCC normally does.
Compile the mingw64 version with -mno-ms-bitfields to remove the difference, or compile all the other versions with -mms-bitfields to lay out the structure the same way as MinGW.
The chances that a widely used compiler like GCC has a bug are not zero, but really minimal. And odds are that it's PEBKAC. ;-)
Anyway, I have compiled your program with "gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0" and got the same result as you in the "mingw64" column.
A closer look reveals that the compiler aligns the bit-fields on 32-bit boundaries, which happens to be the width of an int. This conforms perfectly to chapter 6.7.2.1 of the C17 standard, which states that "straddling" (in the words of annex J.3.9) is implementation-defined.
The other GCC variants do not align the bit-fields and allow them to cross 32-bit boundaries.
It is clearly not a bug; the values are valid. It might be worth researching the reasons and perhaps posting a feature request.
Edit:
Just to clarify, this is the layout with alignment. There is nothing wrong with elements seventeenth and following:
/* 0x11223344: word 0 */
uint32_t first :32;
/* 0x55667788: word 1 */
uint32_t second :8;
uint32_t third :8;
uint32_t fourth :8;
uint32_t fifth :8;
/* 0x99AABBCC: word 2 */
uint32_t sixth :16;
uint32_t seventh :16;
/* 0x01020304: word 3 */
uint32_t eigth :24;
uint32_t ninth :8;
/* 0x05060708: word 4 */
uint32_t tenth :8;
uint32_t eleventh :24;
/* 0x090A0B0C: word 5 */
uint32_t twelfth :8;
uint32_t thirteeneth :16;
uint32_t fourteenth :8;
/* 0x0D0E0F10: word 6 */
uint32_t fifteenth :16;
uint32_t sixteenth :8;
/* 0x12131415: word 7, because "seventeenth" does not fit in the space left */
uint32_t seventeenth :16;
/* 0x16171819: word 8, because "eighteenth" does not fit in the space left */
uint32_t eighteenth :24;
/* 0x20212324: word 9, because "nineteenth" does not fit in the space left */
uint32_t nineteenth :32;
/* 0x25262728: word 10 */
uint32_t twentieth :16;
/* 0x29303132: word 11, because "twenty_first" does not fit in the space left */
uint32_t twenty_first :32;
/* 0x34353637: word 12 */
uint32_t twenty_second :16;
/* 0x35363738: word 13, because "twenty_third" does not fit in the space left */
uint32_t twenty_third :32;
You can not rely at all, in any way, on how bit-fields are arranged in a structure.
Per 6.7.2.1 Structure and union specifiers, paragraph 11 of the C11 standard (bolding mine):
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
You even quoted that. Given that, there is no "incorrect" way for an implementation to lay out a bit-field.
So you can not rely on the size of the bit-field container.
You can not rely on whether or not a bit-field crosses units.
You can not rely on the order of bit-fields within a unit.
Yet your question assumes you can do all that, even using terms such as "correct" when you see what you expected and "clearly incorrect" to describe bit-field layouts you didn't expect.
It's not "clearly incorrect".
If you need to know where a bit is in a structure, you simply can not portably use bit-fields.
In fact, all your effort on this question is a perfect case study in why you can't rely on bit-fields.
Related
I would like to ask a question about how to write inline assembly code for the Store-Conditional instruction in RISC-V. Below is some brief background (RISCV-ISA-Specification, page 40, section 7.2):
SC writes a word in rs2 to the address in rs1, provided a valid reservation still exists on that address. SC writes zero to rd on success or a nonzero code on failure.
The instruction that we will be focusing on is SC.D - store-conditional a 64-bit value. As shown on page 106 of RISCV-ISA-Specification, the instruction format is as follows:
00011 | aq<1> | rl<1> | rs2<5> | rs1<5> | 011 | rd<5> | 0101111
In order to use inline assembly to generate the corresponding code for the SC.D instruction, we need 3 registers. The register list can be found here.
Each register field of the instruction is 5 bits wide, which is enough to encode RISC-V's 32 general registers: x0, x1, ... x31. Each register also has an ABI (application binary interface) name; for instance, register x16 corresponds to the a6 register, so its 5-bit encoding is 10000.
I choose the following registers assignment:
rs2: a6 register (register x16, i.e. 0b10000)
rs1: a7 register (register x17, i.e. 0b10001)
rd: s4 register (register x20, i.e. 0b10100)
Hence, by filling in the corresponding register bits of the original instruction, we have the following:
00011 | aq<1> | rl<1> | 10000 | 10001 | 011 | 10100 | 0101111
The two bits aq and rl are used for specifying the ordering constraints (page 40 of RISCV-ISA-Specification):
If both the aq and rl bits are set, the atomic
memory operation is sequentially consistent and cannot be observed to happen before any earlier
memory operations or after any later memory operations in the same RISC-V hart, and can only be
observed by any other hart in the same global order of all sequentially consistent atomic memory
operations to the same address domain.
So we set both bits to 1 since we want SC.D to execute with sequentially consistent ordering. Now we have the final instruction bits:
00011 | 1 | 1 | 10000 | 10001 | 011 | 10100 | 0101111
-> 00011111|00001000|10111010|00101111
0x1f 0x08 0xba 0x2f
Since RISC-V uses little endian, the corresponding inline assembly can be generated by:
__asm__ volatile(".byte 0x2f, 0xba, 0x08, 0x1f");
There are also some other preparations like loading values into rs1(a7) and rs2(a6) registers. Therefore, I have the following code (but it did not work as expected):
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/**
* rs2: holds the value to be written. I pick a6 register.
* rs1: holds the address to be written to. I pick a7 register.
* rd: holds the return value of SC.D instruction. I pick s4 register.
*
* #src: the value to be written. rs2. a6 register
* #dst: the address to be written to. rs1. a7 register
* #rd: the value that holds the return value of SC.D
*/
static inline void sc(void *src, void *dst, uint64_t *rd) {
uint64_t *tmp_src = (uint64_t *)src;
uint64_t src_val = *tmp_src; // 13
uint64_t dst_addr = (uint64_t)dst;
uint64_t ret = 100;
// first of all, need to prepare the registers a6 and a7.
/* load value to be written into register a6 */
__asm__ volatile("ld a6, %0"::"m"(src_val));
/* load the address to be written to into register a7 */
__asm__ volatile("ld a7, %0"::"m"(dst_addr));
/* the actual SC.D: */
__asm__ volatile(".byte 0x2f, 0xba, 0x08, 0x1f");
// __asm__ volatile("sc.d s4, a6, (a7)"); // this does not work either.
/* obtain the value in register s4 */
__asm__ volatile("sd s4, %0":"=m"(ret));
*rd = ret;
return;
}
int main() {
uint64_t *src = malloc(sizeof(uint64_t));
uint64_t *dst = malloc(sizeof(uint64_t));
uint64_t rd = 20;
*src = 13;
*dst = 3;
sc(src, dst, &rd); // write value 13 into #dst, so #dst should be 13 afterwards
// the expected output should be "dst: 13, rd: 0"
// What I get: "dst: 3, rd: 1"
printf("dst: %ld, rd: %ld\n", *dst, rd);
return 0;
}
The result shows that the dst value is not changed. May I know which part I am doing wrong? Any hints would be appreciated.
The PMA memory for the USB peripheral on some STM32s is structured as 256 16-bit words that are 32-bit aligned. This format is demonstrated in the following table:
| Address | +0x0 | +0x1 | +0x2 | +0x3 |
|:-----------|:-----:|:-----:|:-----:|:-----:|
| 0x40006000 | 0x000 | 0x001 | ----- | ----- |
| 0x40006004 | 0x002 | 0x003 | ----- | ----- |
| 0x40006008 | 0x004 | 0x005 | ----- | ----- |
| 0x4000600C | 0x006 | 0x007 | ----- | ----- |
| 0x40006010 | 0x008 | 0x009 | ----- | ----- |
| .... | .... | .... | ----- | ----- |
| 0x400063F8 | 0x1FC | 0x1FD | ----- | ----- |
| 0x400063FC | 0x1FE | 0x1FF | ----- | ----- |
For example, the PMA address 0x006 is accessed by the CPU at address 0x4000600C. Access to any address with an offset of 0x2 and 0x3 is invalid.
Ideally, data in the PMA could be accessed by defining structs such as the following, but count would occupy offsets 0x2 and 0x3, assuming the struct points to an address with an offset of 0x0.
struct buf_desc_entry {
volatile uint16_t addr;
volatile uint16_t count;
} __attribute__((packed));
One solution would be to pad the struct so that count occupies offsets 0x0 and 0x1 of the next word. This would be tedious and error-prone, since it would have to be done manually for all structs.
struct buf_desc_entry {
volatile uint16_t addr;
volatile uint16_t _1;
volatile uint16_t count;
volatile uint16_t _2;
} __attribute__((packed));
Another solution for this example would be to 32-bit align all variables, either using __attribute__((aligned (4))) or a linker script; however, this would not work for structs that contain 8-bit variables, since those can occupy offsets 0x0 or 0x1.
Is there a more elegant way of doing this?
I have a 32-bit-wide memory area filled by hardware with this type of data:

| reg1: bit 63 | reg1: bits 62-32 | reg0: bits 31-0 |
|:------------:|:----------------:|:---------------:|
| val | time value upper | time value lower |
I'm trying to get the time value and the 'val' at once with a struct and union.
I first tried :
typedef struct
{
uint64_t time : 63;
uint64_t value : 1;
} myStruct;
but this does not work: myStruct.time cannot be bigger than 32 bits.
I tried several things, even something like this:
typedef union
{
union{
struct{
uint32_t lower : 32;
uint32_t upper : 31;
} spare;
uint64_t value;
} time;
struct{
uint32_t spare_low : 32;
uint32_t spare_upp : 31;
uint32_t value : 1;
} pin;
} myStruct;
But in this case, myStruct.time.value obviously gets bit 63 as well. I get values like 0x8000_0415_4142_3015 instead of 0x0000_0415_4142_3015.
How can I easily retrieve the 63-bit time value and the 1-bit pin value?
PS: I know I could write a macro or something like that; I'm looking for a direct method.
Bitfields should never be used when bit placement is important.
Use bit masking instead:
uint64_t someValue = ...
uint64_t time = someValue & 0x7fffffffffffffff;
bool pin = someValue >> 63 & 1;
Bit-fields only give headaches and non-portable code. Always avoid them.
Assuming that 64-bit access is feasible for your specific system, and that "reg1" and "reg0" aren't actual variables but something placed in memory by hardware, then:
#define reg1 (*(volatile uint32_t*)0x1000) // address of reg1
#define reg0 (*(volatile uint32_t*)0x1004) // address of reg0
const uint32_t VAL_MASK = 1ul << 31;
...
uint64_t time_;
time_ = (uint64_t)(reg1 & (VAL_MASK - 1)) << 32;
time_ |= reg0;
bool val = reg1 & VAL_MASK;
// or alternatively:
uint64_t val = reg1 >> 31;
(Please don't name your variable time since that collides with the standard lib.)
If you don't have 64-bit types, or they are simply too slow to use, then you have to access each 32-bit half individually; just don't OR them together as done above.
int getval(uint64_t reg)
{
return !!(reg & ((uint64_t)1 << 63));
}
uint64_t gettime(uint64_t reg)
{
return reg & (~((uint64_t)1 << 63));
}
I have the thankless job of writing an IPv6 header parser.
I'm wondering if the version, traffic class and flow label fields could be parsed out using bit-fields.
I wrote some test code. Executing it on an x86 system, I get unexpected results.
#include <stdint.h>
#include <stdio.h>
typedef struct __attribute__ ((__packed__)) {
uint32_t flow_label:20;
uint32_t traffic_class:8;
uint32_t ip_version:4;
} test_t;
int main(int argc, char **argv)
{
uint8_t data[] = { 0x60, 0x00, 0x00, 0x00 };
test_t *ipv6 = (void *)data;
printf("Size is %zu, version %u, traffic class %u, flow label %u\n", sizeof(test_t), ipv6->ip_version, ipv6->traffic_class, ipv6->flow_label);
}
I'd expect the first nibble to be available in ip_version, but it doesn't seem to be; instead I get:
Size is 4, version 0, traffic class 0, flow label 96
or with the field order inverted
Size is 4, version 0, traffic class 6, flow label 0
Can anyone explain why this happens?
With bit-fields, how they are laid out is implementation-defined. You're better off reading the raw bytes at the start of the packet and using bit shifting to extract the relevant fields.
uint8_t ipver = data[0] >> 4;
uint8_t tclass = ((data[0] & 0xf) << 4) | (data[1] >> 4);
uint32_t flowlbl = (((uint32_t)data[1] & 0xf) << 16) | ((uint32_t)data[2] << 8) | data[3];
Indeed, even the Linux netinet/ip6.h header doesn't use a bit field for the ipv6 header:
struct ip6_hdr
{
union
{
struct ip6_hdrctl
{
uint32_t ip6_un1_flow; /* 4 bits version, 8 bits TC,
20 bits flow-ID */
uint16_t ip6_un1_plen; /* payload length */
uint8_t ip6_un1_nxt; /* next header */
uint8_t ip6_un1_hlim; /* hop limit */
} ip6_un1;
uint8_t ip6_un2_vfc; /* 4 bits version, top 4 bits tclass */
} ip6_ctlun;
struct in6_addr ip6_src; /* source address */
struct in6_addr ip6_dst; /* destination address */
};
I'm new to C and driver programming. Currently, I'm writing a user-space driver to communicate with RS232 over USB on Debian. While researching, I came across the following bit of code.
tty.c_cflag &= ~PARENB; // No Parity
tty.c_cflag &= ~CSTOPB; // 1 Stop Bit
tty.c_cflag &= ~CSIZE;
tty.c_cflag |= CS8; // 8 Bits
I understand the effect of these lines; however, these operations would only make sense if each control flag constant (PARENB, CSTOPB, etc.) had the same width as a combination of these flags. I can't seem to verify this through any documentation (one of my main grievances with C so far: it's somewhat hard to find easy-to-understand documentation).
I would like to ensure that I'm understanding the program correctly, as my approach is purely inductive and I'm unsure why these flags would be stored this way. Could somebody verify these findings, or point out something I may be overlooking?
Ex.
tty.c_cflag hypothetically is 4-bits long, each of the flags from the
previous code block corresponding to bits 3, 2, 1, 0. Then I believe the
following is how these are stored, if we were to say flags PARENB (3) and
CSTOPB (2) are high, and the other two flags are disabled.
tty.c_cflag = 1100
PARENB = 1000
CSTOPB = 0100
CSIZE = 0000
CS8 = 0000
In C, the best documentation you'll ever find is the source code itself, which you can find on your computer in /usr/include/termios.h (actually spread over one or more of the headers it includes). Here's the BSD-based termios.h for Apple systems that I based my answer on; values are likely to change depending on your flavour of Unix.
There, you'll find out that your tty object is of type struct termios, defined as follows:
struct termios {
tcflag_t c_iflag; /* input flags */
tcflag_t c_oflag; /* output flags */
tcflag_t c_cflag; /* control flags */
tcflag_t c_lflag; /* local flags */
cc_t c_cc[NCCS]; /* control chars */
speed_t c_ispeed; /* input speed */
speed_t c_ospeed; /* output speed */
};
So c_cflag is of type tcflag_t, which is defined by the following line:
typedef unsigned long tcflag_t;
And an unsigned long is expected to be at least 4 bytes, i.e. 32 bits.
All the flags you're using in your example are then defined as follows, as 32-bit hexadecimal values:
#define PARENB 0x00001000 /* parity enable */
#define CSTOPB 0x00000400 /* send 2 stop bits */
#define CSIZE 0x00000300 /* character size mask */
#define CS8 0x00000300 /* 8 bits */
That being said, the way it works is that c_cflag is used as a bit array, meaning that each bit is significant for a function. This method is commonly used because bit operations are "cheap" in processing power (your CPU can do a bit operation in one cycle) and "cheap" in memory space: instead of using an array of 32 booleans (a boolean type having a size of 1 byte to store one binary value), you can store 8 binary values per byte.
Another advantage, and optimization, is that because your CPU is at least 32-bit (and likely 64-bit in 2015), it can apply a mask over all 32 values in one CPU cycle.
An alternative representation of the bitmask would be to create a struct like the following:
struct tcflag_t {
bool cignore;
uint8_t csize;
bool cstopb;
bool cread;
bool parenb;
bool hupcl;
bool clocal;
bool ccts_oflow;
bool crts_iflow;
bool cdtr_iflow;
bool ctdr_oflow;
bool ccar_oflow;
};
That struct would be 12 bytes, and to change all of its fields you'd have to do 12 operations.
Then the operations you can do on these bits follow boolean logic, which is defined by truth tables:
The And (&), Or (|) and Not (~) truth tables:
| a | b | a & b |
|:-:|:-:|:-----:|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

| a | b | a \| b |
|:-:|:-:|:------:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

| a | ~a |
|:-:|:--:|
| 0 | 1 |
| 1 | 0 |
We usually nickname the And operator "force to zero" and the Or operator "force to one": unless both operands are 1, And results in 0, and unless both operands are 0, Or results in 1.
So if we consider that tty.c_cflag = 0x00000000 and you want to enable the parity check:
tty.c_cflag |= PARENB;
and then tty.c_cflag will contain 0x00001000, i.e. 0b1000000000000
Then you want to setup 7 bit size:
tty.c_cflag |= CS7;
and tty.c_cflag will contain 0x00001200, i.e. 0b1001000000000
Now, let's get back to your question: your "equivalent" example is not really representative, as it assumes CSIZE and CS8 contain no value.
So let's get through the code you've taken from the example:
tty.c_cflag &= ~PARENB; // No Parity
tty.c_cflag &= ~CSTOPB; // 1 Stop Bit
tty.c_cflag &= ~CSIZE;
tty.c_cflag |= CS8; // 8 Bits
Here, tty.c_cflag contains an unknown value:
0b????????????????????????????????
And you know you want no parity, one stop bit, and a data size of 8 bits. So here you're negating the "set parity" value to turn it off:
~PARENB == 0b0111111111111
And then using the And operator, you're forcing the bit to zero:
tty.c_cflag &= ~PARENB —→ 0b???????????????????0????????????
Then you do the same with CSTOPB:
tty.c_cflag &= ~CSTOPB —→ 0b???????????????????0?0??????????
and finally CSIZE:
tty.c_cflag &= ~CSIZE —→ 0b???????????????????0?000????????
For CSIZE, the goal is to make sure the two bit values for the length of data is reset.
Then you set up the right length by forcing to 1 the value:
tty.c_cflag |= CS8 —→ 0b???????????????????0?011????????
Actually, resetting CSIZE to 00 and then setting CS8 to 11 is redundant here, since doing tty.c_cflag |= CS8 directly would make it 11. But it is good practice for the case where you want to switch from CS8 to CS7, which sets only one of the two bits, the other keeping its original value.
Finally, when you open your serial port, the library will check those values to configure the port, use defaults for all the other values you haven't forced, and you'll be able to use your serial port.
I hope my explanation helps you better understand what's going on with flag settings on the serial port, and the use of bitmasks altogether. FYI, the same principle is used for a lot of other things, for example IPv4 netmasks, file I/O, etc.
The actual values of the macros depend on the platform (e.g., on Linux CSTOPB is defined as 0100 whereas on some BSDs it is 02000). This is why you should not make assumptions as to their exact values.
For instance, it is indeed common that CSIZE and CS8 have the same value, but on some platforms they might not, hence you first AND with the complement of the CSIZE mask (which sets all of the bits that affect character size to zero), and then OR in the value for those bits. If you were to assume that CS8 is the same pattern as the mask, you could omit the first step, but then the code would do the wrong thing, and in a very obscure manner without any warning, on a platform where this assumption didn't hold.
Here PARENB and CSTOPB are individual bit flags (exactly one 1-bit) that can be set with the bitwise-OR |, and cleared by bitwise-ANDing & their complement ~. Meanwhile the character sizes, including CS8, can have any number of 1-bits, including zero – they are more like little integers stored in specific bits of a larger integer. CSIZE is a mask that has 1-bits in all the places that signify character size (any of CS5, CS6, CS7, CS8) – this mask can be used to either extract exactly the character size (e.g., to test if ((tty.c_flag & CSIZE) == CS8)), or to clear it before setting (as is the case here with tty.c_flag &= ~CSIZE).