I have inherited a very old (early 2000s) code base that consists of a driver and a kernel module, and it crashes for some combinations of compiler flags and architectures. At some point an offset is incorrect for an unknown reason. When I skimmed the code base I found a lot of unaligned accesses and wanted to clean up this part to rule out these potential issues. Example:
uint32_t* p = ....; // <-- odd address on right side.
uint32_t b = *p + 1;
I know that current processors handle such cases without noticeable delays, and I had problems finding hardware that would trigger a bus error (I once had an RPi 1 that would produce bus errors when setting /proc/cpu/alignment to 4). I wanted to make sure the compiler generates code that correctly accesses the memory. My first draft for reading a 32-bit integer from an unaligned address and converting it from network to host byte order (there was already a macro for that) was thus (dest and src are pointers to some memory):
#define SWAP32(dest, src) \
do { \
uint32_t tmp_ = 0; \
memcpy(&tmp_, (const char*)(src), sizeof(tmp_)); \
tmp_ = bswap_32(tmp_); \
memcpy((char*)(dest), &tmp_, sizeof(tmp_)); \
} while(0)
I then wanted to create a macro to wrap pointer accesses for other cases that were not yet wrapped in a macro and came up with the following solution (sample contains code to make it compile):
#include <stdint.h>
#include <stdio.h>
struct __attribute__((packed)) unaligned_fix_32 { uint32_t n; };
#define GET32(x) ((struct unaligned_fix_32*)(x))->n
int main(int argc, char* argv[])
{
char mem[] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
uint32_t n = GET32(&mem[1]);
printf("0x%8.8x\n", n);
}
My goal is to wrap pointer accesses inside this macro everywhere. The idea is that, knowing the struct is packed, the compiler will generate code that accesses memory in a way that avoids a potential bus error on architectures that cannot access unaligned memory.
EDIT: I changed my code to the following new version:
static inline uint32_t get_be_unaligned_u32(const void* p)
{
uint32_t u32;
memcpy(&u32, (const char*)p, sizeof(u32));
return bswap_32(u32);
}
#define GET32(src) get_be_unaligned_u32(src)
Please answer questions also for the new version:
Is this a workable solution on real hardware? Can anyone check this? Using -fsanitize=alignment produces the proper exceptions without the attribute packed, but the generated assembler simply checks addresses with & 0x3 and raises a trap if it is not zero.
Is this undefined behavior?
Are there better/other methods to detect/fix/simulate unaligned access inside a C/C++ program?
Can I somehow do this using QEMU?
Is the SWAP32 macro UB?
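For comparison with the memcpy-based helper above, here is a minimal sketch of a byte-wise variant. The names load_be32 and store_be32 are hypothetical and do not come from the code base in question. Reading through unsigned char is valid at any address, and assembling the value with shifts makes the result independent of host byte order, so no bswap_32 is needed.

```c
#include <stdint.h>

/* Hypothetical helper: read a big-endian (network order) 32-bit value
   from an address with no alignment requirement. */
static inline uint32_t load_be32(const void *p)
{
    const unsigned char *b = (const unsigned char *)p;
    return ((uint32_t)b[0] << 24) |
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |
            (uint32_t)b[3];
}

/* Hypothetical helper: write a 32-bit value in big-endian order
   to an address with no alignment requirement. */
static inline void store_be32(void *p, uint32_t v)
{
    unsigned char *b = (unsigned char *)p;
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
}
```

With the mem array from the sample above, load_be32(&mem[1]) reads the bytes 0x02, 0x03, 0x04, 0x05 in that order regardless of the host's endianness.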
Related
I am writing a macro PAGE_ALIGN in C as follows.
#include <stdio.h>
#define PAGE_ALIGN(x) ((x) & 0xfffff000U)
int main() {
printf("0x%08x\n", PAGE_ALIGN(0x12345678U));
}
Now I want to make sure that the user is passing in the correct type to PAGE_ALIGN. I use _Static_assert() to perform the assertion, and things work well.
#define PAGE_ALIGN(x) (({_Static_assert(sizeof(x) == 4); x;}) & 0xfffff000U)
However, some of my code does not use PAGE_ALIGN in functions. For example, when defining a global variable:
char a[PAGE_ALIGN(0x12345678U)];
Then I get the compile error
a.c:3:24: error: braced-group within expression allowed only inside a function
3 | #define PAGE_ALIGN(x) (({_Static_assert(sizeof(x) == 4); x;}) & 0xfffff000U)
| ^
a.c:5:8: note: in expansion of macro 'PAGE_ALIGN'
5 | char a[PAGE_ALIGN(0x12345678U)];
| ^~~~~~~~~~
Is there a way to define PAGE_ALIGN such that the macro works outside of a function?
Complete a.c:
#include <stdio.h>
#define PAGE_ALIGN(x) (({_Static_assert(sizeof(x) == 4); x;}) & 0xfffff000U)
char a[PAGE_ALIGN(0x12345678U)];
int main() {
printf("0x%08x\n", PAGE_ALIGN(0x12345678U));
}
Update: here is the motivation for this question.
I am writing an OS that deals with physical and virtual addresses in 32-bit and 64-bit mode. Virtual addresses are 32 bits in 32-bit mode and 64 bits in 64-bit mode, so unsigned long is used for virtual addresses. Physical addresses are always 64 bits, so unsigned long long is used. I am writing a header file that distinguishes these different types.
For example, page size macros are:
#define VA_PAGE_SIZE_4K 0x1000UL // VA for virtual address
#define PA_PAGE_SIZE_4K 0x1000ULL // PA for physical address
Then I define macros for aligning up:
#define VA_PAGE_ALIGN_UP_4K(x) (((x) + VA_PAGE_SIZE_4K - 1) & ~(VA_PAGE_SIZE_4K - 1))
#define PA_PAGE_ALIGN_UP_4K(x) (((x) + PA_PAGE_SIZE_4K - 1) & ~(PA_PAGE_SIZE_4K - 1))
Please imagine that there are other macros, such as VA_PAGE_ALIGN_UP_2M, VA_PAGE_ALIGN_UP_1G, ...
My OS has a fixed supported physical address size (say 4 GiB) and I need an identity-mapped page table. So I can use the alignment macros to compute how many page table entries I need to support. Of course, there are page directories etc. due to multi-level paging.
// maximum physical memory, 4G
#define MAX_PHYS_MEM 0x100000000ULL
// number of page table entries
#define PAGE_TABLE_NELEMS (PA_PAGE_ALIGN_UP_4K(MAX_PHYS_MEM) / PA_PAGE_SIZE_4K)
// Define page table (global variable)
unsigned long page_table[PAGE_TABLE_NELEMS];
// number of page directory entries
#define PAGE_DIRECTORY_NELEMS (PA_PAGE_ALIGN_UP_2M(MAX_PHYS_MEM) / PA_PAGE_SIZE_2M)
// Define page directory (global variable)
unsigned long page_directory[PAGE_DIRECTORY_NELEMS];
// ...
However, in other code I want to make sure that PA_PAGE_ALIGN_UP_4K and VA_PAGE_ALIGN_UP_4K are not mixed up. That is, PA_PAGE_ALIGN_UP_4K should only be used on unsigned long long and VA_PAGE_ALIGN_UP_4K only on unsigned long. So I want to add a static assert to those macros. But adding a static assert causes compile errors in page_table and page_directory above.
@Tavian Barnes's comment solves my problem. My updated program looks like:
#include <stdio.h>
#define PAGE_ALIGN(x) (sizeof(struct { _Static_assert(sizeof(x) == 4); int dummy; }) * 0 + ((x) & 0xfffff000U))
char a[PAGE_ALIGN(0x12345678U)];
int main() {
printf("0x%08x\n", PAGE_ALIGN(0x12345678U));
}
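As a sketch, the same trick can be applied to the physical-address macro from the motivation above. The helper name STATIC_ASSERT_ZERO and the one_page array are hypothetical; the remaining names and values are taken from the question. Note that a sizeof-based check cannot tell unsigned long from unsigned long long on targets where both are 8 bytes, so it only catches mix-ups where the sizes differ.

```c
#include <stddef.h>

/* Hypothetical helper: evaluates to 0, but fails to compile when cond
   is false. It is a plain constant expression, so it also works at
   file scope (no GNU statement expression involved). */
#define STATIC_ASSERT_ZERO(cond, msg) \
    (sizeof(struct { _Static_assert(cond, msg); int dummy; }) * 0)

#define PA_PAGE_SIZE_4K 0x1000ULL

#define PA_PAGE_ALIGN_UP_4K(x) \
    (STATIC_ASSERT_ZERO(sizeof(x) == sizeof(unsigned long long), \
                        "PA macros expect unsigned long long") + \
     (((x) + PA_PAGE_SIZE_4K - 1) & ~(PA_PAGE_SIZE_4K - 1)))

/* maximum physical memory, 4G */
#define MAX_PHYS_MEM 0x100000000ULL

/* number of page table entries */
#define PAGE_TABLE_NELEMS (PA_PAGE_ALIGN_UP_4K(MAX_PHYS_MEM) / PA_PAGE_SIZE_4K)

/* File-scope array sized by the checked macro (one 4 KiB page). */
static unsigned char one_page[PA_PAGE_ALIGN_UP_4K(1ULL)];
```

Passing a 4-byte argument such as 1U to PA_PAGE_ALIGN_UP_4K would then be rejected at compile time, both inside and outside functions.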
Suppose that I have the following definitions:
#include <stdbool.h>
#include <stdint.h>
#define ASSERT(cond) _Static_assert(cond, #cond)
typedef union {
struct {
bool bit0:1;
bool bit1:1;
bool bit2:1;
bool bit3:1;
bool bit4:1;
bool bit5:1;
bool bit6:1;
bool bit7:1;
};
uint8_t bits;
} byte;
ASSERT(sizeof(byte) == sizeof(uint8_t));
Is it possible to write code such as
#include <assert.h>
// ...
assert(((byte) { .bit0 = 1 }).bits == 0b00000001);
assert(((byte) { .bit1 = 1 }).bits == 0b00000010);
assert(((byte) { .bit2 = 1 }).bits == 0b00000100);
assert(((byte) { .bit3 = 1 }).bits == 0b00001000);
assert(((byte) { .bit4 = 1 }).bits == 0b00010000);
assert(((byte) { .bit5 = 1 }).bits == 0b00100000);
assert(((byte) { .bit6 = 1 }).bits == 0b01000000);
assert(((byte) { .bit7 = 1 }).bits == 0b10000000);
// ...
that would cause a compile-time failure if the above conditions weren't satisfied?
(When I try to place the conditions in the ASSERT macro, the compiler complains that expression in static assertion is not constant, which of course makes perfect sense)
The solution is allowed to use the GNU extensions to the C language.
I don't think you can.
_Static_assert is required to verify that the argument expression satisfies standard C's requirements for an integer constant expression.
There are ways which, on gcc, can sometimes turn a boolean expression that doesn't satisfy those requirements, but whose value is known to the optimizer at compile time, into a compile-time error or warning.
E.g.:
#include <assert.h>
#if __GNUC__ && !__clang__
#define $SassertIfUCan0(X) \
(__extension__({ /* elicit a -Wvla-larger-than */ \
(!__builtin_constant_p(X)) ? 0 : \
({ char volatile $SassertIfUCan0_[ (!__builtin_constant_p(X)||(X)) ? 1:-1]; \
$SassertIfUCan0_[0]=0,0;}); \
__auto_type $SassertIfUCan0 = X; \
assert($SassertIfUCan0); \
0; \
}))
#endif
int main(int C, char **V)
{
int x = 0; $SassertIfUCan0(x);
//these also elicit compile-time errors:
/*$SassertIfUCan0(C-C);*/
/*$SassertIfUCan0(C*0);*/
}
This can turn the nullness of the compile-time-known variable x, which isn't technically an integer constant expression, into a compile-time warning/error ("-Wvla-larger-than").
Unfortunately, the macro doesn't work with every expression and that includes your bitfield-based example.
(I wish compilers had a mechanism for failing compilation if an expression happens to be compile-time known and false.)
So AFAIK, the closest thing you can do is compile-time detect platforms whose ABI is known to guarantee your required bitfield layout:
#if __linux__ && __x86_64__
#elif 0//...
//...
#else
#error "bitfields not known to be little-endian"
#endif
I think this is an X-Y problem: you are asking about checking the layout of bitfields when you really want to write code that is portable across different implementations of bitfields. So:
If you don't try to communicate your bitfield to another machine, or store it in a file where a different machine may read it, just forget about the implementation detail of how the bits are ordered. Just access them via the bitfield names, and be done with it.
If you need to communicate the structures containing these bitfields, declare a uint8_t and the appropriate set of bit flag constants (#define BIT7 (1u << 7), etc.). Bytes never change value when they are transferred from one machine to another, so myFlags & BIT7 is guaranteed to yield the same result everywhere.
Note that it is important to either use a single byte to store the flags, or handle the problem of endianness explicitly.
On GNU/Linux, you might find <features.h> and /usr/x86_64-linux-gnu/include/linux/byteorder/big_endian.h useful.
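A minimal sketch of the flag-constant approach follows. BIT7 is the name used above; BIT0 and the helper functions flags_set, flags_clear and flags_test are hypothetical. Because the masks are integer constant expressions, their values can be checked with _Static_assert, which is not possible for the bit-field layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT0 (1u << 0)
#define BIT7 (1u << 7)

/* The masks are integer constant expressions, so this is legal. */
_Static_assert(BIT0 == 0x01u, "BIT0 must be the least significant bit");
_Static_assert(BIT7 == 0x80u, "BIT7 must be the most significant bit");

/* Hypothetical helpers operating on a single flag byte. */
static inline uint8_t flags_set(uint8_t flags, unsigned mask)
{
    return (uint8_t)(flags | mask);
}

static inline uint8_t flags_clear(uint8_t flags, unsigned mask)
{
    return (uint8_t)(flags & ~mask);
}

static inline bool flags_test(uint8_t flags, unsigned mask)
{
    return (flags & mask) != 0;
}
```

A byte written with flags_set(0, BIT7) has the value 0x80 on every platform, so it can be stored or transmitted without caring about bit-field ordering.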
The solution is allowed to use the GNU extensions to the C language.
With most recent GCC compilers, you could provide your own GCC plugin defining your __your_builtin_endian__ compiler builtin. Notice that some GCC compilers are built without plugin support (e.g. RedHat did that). Check by running gcc -v alone.
Once your plugin defines __your_builtin_endian__, you could use that in static_assert. Or have your plugin define and implement some #pragma MYPLUGIN check endian which would make a compile-time error in some cases.
Do budget a few weeks of fulltime work for such a plugin. It is GCC version specific (not always the same C++ code for a GCC 9 and GCC 10 plugin).
Consider also using autoconf (at least if you do not need any cross-compilation).
Not strictly within the limits of the question, but this may provide a more portable solution.
A static assertion only accepts integer constant expressions, and it will NOT be able to evaluate an expression that reads from a union.
As an alternative, and assuming the code will be built by a makefile (or equivalent), consider adding a step to the build to enforce the condition:
static.verify: static_check.c
cc -o static_check static_check.c
./static_check
touch $@
# Make the static.verify dependency for building objects/executable.
a.o: static.verify
Basically, this makes it a requirement to run the small program 'static_check.c' before anything else is built. The program can produce any required error message. It should exit with a non-zero return status to indicate an error.
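A sketch of the check that static_check.c might perform for the bit-field question, reusing the byte union from the question. The function name bitfield_layout_mismatches is hypothetical. The result depends on the compiler and ABI, which is the point of running it as a build step.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef union {
    struct {
        bool bit0:1;
        bool bit1:1;
        bool bit2:1;
        bool bit3:1;
        bool bit4:1;
        bool bit5:1;
        bool bit6:1;
        bool bit7:1;
    };
    uint8_t bits;
} byte;

/* Hypothetical check: counts the bit-fields that do not land on the
   expected LSB-first position and reports each one on stderr. */
int bitfield_layout_mismatches(void)
{
    const uint8_t got[8] = {
        ((byte){ .bit0 = 1 }).bits, ((byte){ .bit1 = 1 }).bits,
        ((byte){ .bit2 = 1 }).bits, ((byte){ .bit3 = 1 }).bits,
        ((byte){ .bit4 = 1 }).bits, ((byte){ .bit5 = 1 }).bits,
        ((byte){ .bit6 = 1 }).bits, ((byte){ .bit7 = 1 }).bits,
    };
    int mismatches = 0;

    for (int i = 0; i < 8; i++) {
        if (got[i] != (uint8_t)(1u << i)) {
            fprintf(stderr, "bit%d maps to 0x%02x, expected 0x%02x\n",
                    i, (unsigned)got[i], 1u << i);
            mismatches++;
        }
    }
    return mismatches;
}
```

To complete the program, a main that returns a non-zero status whenever bitfield_layout_mismatches() is non-zero would make the build step fail on platforms with a different layout.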
When I compile my code for my STM32F429 CPU everything works fine when I use the -O0 flag but as soon as I use a higher optimization (-O1, -O2, and -O3) the code breaks.
I'm using the CMSIS+HAL libraries from ST and some basic code.
The problem is that even though *uart_irq is defined as volatile the if (uart_irq && uart_irq->SOURCE == IRQ_SOURCE_UART) in the main loop is never evaluated.
I have tried to define uart_irq as a volatile void * without success.
The only thing that works is if uart_irq is defined as a volatile uint32_t and the integer is cast to an irq_instance pointer when used, as the compiler won't remove that during optimization.
I would be happy if anybody would shed some light on the problem.
Is this supposed to be the standard behavior?
Is this a known bug in the compiler?
main.h
#define API_COMMAND_SIZE 6
typedef struct irq_instance_s
{
uint8_t SOURCE;
uint8_t TYPE;
uint8_t *CONTEXT;
uint8_t SIZE;
} irq_instance;
extern volatile irq_instance *uart_irq;
main.c
The receive pointer is freed inside hande_command
#include "main.h"
volatile irq_instance *uart_irq = 0;
int main(void)
{
uint8_t *receive = 0;
<Initialize stuff>
/* Initialize first UART receive */
receive = malloc(API_COMMAND_SIZE);
while (HAL_UART_Receive_IT(&huart1, receive, API_COMMAND_SIZE) == HAL_BUSY);
/* Program Main loop */
while(1) {
if (uart_irq && uart_irq->SOURCE == IRQ_SOURCE_UART) { /* <---- Problem is here */
handle_interrupt(uart_irq);
free((void *)uart_irq);
uart_irq = 0;
}
}
}
stm32f4xx_it.c
HAL_UART_RxCpltCallback is called after each successful UART receive.
#include "main.h"
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
uint8_t *receive = 0;
uart_irq = calloc(1, sizeof(irq_instance));
uart_irq->SOURCE = IRQ_SOURCE_UART;
uart_irq->CONTEXT = huart->pRxBuffPtr - huart->RxXferSize;
uart_irq->SIZE = huart->RxXferSize;
uart_irq->TYPE = IRQ_TYPE_COMMAND;
receive = malloc(API_COMMAND_SIZE);
while (HAL_UART_Receive_IT(&huart1, receive, API_COMMAND_SIZE) == HAL_BUSY);
}
volatile irq_instance *uart_irq
says that the thing pointed to by uart_irq is volatile. But
if (uart_irq && uart_irq->SOURCE == IRQ_SOURCE_UART)
is looking at the pointer itself, not the thing that's being pointed to. If the pointer itself is also volatile, which looking at your code it is, then declare it like this:
volatile irq_instance * volatile uart_irq
If I read this correctly then you are changing the uart_irq inside the IRQ handler and it's this change of pointer value that the optimizer cannot see in the regular program flow and hence is optimizing away.
The correct declaration for a volatile pointer is irq_instance * volatile uart_irq. The way that you have declared it tells gcc that the value pointed to by the pointer is volatile. If this is also the case then you may combine the two with volatile irq_instance * volatile uart_irq.
The variable uart_irq is volatile, not just what it points to... Define it this way:
extern irq_instance * volatile uart_irq;
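The effect of the declaration can be sketched without the HAL. In the following minimal example all names (demo_irq, demo_slot, demo_pending, demo_isr, demo_poll, DEMO_SOURCE_UART) are hypothetical stand-ins, and the "interrupt handler" is an ordinary function. The pointer itself is volatile, so every call to demo_poll re-reads it instead of reusing a cached value.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t source;
} demo_irq;                     /* hypothetical, trimmed-down irq_instance */

#define DEMO_SOURCE_UART 1u     /* hypothetical stand-in for IRQ_SOURCE_UART */

/* Static storage for the event, so the handler needs no malloc. */
static volatile demo_irq demo_slot;

/* Both the pointer and the object it points to are volatile. */
static volatile demo_irq * volatile demo_pending = NULL;

/* Stand-in for the interrupt handler: publish a new event. */
void demo_isr(void)
{
    demo_slot.source = DEMO_SOURCE_UART;
    demo_pending = &demo_slot;
}

/* One iteration of the main loop: returns 1 if an event was consumed. */
int demo_poll(void)
{
    volatile demo_irq *irq = demo_pending;  /* read the volatile pointer once */

    if (irq && irq->source == DEMO_SOURCE_UART) {
        demo_pending = NULL;
        return 1;
    }
    return 0;
}
```

On real hardware demo_isr would be invoked asynchronously, and the main loop would call demo_poll repeatedly.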
Is it possible to create a preprocessor function that will cause multiple other preprocessor macros to be defined?
I'm working in a micro controller framework that requires a few macros to be made in order for a generic interrupt handler to function:
<MODULE_NAME>_IRQ_PIN //ex: PORTB_PIN(0)
<MODULE_NAME>_IRQ_IN_REGISTER //ex: GPIO_PBIN
<MODULE_NAME>_IRQ_NUMBER //ex: GPIO_IRQA
<MODULE_NAME>_IRQ_INTCFG_REG //ex: GPIO_INTCFGA
I am trying to make this process more generic and easier from an implementation standpoint. There are about ten of these macros that need to be defined, but their definitions can all be derived when given 1) the port name 2) the pin number and 3) the IRQ name. I am hoping then to create a pre-processor function that will result in the generation of all of these macros. Something like:
#define MAKE_INTERRUPT_MACROS(module, port, pin, irq_num) \
#define module##_IRQ_pin PORT##port##_PIN(##pin##) \
#define module##_IRQ_IN_REGISTER GPIO_P##port##IN \
#define module##_IRQ_NUMBER GPIO_IRQ##irq_num \
#define module##_IRQ_INTCFG_REG GPIO_INTCFG##irq_num
Is there a legal way to get the preprocessor to do something like the above, where a single preprocessor function causes the generation of multiple other macros based on the parameters passed to the function?
I think this classical scheme may solve your problem. This is a simple and clear way:
#ifdef CPU_X
#define IRQ_PIN 0
#define IRQ_IN_REGISTER 3
#define IRQ_NUMBER 11
#define IRQ_INTCFG_REG 12
#endif
#ifdef CPU_YY
#define IRQ_PIN PORTB_PIN(1)
#define IRQ_IN_REGISTER GPIO_PBIN(6)
#define IRQ_NUMBER GPIO_IRQA(9)
#define IRQ_INTCFG_REG GPIO_INTCFGA(0xA)
#endif
#ifdef CPU_KK
/* .
. Another CPU
.
*/
#endif
#ifdef CPU_K2
/* .
. Another CPU
.
*/
#endif
You can compile the code specifying the CPU with -D CPU_xx and the problem should be solved!
I assume you might have some other macros (e.g. GPIO_IRQA(9)); I've used them in CPU_YY, but they might also be used for the other CPUs.
If you can use C++ rather than C, look at using classes, one per CPU type, and simply use constants and interfaces in the class. Then you don't even care that they are different; simply use the same names to access them (the differentiation is done based upon the class being instantiated).
If you really and truly must use C (such as writing a device driver), you can use the approach device driver writers use (all flavors of *nix, VxWorks, PSOS, QNX, and most of the old DEC OSs use this approach, don't know about Windows): Simply build a structure containing the values and any functions you may need to manipulate the hardware (or anything else, for that matter). Create one instance of this structure per hardware (or in your case, module) type. Then indirect through the structure.
Example:
struct module_wrapper {
const char *module_name;
int irq_pin;
int irq_register;
int irq_number;
int irq_intcfg_reg;
int (*init_fcn)(void);
int (*reg_access)(int register_number);
int (*open)(void);
int (*close)(void);
int (*read)(char *dst_buffer, int len);
int (*write)(const char *src_buffer, int len);
};
struct module_wrapper portB = { /* initialize here */ };
struct module_wrapper gpio = { /* initialize here */ };
printf("GPIO pin %d\n", gpio.irq_pin);
Obviously, modify as desired. You can also replace the constant variables with functions that return the values.
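A minimal sketch of how one such instance might be filled in and used, with a trimmed-down structure. The names module_desc, uart_module, read_pin_stub and module_irq_pending, as well as the pin and IRQ values, are hypothetical placeholders rather than values from any real framework.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down version of the wrapper structure. */
struct module_desc {
    const char *name;
    int irq_pin;
    int irq_number;
    int (*read_pin)(void);   /* returns non-zero while the IRQ line is active */
};

/* Placeholder for real hardware access. */
static int read_pin_stub(void)
{
    return 1;
}

/* One instance per module, filled in with designated initializers. */
static const struct module_desc uart_module = {
    .name       = "uart",
    .irq_pin    = 0,
    .irq_number = 5,
    .read_pin   = read_pin_stub,
};

/* Generic code only sees the structure, never module-specific macros. */
int module_irq_pending(const struct module_desc *m)
{
    return m->read_pin != NULL && m->read_pin() != 0;
}
```

A generic interrupt handler can then take a pointer to the relevant instance, for example module_irq_pending(&uart_module).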
You can't define other macros with a macro, but you can achieve something similar by approaching it from the opposite direction.
You could autogenerate a file which has the following block for each possible module:
#ifdef <MODULE>_IRQ_DATA
#define <MODULE>_IRQ_pin CALL(GET_IRQ_PIN, <MODULE>_IRQ_DATA)
#define <MODULE>_IRQ_IN_REGISTER CALL(GET_IRQ_IN_REGISTER, <MODULE>_IRQ_DATA)
#define <MODULE>_IRQ_NUMBER CALL(GET_IRQ_NUMBER, <MODULE>_IRQ_DATA)
#define <MODULE>_IRQ_INTCFG_REG CALL(GET_IRQ_INTCFG_REG, <MODULE>_IRQ_DATA)
#endif
And then have:
#define CALL(MACRO, ...) MACRO(__VA_ARGS__)
#define GET_IRQ_PIN(port, pin, irq_num) PORT##port##_PIN(pin)
#define GET_IRQ_IN_REGISTER(port, pin, irq_num) GPIO_P##port##IN
#define GET_IRQ_NUMBER(port, pin, irq_num) GPIO_IRQ##irq_num
#define GET_IRQ_INTCFG_REG(port, pin, irq_num) GPIO_INTCFG##irq_num
(Depending on how the defines are used, you can possibly get rid of the #ifdef-#endif -pairs, eg. if all of them must/can always be defined)
Then actually defining the needed values could be done with just:
#define <MODULE>_IRQ_DATA B,0,A
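Put together for a single module, the scheme might look like the following sketch. The module name UART is chosen for illustration, and the definitions of PORTB_PIN, GPIO_PBIN, GPIO_IRQA and GPIO_INTCFGA are made-up stand-ins for whatever the framework really provides.

```c
/* Hypothetical stand-ins for the framework's own macros. */
#define PORTB_PIN(n)  (0x10u + (n))
#define GPIO_PBIN     0x4000u
#define GPIO_IRQA     5u
#define GPIO_INTCFGA  0x4010u

/* CALL expands its arguments first, so the comma-separated data
   macro is split into three separate arguments on rescanning. */
#define CALL(MACRO, ...) MACRO(__VA_ARGS__)

#define GET_IRQ_PIN(port, pin, irq_num)         PORT##port##_PIN(pin)
#define GET_IRQ_IN_REGISTER(port, pin, irq_num) GPIO_P##port##IN
#define GET_IRQ_NUMBER(port, pin, irq_num)      GPIO_IRQ##irq_num
#define GET_IRQ_INTCFG_REG(port, pin, irq_num)  GPIO_INTCFG##irq_num

/* The only line a module author writes: port B, pin 0, IRQ A. */
#define UART_IRQ_DATA B,0,A

/* The block that would be autogenerated for the UART module. */
#ifdef UART_IRQ_DATA
#define UART_IRQ_PIN         CALL(GET_IRQ_PIN, UART_IRQ_DATA)
#define UART_IRQ_IN_REGISTER CALL(GET_IRQ_IN_REGISTER, UART_IRQ_DATA)
#define UART_IRQ_NUMBER      CALL(GET_IRQ_NUMBER, UART_IRQ_DATA)
#define UART_IRQ_INTCFG_REG  CALL(GET_IRQ_INTCFG_REG, UART_IRQ_DATA)
#endif
```

Here UART_IRQ_PIN expands to PORTB_PIN(0) and UART_IRQ_IN_REGISTER to GPIO_PBIN, so changing the single UART_IRQ_DATA line retargets all four macros.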
OK, so as I understand it, #include works by looking up the file you are including, compiling it, and substituting the result where the include is. However, when I assume this, my program doesn't compile, and it reports that __ is undefined all over the place.
For example, my main.c has something like
#include "tim.h"
#include "tim_cfg.h"
#include "tim_api.h"
tim.h contains some typedefs like
typedef enum
{
RATE_DIV1 = 0X0,
RATE_DIV2 = 0X1,
RATE_DIV3 = 0X2,
RATE_DIV4 = 0X3,
RATE_DIV5 = 0X4,
RATE_DIV6 = 0X5,
RATE_DIV7 = 0X6,
RATE_DIV8 = 0X7,
RATE_DIV9 = 0X8
} BaseRate_T;
typedef unsigned char byte;
tim_cfg.h contains register locations and basic structs
typedef struct
{
byte TimerSize;
byte InterruptLevel;
} TIMInfo_T;
and tim_api.h contains the function prototypes of the tim functions
So, the problem is why do I get errors
identifier "byte" is undefined
when it is the first thing I include?
You should follow the rule that every header file should include what it needs to work.
With the setup you have, if someone includes tim_cfg.h on its own, the byte type is not defined.
A better solution would be:
tim_cfg.h:
#include "tim.h"
typedef struct
{
byte TimerSize;
byte InterruptLevel;
} TIMInfo_T;
That way, everything that is needed for tim_cfg.h is there when you include it.
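Once headers include each other, include guards keep the typedefs from being defined twice when main.c also includes tim.h directly. A minimal sketch follows, with both headers shown in one listing so that it compiles on its own. The guard names TIM_H and TIM_CFG_H are assumptions, and the enum from tim.h is omitted for brevity.

```c
/* ---- tim.h ---- */
#ifndef TIM_H
#define TIM_H

typedef unsigned char byte;

#endif /* TIM_H */

/* ---- tim_cfg.h ---- */
#ifndef TIM_CFG_H
#define TIM_CFG_H

/* In the real file this line would be: #include "tim.h" */

typedef struct
{
    byte TimerSize;
    byte InterruptLevel;
} TIMInfo_T;

#endif /* TIM_CFG_H */
```

With the guards in place, the three includes in main.c can appear in any order, and tim_cfg.h also works when it is included on its own.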