How to eliminate unused elements in structures? - c

Background
I am working on a modular architecture for embedded devices where different abstraction layers have to communicate with each other. The current approach is to have plenty of functions, variables and defines mangled with the module name they belong to.
This approach is a bit painful and I would like to define some kind of common interfaces for the API. The key idea is to get a better understanding of what modules share the same HAL interface.
Proposal
I would like to use an OOP-inspired architecture where I use structures as interfaces. All these structures are populated statically.
This solution looks nice but I might waste a lot of memory because the compiler doesn't know how to chop off structures and only keep what it really needs to.
The following example can be built with or without -DDIRECT and the behavior should be exactly the same.
Example
Source (test.c)
#include <stdlib.h>
int do_foo(int a) {
return 42 * a;
}
#ifdef DIRECT
int (*foo)(int) = do_foo;
int (*bar)(int);
int storage;
#else
struct foo_ts {
int (*do_foo)(int);
int (*do_bar)(int);
int storage;
} foo = {.do_foo = do_foo};
#endif
int main(char argc) {
#ifdef DIRECT
return foo(argc);
#else
return foo.do_foo(argc);
#endif
}
Makefile
CFLAGS=-O2 -mthumb -mcpu=cortex-m3 -g --specs=nosys.specs
CC=arm-none-eabi-gcc
upper=$(shell echo $(1) | tr a-z A-Z)
EXEC=direct.out with_struct.out
all: $(EXEC)
%.out:test.c
$(CC) $(CFLAGS) -D$(call upper,$(basename $@)) -o $@ test.c
size $@
Output
One can notice that the memory footprint used with the struct variant is bigger because the compiler doesn't allow itself to remove the unused members.
arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m3 -g \
--specs=nosys.specs -DDIRECT -o direct.out test.c
size direct.out
text data bss dec hex filename
992 1092 36 2120 848 direct.out
arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m3 -g \
--specs=nosys.specs -DWITH_STRUCT -o with_struct.out test.c
size with_struct.out
text data bss dec hex filename
992 1100 28 2120 848 with_struct.out
Question
With this example I demonstrate that using a structure is good for readability and modularity, but it could decrease efficiency and it increases the memory usage.
Is there a way to get the advantages of both solutions? Said differently, is there a way to tell the compiler to be smarter?
OOP?
Following the comments on this question, one suggestion is to use C++ instead. Unfortunately, the same issue would occur because a Class with unused members will never be simplified by the compiler. So I am falling in the same trap with both languages.
Another raised point was the reason for unused members in structures. To address this question we can imagine a generic 3-axes accelerometer used in an application where only 1 axis is used. The HAL for this accelerometer could have the methods read_x_acc, read_y_acc and read_z_acc while only read_x_acc is used by the application.
If I declare a class in C++ or a structure in C the function pointers for unused methods/functions will still consume memory for nothing.

Let me first show you a possible approach regarding your edit, where your current interface has three functions, but sometimes you would need only one of them. You could define two interfaces:
typedef struct I1DAccel
{
double (*read)(void);
} I1DAccel;
typedef struct I3DAccel
{
union
{
I1DAccel x;
struct
{
double (*read_x)(void);
};
};
union
{
I1DAccel y;
struct
{
double (*read_y)(void);
};
};
union
{
I1DAccel z;
struct
{
double (*read_z)(void);
};
};
} I3DAccel;
Then you can make the implementation of the Accelerometer do this:
I1DAccel accel_x = { read_x_acc };
I1DAccel accel_y = { read_y_acc };
I1DAccel accel_z = { read_z_acc };
I3DAccel accel =
{
.read_x = read_x_acc,
.read_y = read_y_acc,
.read_z = read_z_acc,
};
With proper link-time optimizations, the compiler could then throw away any of these global structs that isn't used by application code.
Of course, you will consume more memory if one part of your code requires only accel_x while another part requires the whole accel. You would have to hunt down these cases manually.
Of course, using "interface" structs, you will always consume more memory than without them, the pointers have to be stored somewhere. Therefore, the typical approach is indeed to just prepend the module name to the functions and call them directly, like e.g. in case of such an accelerometer:
typedef int acchndl; // if you need more than one accelerometer
double Accel_read_x(acchndl accel);
double Accel_read_y(acchndl accel);
double Accel_read_z(acchndl accel);
This is conceptually similar to what you would do in e.g. C++:
class Accel
{
double read_x();
double read_y();
double read_z();
};
double Accel::read_x()
{
// implementation here
}
With the plain C above, instead of an instance pointer, you can use any other type of "object handle", like demonstrated with the typedef to int, which is often an advantage for embedded code.
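A minimal sketch of how the handle-based functions above might be implemented, assuming the handle is simply an index into a private, statically allocated table. The `accel_state` type, the `ACCEL_COUNT` limit, the `accel_lookup` helper and the stored sample values are hypothetical stand-ins; a real driver would read device registers instead.

```c
#include <stddef.h>

typedef int acchndl;              /* handle = index into a private table */

#define ACCEL_COUNT 2             /* hypothetical number of devices */

/* Hypothetical per-device state; a real driver would talk to hardware. */
struct accel_state {
    double x, y, z;
};

static struct accel_state accel_table[ACCEL_COUNT] = {
    { 0.1, 0.2, 9.8 },
    { 1.0, 2.0, 3.0 },
};

/* Return the state for a handle, or NULL if the handle is out of range. */
static const struct accel_state *accel_lookup(acchndl accel)
{
    if (accel < 0 || accel >= ACCEL_COUNT)
        return NULL;
    return &accel_table[accel];
}

double Accel_read_x(acchndl accel)
{
    const struct accel_state *s = accel_lookup(accel);
    return s ? s->x : 0.0;
}

double Accel_read_y(acchndl accel)
{
    const struct accel_state *s = accel_lookup(accel);
    return s ? s->y : 0.0;
}

double Accel_read_z(acchndl accel)
{
    const struct accel_state *s = accel_lookup(accel);
    return s ? s->z : 0.0;
}
```

Because each function is an ordinary external symbol and no table of pointers refers to it, a toolchain that supports section garbage collection (for example GCC's -ffunction-sections together with the linker's --gc-sections) has a chance to drop the readers the application never calls.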

This solution looks nice but I might waste a lot of memory because the compiler doesn't know how to chop off structures and only keep what it really needs to.
...
Is there a way to get the advantages of both solutions? Said differently, is there a way to tell the compiler to be smarter?
One problem is 'C' compiles things as a module (compilation unit) and there is often no easy way to know what will and will not be used in a structure. Consider the structure can be passed from module to module as you propose. When compiling one module, there is no context/information to know that some other module will use the structure or not.
Okay, so that leads to some possible solutions. gcc has the option -fwhole-program (see also -fipa-struct-reorg), as well as the gold linker and LTO. Here you must structure your makefile so that the compiler (code generator) has all of the information available to know whether it may remove structure members. This is not to say that these options will do what you want with a current gcc, just that they are a requirement to get things to work.
C++
You can do almost anything in 'C' that you can in C++; it just takes more programmer effort. See: Operator overloading in 'C'. So how might a 'C' compiler implement your virtual functions? Your C++ virtuals might look like this underneath,
struct foo_ts_virtual_table {
int my_type; /* RTTI value for polymorphism. */
int (*do_foo)(int);
int (*do_bar)(int);
} foo_vtable = {.my_type = ENUM_FOO_TS, .do_foo = do_foo};
struct foo_ts {
void * vtable; /* a foo_ts_virtual_table pointer */
int storage;
} foo = {.vtable = &foo_vtable};
This is a win memory-wise if you have multiple foo_ts structures. A downside is that a call through a function pointer is difficult to inline. Often a simple function's body may be smaller than the call overhead on an ARM CPU, so the indirection results in more code memory and slower execution.
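To make the memory trade-off concrete, here is a small self-contained sketch of the shared-table layout. The names `foo_ops`, `foo_obj`, `foo_ops_table` and `foo_call` are invented for illustration and are not part of the question's code. Every instance carries one pointer to a single constant table instead of its own copy of each function pointer.

```c
/* One table of operations, shared by all instances. */
struct foo_ops {
    int (*do_foo)(int);
    int (*do_bar)(int);
};

/* Each instance stores a single pointer plus its own data. */
struct foo_obj {
    const struct foo_ops *ops;
    int storage;
};

static int do_foo(int a)
{
    return 42 * a;
}

/* do_bar is deliberately left unset (NULL) in this sketch. */
const struct foo_ops foo_ops_table = { .do_foo = do_foo };

struct foo_obj foo_a = { .ops = &foo_ops_table };
struct foo_obj foo_b = { .ops = &foo_ops_table };

/* Dispatch through the shared table. */
int foo_call(const struct foo_obj *obj, int a)
{
    return obj->ops->do_foo(a);
}
```

Adding a third operation to `foo_ops` grows the table once, while `sizeof(struct foo_obj)` stays the same. The price is the extra indirection on every call.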
C++ compilers will be geared to eliminate these extraneous functions, just because this is a common issue in C++. A 'C' compiler's analysis is confounded by your custom code/notation and the language definition of structures and compilation units.
Other possibilities are to wrap the function calls in macros and emit data to a non-linked section. You can examine the non-linked section to see whether or not to include a function (and pointer) in the final link. What is a win or not depends on lots of different design objectives. It is certainly going to complicate the build process and confound other developers.
As well, the ARM Linux MULTI_CPU may be of interest. This is only to eliminate function pointers if only one 'type' is needed at run time.
If you insist on function pointers, there will be occasions where this actually generates more code. C++ compilers will do bookkeeping to see when a virtual might be inlined, as well as possibly not emitting unused member functions in a virtual table (and the implementation of the function). These issues may be more pronounced than the extra structure overhead.

Related

How to tell gcc to not align function parameters on the stack?

I am trying to decompile an executable for the 68000 processor into C code, replacing the original subroutines with C functions one by one.
The problem I faced is that I don't know how to make gcc use the calling convention that matches the one used in the original program. I need the parameters on the stack to be packed, not aligned.
Let's say we have the following function
int fun(char arg1, short arg2, int arg3) {
return arg1 + arg2 + arg3;
}
If we compile it with
gcc -m68000 -Os -fomit-frame-pointer -S source.c
we get the following output
fun:
move.b 7(%sp),%d0
ext.w %d0
move.w 10(%sp),%a0
lea (%a0,%d0.w),%a0
move.l %a0,%d0
add.l 12(%sp),%d0
rts
As we can see, the compiler assumed that the parameters are at 7(%sp), 10(%sp) and 12(%sp), but to work with the original program they need to be at 4(%sp), 5(%sp) and 7(%sp).
One possible solution is to write the function in the following way (the processor is big-endian):
int fun(int bytes4to7, int bytes8to11) {
char arg1 = bytes4to7>>24;
short arg2 = (bytes4to7>>8)&0xffff;
int arg3 = ((bytes4to7&0xff)<<24) | (bytes8to11>>8);
return arg1 + arg2 + arg3;
}
However, the code looks messy, and I was wondering: is there a way to both keep the code clean and achieve the desired result?
UPD: I made a mistake. The offsets I'm looking for are actually 5(%sp), 6(%sp) and 8(%sp) (the char-s should be aligned with the short-s, but the short-s and the int-s are still packed).
Hopefully, this doesn't change the essence of the question.
UPD 2: It turns out that the 68000 C Compiler by Sierra Systems gives the described offsets (as in UPD, with 2-byte alignment).
However, the question is about tweaking calling conventions in gcc (or perhaps another modern compiler).
Here's a way with a packed struct. I compiled it on an x86 with -m32 and got the desired offsets in the disassembly, so I think it should still work for an mc68000:
typedef struct {
char arg1;
short arg2;
int arg3;
} __attribute__((__packed__)) fun_t;
int
fun(fun_t fun)
{
return fun.arg1 + fun.arg2 + fun.arg3;
}
But, I think there's probably a still cleaner way. It would require knowing more about the other code that generates such a calling sequence. Do you have the source code for it?
Does the other code have to remain in asm? With the source, you could adjust the offsets in the asm code to be compatible with modern C ABI calling conventions.
I've been programming in C since 1981 and spent years doing mc68000 C and assembler code (for apps, kernel, device drivers), so I'm somewhat familiar with the problem space.
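The packed-struct layout can be checked on any host with `offsetof`. The variant below is only a sketch: it adds a hypothetical leading `pad` byte so that, if the struct starts at 4(%sp), the members land at the offsets from the question's update (5, 6 and 8). It assumes a 2-byte short and a 4-byte int, and whether m68k gcc really places a by-value struct at 4(%sp) is an assumption to confirm in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical argument block; 'pad' fills the high byte of the first
 * 2-byte stack slot so that arg1 sits at the odd address. */
typedef struct {
    char  pad;
    char  arg1;
    short arg2;
    int   arg3;
} __attribute__((__packed__)) fun_args_t;

/* Offsets relative to the start of the block (4(%sp) in the question). */
_Static_assert(offsetof(fun_args_t, arg1) == 1, "arg1 expected at 5(%sp)");
_Static_assert(offsetof(fun_args_t, arg2) == 2, "arg2 expected at 6(%sp)");
_Static_assert(offsetof(fun_args_t, arg3) == 4, "arg3 expected at 8(%sp)");

int fun(fun_args_t args)
{
    return args.arg1 + args.arg2 + args.arg3;
}
```

Compiling this with -S for the real target and comparing the displacements in the output is the quickest way to see whether the assumption about by-value struct placement holds.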
It's not a gcc 'fault'; the 68k architecture requires the stack to always be aligned to 2 bytes.
So there is simply no way to break 2-byte alignment on the hardware stack.
but to work with the original program they need to have addresses
4(%sp), 5(%sp) and 7(%sp):
Accessing word or long values at an odd memory address will immediately trigger an address error exception on the 68000.
To get integral parameters passed using 2-byte alignment instead of 4-byte alignment, you can change the default int size to 16 bits with -mshort. You then need to replace every int in your code with long (if you want them to be 32 bits wide). The crude way to do that is to also pass -Dint=long to your compiler. Obviously, you will break ABI compatibility with object files compiled with -mno-short (which appears to be the default for gcc).

How to merge statics for all instances of header-defined inline function?

Given a header that defines a static inline function containing static variables, how can identical static local variables be merged across all TUs that make up the final loadable module? In a less abstract way:
/*
* inc.h
*/
#include <stdlib.h>
/*
* This function must be provided via header. No extra .c source
* is allowed for its definition.
*/
static inline void* getPtr() {
static void* p;
if (!p) {
p = malloc(16);
}
return p;
}
/*
* 1.c
*/
#include "inc.h"
void* foo1() {
return getPtr();
}
void* bar1() {
return getPtr();
}
/*
* 2.c
*/
#include "inc.h"
void* foo2() {
return getPtr();
}
void* bar2() {
return getPtr();
}
Platform is Linux, and this file set is built via:
$ clang -O2 -fPIC -shared 1.c 2.c
It is quite expected that both TUs receive their own copies of getPtr.p, though inside each TU getPtr.p is shared across all getPtr() instantiations. This can be confirmed by inspecting the final loadable binary:
$ readelf -s --wide a.out | grep getPtr
32: 0000000000201030 8 OBJECT LOCAL DEFAULT 21 getPtr.p
34: 0000000000201038 8 OBJECT LOCAL DEFAULT 21 getPtr.p
At the same time I'm looking for a way to share getPtr.p across TU boundaries. This vaguely resembles what happens with C++ template instantiations. GRP_COMDAT would likely help me, but I was not able to find any info about how to mark my static variable to be put into COMDAT.
Is there any attribute or other source-level (not a compiler option) way to achieve merging such objects?
If I understand correctly what you want, you can get this effect by simply declaring a global variable.
/*
* inc.h
*/
void* my_p;
static inline void* getPtr() {
if (!my_p) {
my_p = malloc(16);
}
return my_p;
}
This will use the same variable my_p for all instances of getPtr throughout the program (since it's global). And it is not necessary to have an explicit definition of my_p in any module. It will be initialized to NULL, which is just what you want. So nothing besides inc.h needs to change, and no additional .c file is needed.
Of course, you'll probably want to give my_p a name that is less likely to conflict with any identifier in the user's program. Maybe Sergios_include_file_p_for_getPtr or something of the sort.
This is actually an extension to standard C (mentioned in Annex J.5.11 in N2176), but it's provided by gcc and clang on most modern platforms. It's documented under the -fcommon compiler option, which was enabled by default before GCC 10 and Clang 11; newer releases default to -fno-common, in which case the header above leads to multiple-definition link errors unless -fcommon is passed. It's typically implemented by putting the variable in a common section, and the linker then merges all instances together, just as you suggest. But the code above shows how to access the feature without needing to use attributes or other obscure incantations.
If you want to be extra paranoid, you can declare my_p with __attribute__((common)) which will cause the variable to be treated in this way even if -fno-common is in effect. (Of course, that may cause trouble if -fno-common was being used for a reason...)
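As a sketch of that last variant, the header could request common storage explicitly. This assumes GCC or Clang on a target whose object format supports common symbols (ELF on Linux does); the names are the ones used in the answer above.

```c
/*
 * inc.h (sketch)
 */
#include <stdlib.h>

/* Explicitly request common storage so that every translation unit
 * that includes this header refers to one merged my_p, even when
 * -fno-common is in effect. A common variable has no initializer,
 * so it starts out as NULL. */
__attribute__((common)) void *my_p;

static inline void *getPtr(void)
{
    if (!my_p) {
        my_p = malloc(16);
    }
    return my_p;
}
```

Building two source files that include this header and inspecting the result with readelf, as in the question, should show a single my_p object rather than one per translation unit.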

Why is no warning given for the wrong use of __attribute__((pure)) in GCC?

I am trying to understand pure functions, and have been reading through the Wikipedia article on that topic. I wrote the minimal sample program as follows:
#include <stdio.h>
static int a = 1;
static __attribute__((pure)) int pure_function(int x, int y)
{
return x + y;
}
static __attribute__((pure)) int impure_function(int x, int y)
{
a++;
return x + y;
}
int main(void)
{
printf("pure_function(0, 0) = %d\n", pure_function(0, 0));
printf("impure_function(0, 0) = %d\n", impure_function(0, 0));
return 0;
}
I compiled this program with gcc -O2 -Wall -Wextra, expecting that an error, or at least a warning, should have been issued for decorating impure_function() with __attribute__((pure)). However, I received no warnings or errors, and the program also ran without issues.
Isn't marking impure_function() with __attribute__((pure)) incorrect? If so, why does it compile without any errors or warnings, even with the -Wextra and -Wall flags?
Thanks in advance!
Doing this is incorrect and you are responsible for using the attribute correctly.
Look at this example:
static __attribute__((pure)) int impure_function(int x, int y)
{
extern int a;
a++;
return x + y;
}
void caller()
{
impure_function(1, 1);
}
Code generated by GCC (with -O1) for the function caller is:
caller():
ret
As you can see, the impure_function call was completely removed because compiler treats it as "pure".
GCC can mark the function as "pure" internally automatically if it sees its definition:
static __attribute__((noinline)) int pure_function(int x, int y)
{
return x + y;
}
void caller()
{
pure_function(1, 1);
}
Generated code:
caller():
ret
So there is no point in using this attribute on functions that are visible to the compiler. It is supposed to be used when definition is not available, for example when function is defined in another DLL. That means that when it is used in a proper place the compiler won't be able to perform a sanity check anyway. Implementing a warning thus is not very useful (although not meaningless).
I don't think there is anything stopping GCC developers from implementing such a warning, except the time that must be spent.
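As an illustration of the intended use, the sketch below attaches the attribute to a prototype, the way a header would for a function whose body lives in another translation unit or library. The function `sum_bytes` is hypothetical; it only reads its arguments and the memory they point to, which is what `pure` permits.

```c
#include <stddef.h>

/* Declaration as it might appear in a header (hypothetical API). */
__attribute__((pure))
unsigned sum_bytes(const unsigned char *buf, size_t len);

/* Definition, normally in a separate .c file the caller cannot see. */
unsigned sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned total = 0;

    for (size_t i = 0; i < len; i++)
        total += buf[i];   /* reads memory, modifies no global state */
    return total;
}
```

With only the prototype visible, the compiler has to trust the attribute: it may merge repeated calls with the same arguments or drop a call whose result is unused, which is exactly why a wrong annotation goes unnoticed at compile time.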
A pure function is a hint for the optimizing compiler. Probably, gcc doesn't care about pure functions when you pass just -O0 to it (the default optimization level). So if f is pure (and defined outside of your translation unit, e.g. in some outside library), the GCC compiler might optimize y = f(x) + f(x); into something like
{
int tmp = f(x); /// tmp is a fresh variable, not appearing elsewhere
y = tmp + tmp;
}
but if f is not pure (which is the usual case: think of f calling printf or malloc), such an optimization is forbidden.
Standard math functions like sin or sqrt are pure (except for IEEE rounding mode craziness, see http://floating-point-gui.de/ and Fluctuat for more), and they are complex enough to compute to make such optimizations worthwhile.
You might compile your code with gcc -O2 -Wall -fdump-tree-all to guess what is happening inside the compiler. You could add the -fverbose-asm -S flags to get a generated *.s assembler file.
You could also read the Bismon draft report (notably its section §1.4). It might give some intuitions related to your question.
In your particular case, I am guessing that gcc is inlining your calls; and then purity matters less.
If you have time to spend, you might consider writing your own GCC plugin to make such a warning. You'll spend months in writing it! These old slides might still be useful to you, even if the details are obsolete.
At the theoretical level, be aware of Rice's theorem. A consequence of it is that perfect optimization of pure functions is probably impossible.
Be aware of the GCC Resource Center, located in Bombay.

C string literal as parameter equals -1 in avr-gcc?

I am developing software for an AVR microcontroller. To say it up front, I currently have only LEDs and pushbuttons for debugging. The problem is that if I pass a string literal into the following function:
void test_char(const char *str) {
if (str[0] == -1)
LED_PORT ^= 1 << 7; /* Test */
}
Somewhere in main()
test_char("AAAAA");
And now the LED changes state. On my x86_64 machine I wrote the same function to compare (without the LED, of course), and there str[0] equals 'A'. Why is this happening?
Update:
Not sure whether this is related, but I have a struct called button, like this:
typedef struct {
int8_t seq[BTN_SEQ_COUNT]; /* The sequence of button */
int8_t seq_count; /* The number of buttons registered */
int8_t detected; /* The detected button */
uint8_t released; /* Whether the button is released
after a hold */
} button;
button btn = {
.seq = {-1, -1, -1},
.detected = -1,
.seq_count = 0,
.released = 0
};
But it turned out that btn.seq_count starts out as -1 even though I initialized it to 0.
Update2
For the latter problem, I solved it by initializing the values in a function. However, that does not explain why seq_count was set to -1 in the previous case, nor does it explain why the character in the string literal equals -1.
Update3
Back to the original problem, I added a complete mini example here, and the same thing occurs:
void LED_on() {
PORTA = 0x00;
}
void LED_off() {
PORTA = 0xFF;
}
void port_init() {
PORTA = 0xFF;
DDRA |= 0xFF;
}
void test_char(const char* str) {
if (str[0] == -1) {
LED_on();
}
}
void main() {
port_init();
test_char("AAAAA");
while(1) {
}
}
Update 4
I am trying to follow Nominal Animal's advice, but without much success. Here is the code I have changed:
void test_char(const char* str) {
switch(pgm_read_byte(str++)) {
case '\0': return;
case 'A': LED_on(); break;
case 'B': LED_off(); break;
}
}
void main() {
const char* test = "ABABA";
port_init();
test_char(test);
while(1) {
}
}
I am using gcc 4.6.4,
avr-gcc -v
Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/home/carl/Softwares/AVR/libexec/gcc/avr/4.6.4/lto-wrapper
Target: avr
Configured with: ../configure --prefix=/home/carl/Softwares/AVR --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.6.4 (GCC)
Rewritten from scratch, to hopefully clear up some of the confusion.
First, some important background:
AVR microcontrollers have separate address spaces for RAM and ROM/Flash ("program memory").
GCC generates code that assumes all data is always in RAM. (Older versions used to have special types, such as prog_char, that referred to data in the ROM address space, but newer versions of GCC do not and cannot support such data types.)
When linking against avr-libc, the linker adds code (__do_copy_data) to copy all initialized data from program memory to RAM. If you have both avr-gcc and avr-libc packages installed, and you use something like avr-gcc -Wall -O2 -fomit-frame-pointer -mmcu=AVRTYPE source.c -o binary.elf to compile your source file into a program binary, then use avr-objcopy to convert the elf file into the format your firmware utilities support, you are linking against avr-libc.
If you use avr-gcc to only produce an object file source.o, and some other utilities to link and upload your program to your microcontroller, this copying from program memory to RAM may not happen. It depends on what linker and libraries you use.
As most AVRs have only a few dozen to a few hundred bytes of RAM available, it is very, very easy to run out of RAM. I'm not certain if avr-gcc and avr-libc reliably detect when you have more initialized data than you have RAM available. If you specify any arrays containing strings, it is very likely you've already overrun your RAM, causing all sorts of interesting bugs to appear.
The avr/pgmspace.h header file is part of avr-libc, and defines a macro, PROGMEM, that can be used to specify data that will only be referred to by functions that take program memory addresses (pointers), such as pgm_read_byte() or strcmp_P() defined in the same header file. The linker will not copy such variables to RAM -- but neither will the compiler tell you if you're using them wrong.
If you use both avr-gcc and avr-libc, I recommend using the following approach for all read-only data:
#include <avr/pgmspace.h>
/*
* Define LED_init(), LED_on(), and LED_off() functions.
*/
void blinky(const char *str)
{
while (1) {
switch (pgm_read_byte(str++)) {
case '\0': return;
case 'A': LED_on(); break;
case 'B': LED_off(); break;
}
/* Add a sleep or delay here,
* or you won't be able to see the LED flicker. */
}
}
static const char example1[] PROGMEM = "AB";
const char example2[] PROGMEM = "AAAA";
int main(void)
{
static const char example3[] PROGMEM = "ABABB";
LED_init();
while (1) {
blinky(example1);
blinky(example2);
blinky(example3);
}
}
Because of changes (new limitations) in GCC internals, the PROGMEM attribute can only be used with a variable; if it refers to a type, it does nothing. Therefore, you need to specify strings as character arrays, using one of the forms above. (example1 is visible within this compilation unit only, example2 can be referred to from other compilation units too, and example3 is visible only in the function it is defined in. Here, visible refers to where you can refer to the variable; it has nothing to do with the contents.)
The PROGMEM attribute does not actually change the code GCC generates. All it does is put the contents to .progmem.data section, iff without it they'd be in .rodata. All of the magic is really in the linking, and in linked library code.
If you do not use avr-libc, then you need to be very specific with your const attributes, as they determine which section the contents will end up in. Mutable (non-const) data should end up in the .data section, while immutable (const) data ends up in .rodata section(s). Remember to read the specifiers from right to left, starting at the variable itself, separated by '*': the leftmost refers to the content, whereas the rightmost refers to the variable. In other words,
const char *s = p;
defines s so that the value of the variable can be changed, but the content it points to is immutable (unchangeable/const); whereas
char *const s = p;
defines s so that you cannot modify the variable itself, but you can the content -- the content s points to is mutable, modifiable. Furthermore,
const char *s = "literal";
defines s to point to a literal string (and you can modify s, ie. make it point to some other literal string for example), but you cannot modify the contents; and
char s[] = "string";
defines s to be a character array (of length 7; string length 6 + 1 for the end-of-string char), that happens to be initialized to { 's', 't', 'r', 'i', 'n', 'g', '\0' }.
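The difference between the pointer form and the array form can be checked at compile time. This is a small sketch using C11 `_Static_assert`; the names `s_ptr`, `s_arr` and `capitalize_first` are arbitrary. Counting the terminator, the array has seven elements.

```c
const char *s_ptr = "literal";   /* pointer to an immutable string literal */
char s_arr[] = "string";         /* mutable array holding a copy of the text */

/* Six characters plus the terminating '\0'. */
_Static_assert(sizeof s_arr == 7, "array includes the terminator");

/* Writing to the array is fine; writing through s_ptr would be
 * undefined behaviour because it points at a literal. */
void capitalize_first(void)
{
    s_arr[0] = 'S';
}
```

Note that `sizeof s_ptr` would give the size of a pointer, not of the text it points to.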
All linker tools that work on object files use the sections to determine what to do with the contents. (Indeed, avr-libc copies the contents of .rodata sections to RAM, and only leaves .progmem.data in program memory.)
Carl Dong, there are several cases where you may observe weird behaviour, even reproducible weird behaviour. I'm no longer certain which one is the root cause of your problem, so I'll just list the ones I think are likely:
If linking against avr-libc, running out of RAM
AVRs have very little RAM, and copying even string literals to RAM easily eats it all up. If this happens, any kind of weird behaviour is possible.
Failing to link against avr-libc
If you think you use avr-libc, but are not certain, then use avr-objdump -d binary.elf | grep -e '^[0-9a-f]* <_' to see if the ELF binary contains any library code. You should expect to see at least <__do_clear_bss>:, <_exit>:, and <__stop_program>: in that list, I believe.
Linking against some other C library, but expecting avr-libc behaviour
Other libraries you link against may have different rules. In particular, if they're designed to work with some other C compiler -- especially one that supports multiple address spaces, and therefore can deduce when to use ld and when lpm based on types --, it might be impossible to use avr-gcc with that library, even if all the tools talk to each other nicely.
Using a custom linker script and a freestanding environment (no C library at all)
Personally, I can live with immutable data (.rodata sections) being in program memory, with myself having to explicitly copy any immutable data to RAM whenever needed. This way I can use a simple microcontroller-specific linker script and GCC in freestanding mode (no C library at all used), and get complete control over the microcontroller. On the other hand, you lose all the nice predefined macros and functions avr-libc and other C libraries provide.
In this case, you need to understand the AVR architecture to have any hope of getting sensible results. You'll need to set up the interrupt vectors and all kinds of other stuff to get even a minimal do-nothing loop to actually run; personally, I read all the assembly code GCC produces (from my own C source) simply to see if it makes sense, and to try to make sure it all gets processed correctly.
Questions?
I faced a similar problem (inline strings were equal to 0xff,0xff,...) and solved it by just changing a line in my Makefile
from :
.out.hex:
$(OBJCOPY) -j .text \
-j .data \
-O $(HEXFORMAT) $< $@
to :
.out.hex:
$(OBJCOPY) -j .text \
-j .data \
-j .rodata \
-O $(HEXFORMAT) $< $@
or seems better :
.out.hex:
$(OBJCOPY) -R .fuse \
-R .lock \
-R .eeprom \
-O $(HEXFORMAT) $< $@
You can see full problem and answer here : https://www.avrfreaks.net/comment/2943846#comment-2943846
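A possible explanation for why data missing from the hex file shows up as -1 in the question: erased flash reads back as 0xFF, and when char is signed (an assumption here; it is avr-gcc's default unless -funsigned-char is given) that bit pattern compares equal to -1. The helper `reads_as_minus_one` below is hypothetical and runs on any host.

```c
#include <stdint.h>

/* Returns 1 if a raw byte, viewed through a signed char, equals -1. */
int reads_as_minus_one(uint8_t raw)
{
    /* 0xFF converts to -1 on two's-complement targets. */
    signed char c = (signed char)raw;

    return c == -1;
}
```

So a test such as `str[0] == -1` succeeding is consistent with the string's bytes never having been programmed into the device.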

Function pointer location not getting passed

I've got some C code I'm targeting for an AVR. The code is being compiled with avr-gcc, basically the gnu compiler with the right backend.
What I'm trying to do is create a callback mechanism in one of my event/interrupt driven libraries, but I seem to be having some trouble keeping the value of the function pointer.
To start, I have a static library. It has a header file (twi_master_driver.h) that looks like this:
#ifndef TWI_MASTER_DRIVER_H_
#define TWI_MASTER_DRIVER_H_
#define TWI_INPUT_QUEUE_SIZE 256
// define callback function pointer signature
typedef void (*twi_slave_callback_t)(uint8_t*, uint16_t);
typedef struct {
uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
volatile uint16_t length; // currently used bytes in the buffer
twi_slave_callback_t slave_callback;
} twi_global_slave_t;
typedef struct {
uint8_t slave_address;
volatile twi_global_slave_t slave;
} twi_global_t;
void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback);
#endif
Now the C file (twi_driver.c):
#include <stdint.h>
#include "twi_master_driver.h"
void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback)
{
twi->slave.length = 0;
twi->slave.slave_callback = slave_callback;
twi->slave_address = slave_address;
// temporary workaround <- why does this work??
twi->slave.slave_callback = twi->slave.slave_callback;
}
void twi_slave_interrupt_handler(twi_global_t *twi)
{
(twi->slave.slave_callback)(twi->slave.buffer, twi->slave.length);
// some other stuff (nothing touches twi->slave.slave_callback)
}
Then I build those two files into a static library (.a) and construct my main program (main.c)
#include
#include
#include
#include
#include "twi_master_driver.h"
// ...define microcontroller safe way for mystdout ...
twi_global_t bus_a;
ISR(TWIC_TWIS_vect, ISR_NOBLOCK)
{
twi_slave_interrupt_handler(&bus_a);
}
void my_callback(uint8_t *buf, uint16_t len)
{
uint8_t i;
fprintf(&mystdout, "C: ");
for(i = 0; i < len; i++)
{
fprintf(&mystdout, "%d,", buf[i]);
}
fprintf(&mystdout, "\n");
}
int main(int argc, char **argv)
{
twi_init(2, &bus_a, &my_callback);
// ...PMIC setup...
// enable interrupts.
sei();
// (code that causes interrupt to fire)
// spin while the rest of the application runs...
while(1){
_delay_ms(1000);
}
return 0;
}
I carefully trigger the events that cause the interrupt to fire and call the appropriate handler. Using some fprintfs I'm able to tell that the location assigned to twi->slave.slave_callback in the twi_init function is different than the one in the twi_slave_interrupt_handler function.
Though the numbers are meaningless, in twi_init the value is 0x13b, and in twi_slave_interrupt_handler when printed the value is 0x100.
By adding the commented workaround line in twi_driver.c:
twi->slave.slave_callback = twi->slave.slave_callback;
The problem goes away, but this is clearly a magic and undesirable solution. What am I doing wrong?
As far as I can tell, I've marked appropriate variables volatile, and I've tried marking other portions volatile and removing the volatile markings. I came up with the workaround when I noticed removing fprintf statements after the assignment in twi_init caused the value to be read differently later on.
The problem seems to be with how I'm passing around the function pointer -- and notably the portion of the program that is accessing the value of the pointer (the function itself?) is technically in a different thread.
Any ideas?
Edits:
resolved typos in code.
links to actual files: http://straymark.com/code/ [test.c|twi_driver.c|twi_driver.h]
fwiw: compiler options: -Wall -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -mmcu=atxmega128a1 -DF_CPU=2000000UL
I've tried the same code included directly (rather than via a library) and I've got the same issue.
Edits (round 2):
I removed all the optimizations, without my "workaround" the code works as expected. Adding back -Os causes an error. Why is -Os corrupting my code?
Just a hunch, but what happens if you switch these two lines around:
twi->slave.slave_callback = slave_callback;
twi->slave.length = 0;
Does removing the -fpack-struct gcc flag fix the problem? I wonder if you haven't stumbled upon a bug where writing that length field is overwriting part of the callback value.
It looks to me like with the -Os optimisations on (you could try combinations of the individual optimisations enabled by -Os to see exactly which one is causing it), the compiler isn't emitting the right code to manipulate the uint16_t length field when it's not aligned on a 2-byte boundary. This happens when you include a twi_global_slave_t inside a twi_global_t that is packed, because the initial uint8_t member of twi_global_t causes the twi_global_slave_t struct to be placed at an odd address.
If you make that initial field of twi_global_t a uint16_t it will probably fix it (or you could turn off struct packing). Try the latest gcc build and see if it still happens - if it does, you should be able to create a minimal test case that shows the problem, so you can submit a bug report to the gcc project.
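The odd placement described above can be reproduced on a host compiler. This sketch copies the two structs from the question and uses `__attribute__((packed))` as a stand-in for the -fpack-struct flag; the helper `twi_length_offset` is hypothetical and only reports where the length field ends up.

```c
#include <stddef.h>
#include <stdint.h>

#define TWI_INPUT_QUEUE_SIZE 256

typedef void (*twi_slave_callback_t)(uint8_t *, uint16_t);

/* packed attribute used here in place of -fpack-struct */
typedef struct __attribute__((packed)) {
    uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
    volatile uint16_t length;
    twi_slave_callback_t slave_callback;
} twi_global_slave_t;

typedef struct __attribute__((packed)) {
    uint8_t slave_address;
    volatile twi_global_slave_t slave;
} twi_global_t;

/* Byte offset of slave.length from the start of twi_global_t. */
size_t twi_length_offset(void)
{
    return offsetof(twi_global_t, slave)
         + offsetof(twi_global_slave_t, length);
}
```

With packing, the one-byte slave_address pushes the 256-byte buffer to offset 1, so length lands at offset 257 and the callback pointer directly after it at 259. Whether that misalignment is really what trips the optimizer on AVR remains the answer's hypothesis, not something this sketch proves.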
This really sounds like a stack/memory corruption issue. If you run avr-size on your elf file, what do you get? Make sure (data + bss) < the RAM you have on the part. These types of issues are very difficult to track down. The fact that removing/moving unrelated code changes the behavior is a big red flag.
Replace "&my_callback" with "my_callback" in function main().
Because different threads access the callback address, try protecting it with a mutex or read-write lock.
If the callback function pointer isn't accessed by a signal handler, then the "volatile" qualifier is unnecessary.
