Optimization of #define vs static const (in avr-gcc)

Although I welcome answers on this on a general scope, I'm asking primarily for avr-gcc to make this not too broad.
I have looked at some questions, particularly this and this one. They mostly look at semantic differences, e.g. that a static const can't be used in place of a constant expression. They touch on memory allocation in general, but they don't speak about optimization.
Let's look at an example:
typedef struct {
    GPIO_TypeDef *port;
    uint8_t       pin;
} gpio_pin_t;

static inline void gpio_pin_set(gpio_pin_t gpio, bool set) {
    if (set) GPIO_SetBits  (gpio.port, 1 << gpio.pin);
    else     GPIO_ResetBits(gpio.port, 1 << gpio.pin);
}

// elsewhere, including the above definitions
static const gpio_pin_t gpio_pin = {GPIOE, 8};

void some_function(bool enable) {
    gpio_pin_set(gpio_pin, enable);
}
As you can see, I'm using structs, so the third established way (enum) is not an option here.
Can I expect gcc to optimize the gpio.port and gpio.pin accesses in the inline function? If not, why so, and does gcc apply other optimizations when it sees consts?
In its full generality, what do I lose optimization-wise by using static const variables instead of defines, especially looking beyond simple integer constants?

It depends on the compiler implementation.
If you never take its address and the symbol is not exported (in C it is by default, hence you must use static), then the optimizer should kick in and optimize it. For simple types (int, float) you can reasonably expect it to work across the board, but for structs, better check yourself what the compiler does. For a simple struct like yours, my GCC optimizes it, eliminating the struct and passing its values directly, loading constants into CPU registers. For larger structures it may decide it's not worth it.
You may generate assembly listing of the code to check:
avr-gcc -O<opt> -S somefile.c
or
gcc -O<opt> -S somefile.c
Do not forget about optimization level!
Using #define sucks, because the preprocessor is really dumb - it just replaces text literally before the code is compiled. Using const buys you better type safety. Consider this example:
#define BYTE_VALUE 257
static const uint8_t sc_value = 257; // at least this will show a warning

int my_byte = BYTE_VALUE; // no warning, even though the name promises a byte value - a silent bug

Regarding abstraction:
Why do you drown such a simple thing in a flood of abstraction layers? You can safely assume that every C programmer knows the meaning of
PORT |= (1<<pin);
PORT &= ~(1<<pin);
This is as readable as the code can get, as far as a C programmer is concerned. By hiding this code in abstraction layers, you make your code less readable to C programmers, although it may very well get more readable to non-programmers. But you want the former to read your code, not the latter.
The above is also the fastest possible, in terms of efficiency. It is quite likely that the compiler will translate such code directly to a single bit set/bit clear assembler instruction.
So my advice is to throw all the abstraction out. You don't want to hide the hardware away from the embedded C programmer. What you do need to hide away from them is the specific hardware implementations, so that they need not re-invent the wheel for every new project.
One such abstraction is a hardware-independent API layer. For example void led_set (bool lit);, a function that lights an LED on your board. You'd place it in the abstract file led.h, which has no corresponding generic .c file, because the .c file is implemented per project: in my_led.c you'll have the actual implementation, which directly accesses the GPIO registers, sets up the data direction and pull resistor registers, handles signal polarity and so on.
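A minimal sketch of what such a split might look like; the register names assume an AVR target with the LED wired to PB5 and are only illustrative:

/* led.h - hardware-independent API, no corresponding generic .c file */
#ifndef LED_H
#define LED_H
#include <stdbool.h>

void led_init(void);
void led_set(bool lit);

#endif

/* my_led.c - board-specific implementation for this project only */
#include <avr/io.h>
#include "led.h"

#define LED_PORT PORTB   /* assumption: LED on PB5, active high */
#define LED_DDR  DDRB
#define LED_PIN  5

void led_init(void) {
    LED_DDR |= (1 << LED_PIN);    /* data direction: output */
}

void led_set(bool lit) {
    if (lit) LED_PORT |=  (1 << LED_PIN);
    else     LED_PORT &= ~(1 << LED_PIN);
}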
Regarding your specific question:
There are no guarantees that GCC will inline that function as you expect: the inline keyword is pretty much obsolete, as compilers nowadays are far smarter than programmers when it comes to deciding when to inline a function. I would say it is very likely, given that you compile with maximum optimization enabled. The only way to find out is to try.
But then it doesn't really matter in the slightest whether or not the compiler inlines this function. You will likely never have such extreme real-time requirements that the function-call overhead will affect your program. We're talking tens of nanoseconds here: not even the digital electronic integrated circuits on your board will respond fast enough for those extra CPU ticks to make a difference.
I work with MCU-based real-time embedded systems on a daily basis, and even in those systems you rarely ever face a situation where extreme code optimization like this would matter. If you did, you would be using a DSP, not a regular MCU, and most certainly not an AVR.
More importantly, your static const reduces the scope of the constant to the local file, so that nothing else needs to concern itself with it, nor will you clutter the global namespace. This is good programming practice, and good programming practice wins over manual code optimization 9 times out of 10.

Related

Functions splitting effect on running time

I am writing DSP code in C (Windows environment). The code should be modified, by another engineer, to run on a Cortex-M4. This engineer claims that, to reduce running time, many of the functions that I have implemented should be merged into one function. I prefer to avoid that, to keep clarity and testability.
Does his claim make sense? If it does, where can I read about it? Otherwise, can I show that he is wrong without comparing running times?
Does his claim make sense?
Depends on context. Modern compilers are perfectly able to inline function calls, but that usually requires the functions to be placed in the same translation unit (essentially the same .c file).
If your functions are in the same .c file, then their claim is wrong; if the functions are scattered across multiple files, then their claim is likely correct.
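For a rough illustration (all names hypothetical), keeping small helpers static in the same .c file gives the optimizer everything it needs to inline them:

/* dsp.c - helpers live in the same translation unit as their caller */
static float scale_sample(float x, float gain) {
    return x * gain;
}

static float clamp_sample(float x) {
    if (x >  1.0f) return  1.0f;
    if (x < -1.0f) return -1.0f;
    return x;
}

void process_block(float *buf, int n, float gain) {
    for (int i = 0; i < n; i++) {
        /* at -O2 both helper calls are typically inlined into the loop body */
        buf[i] = clamp_sample(scale_sample(buf[i], gain));
    }
}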
If it does, where can I read about it?
Function inlining has been around for some 30 years. C even added an inline keyword for it in 1999 (C++ had one earlier still), though during the 2000s compilers became smarter than programmers at determining when and what to inline. Nowadays, when using modern compilers, inline is mostly considered obsolete.
Otherwise, can I show that he is wrong without comparing running times?
By disassembling the optimized code and seeing whether there are any function calls left. Still, function calls are relatively cheap on Cortex-M (unless there are a ton of different parameters), so removing them by hand would be a very tiny optimization.
As always, there's a trade-off between code size and execution speed.
If you wish to remove the overhead of calling a function but keep your code modular, then consider using the inline function specifier suitable for your compiler, e.g.:
static inline void com_ClearMessageBuffer(uint8_t* pBuffer, uint32_t length)
{
    NRF_LOG_DEBUG("com_ClearMessageBuffer");
    memset(pBuffer, 0, length);
}
Then at compile time your inline function's code will be inserted into the code flow wherever it is called.
This will speed up execution but, when the function is called from multiple places, increase code size.

Is it possible to generate ANSI C functions with type information for a moving GC implementation?

I am wondering what methods there are to add typing information to generated C functions. I'm transpiling a higher-level programming language to C and I'd like to add a moving garbage collector. To do that, I need the function's variables to carry typing information; otherwise I could modify a primitive value that merely looks like a pointer.
An obvious approach would be to encapsulate all (primitive and non-primitive) variables in a struct that has an extra (enum) field for typing information, but this would cause memory and performance overhead, and the transpiled code is meant for embedded platforms. If I were to accept the memory overhead, the obvious option would be to use a heap handle for all objects, so I could freely move heap blocks. However, I'm wondering if there's a more efficient approach.
I've come up with a potential solution: predeclare and group variables based on whether they're primitives or not (I can do that in the transpiler), and add an offset variable at the end of each function (I need to be able to find it accurately when scanning the stack area) that tells me where the non-primitive variables begin and where they end, so I can scan only those. This means that each function will use an additional 16/32 bits (depending on the architecture) of memory, but it should still be more memory efficient than the heap-handle approach.
Example:
void my_func() {
    int i = 5;
    int z = 3;
    bool b = false;
    void* person;
    void* person_info = ...;
    ....  // logic
    volatile int offset = 0x034;
}
My aim is for something that works universally across GCC compilers, thus my concerns are:
Can the compiler reorder the variables from how they're declared in the source code?
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Can I find the offset accurately when scanning the stack?
I'd like to avoid assembly so this approach can work (by default) across multiple platforms; however, I'm open to methods even if they involve assembly (as long as they're reliable).
Typing information could be somehow encoded in the C function name; this is done by C++ and other implementations and is called name mangling.
Actually, since all your C code is generated, you could decide to adopt a different convention: generate long C identifiers which are practically unique and sort-of random program-wide, such as tiziw_7oa7eIzzcxv03TmmZ, and keep their typing information elsewhere (e.g. in some database). On Linux, such an approach is friendly to both libbacktrace and dlsym(3) + dladdr(3) (and of course nm(1), readelf(1) or gdb(1)), and is used in both the Bismon and RefPerSys projects.
Typing information is practically tied to calling conventions and ABIs. For example, the x86-64 ABI for Linux mandates different processor registers for passing floating points or pointers.
Read the Garbage Collection Handbook or at least P. Wilson's Uniprocessor Garbage Collection Techniques survey. You could decide to use tagged integers instead of boxing them, and you could decide to have a conservative GC (e.g. Boehm's GC) instead of a precise one. In my old GCC MELT project I generated C or C++ code for a generational copying GC. Similar techniques are used in both Bismon and RefPerSys.
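For illustration only, a common tagging scheme (assuming heap objects are at least 2-byte aligned so the low address bit is free, and that signed right shift is arithmetic, as it is with GCC) might look like this:

#include <stdint.h>
#include <stdbool.h>

typedef uintptr_t value_t;   /* either a tagged immediate integer or an object pointer */

/* low bit set: immediate integer; low bit clear: pointer to a heap object */
static inline value_t  int_to_value(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }
static inline intptr_t value_to_int(value_t v)  { return (intptr_t)v >> 1; }
static inline bool     value_is_int(value_t v)  { return (v & 1u) != 0; }
static inline void    *value_to_ptr(value_t v)  { return (void *)v; }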
Since you are transpiling to C, consider also alternatives, such as libgccjit or LLVM. Look into libjit and asmjit.
Study also the implementation of other transpilers (compilers to C), including Chicken/Scheme and Bigloo.
Can the GCC compiler reorder the variables from how they're declared in the source code?
Of course yes, depending upon the optimizations you are asking for. Some variables won't even exist in the binary (e.g. those staying in registers).
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Better generate a single struct variable containing all your language variables, and leave optimizations to the compiler. You will be surprised (see this draft report).
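A rough sketch of that idea (essentially a shadow-stack frame; every name below is made up for illustration): the transpiler emits one struct per function holding all GC-visible locals and links it into a chain the collector can walk:

/* hypothetical runtime support emitted by the transpiler */
struct gc_frame {
    struct gc_frame *prev;     /* caller's frame */
    unsigned         nroots;   /* number of object-pointer slots following the header */
};

extern struct gc_frame *gc_top;   /* head of the chain; the collector walks it and
                                     scans the nroots slots placed after each header */

/* generated code for my_func: primitives stay ordinary locals,
   object pointers live in the registered frame */
void my_func(void) {
    int i = 5, z = 3;
    struct {
        struct gc_frame hdr;
        void *person;
        void *person_info;
    } frame = { { gc_top, 2 }, 0, 0 };
    gc_top = &frame.hdr;              /* push this frame */

    /* ... logic using frame.person and frame.person_info ... */
    (void)i; (void)z;

    gc_top = frame.hdr.prev;          /* pop on every exit path */
}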
Can I find the offset accurately when scanning the stack?
This is the most difficult part, and it depends a lot on compiler optimizations (e.g. whether you run gcc with -O1 or -O3 on the generated C code; in some cases a recent GCC, e.g. GCC 9 or GCC 10 on x86-64 for Linux, is capable of tail-call optimizations; check by compiling with gcc -O3 -S -fverbose-asm and then looking into the produced assembler code). If you accept some small target-processor- and compiler-specific tricks, this is doable. Study the implementation of the OCaml compiler.
Send me (to basile#starynkevitch.net) an email for discussion. Please mention the URL of your question in it.
If you want to have an efficient generational copying GC with multi-threading, things become extremely tricky. The question is then how many years of development can you afford spending.
If you have exceptions in your language, also take great care. You could, with great caution, generate calls to longjmp.
See of course this answer of mine.
With transpiling techniques, the devil is in the details.
On Linux (specifically!) see also my manydl.c program. It demonstrates that, on a Linux x86-64 laptop, you can in practice generate hundreds of thousands of dlopen(3)-ed plugins. Then read How to Write Shared Libraries.
Study also the implementation of SBCL and of GNU Prolog, at least for inspiration.
PS. The dream of a totally architecture-neutral and operating-system independent transpiler is an illusion.

How do I "tell" to C compiler that the code shouldn't be optimized out?

Sometimes I need some code to be executed by the CPU exactly as I wrote it in the source. But any C compiler has its optimization algorithms, so I can expect some tricks. For example:
unsigned char flag = 0;
unsigned char ADC_result;

interrupt ADC_ISR() {
    ADC_result = ADCH;
    flag = 1;
}

void main() {
    while (!flag);        // wait for the ISR
    echo ADC_result;      // pseudocode: output the result
}
Some compilers will definitely make the while(!flag); loop infinite, as they will assume flag always equals false (!flag is therefore always true).
Sometimes I can use the volatile keyword. And sometimes it can help. But actually, in my case (AVR GCC), the volatile keyword forces the compiler to locate the variable in SRAM instead of registers (which is bad for some reasons). Moreover, many articles on the Internet suggest using the volatile keyword with great care, as the result can become unstable (depending on the compiler, its optimization settings, the platform and so on).
So I would definitely prefer to somehow mark the source code instruction and tell the compiler that this code should be compiled exactly as it is. Like this: volatile while(!flag);
Is there any standard C instruction to do this?
The only standard C way is volatile. If that doesn't happen to do exactly what you want, you'll need to use something specific for your platform.
You should indeed use volatile as answered by David Schwartz. See also this chapter of GCC documentation.
If you use a recent GCC compiler, you could disable optimizations in a single function by using appropriate function specific options pragmas (or some optimize function attribute), for instance
#pragma GCC optimize ("-O0")
before your main. I'm not sure it is a good idea.
Perhaps you want extended asm statements with the volatile keyword.
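For reference, the usual GCC idiom is an empty volatile asm statement with a "memory" clobber; it emits no instructions but forces the compiler to assume memory may have changed at that point, so applied to the file-scope flag from the question it would look like:

while (!flag) {
    __asm__ volatile ("" ::: "memory");   /* compiler barrier: flag is re-read every iteration */
}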
You have several options:
Compile without optimisations. Unlike some compilers, GCC doesn't optimise by default so unless you tell it to optimise, you should get generated code which looks very similar to your C source. Of course you can choose to optimise some C files and not others, using simple make rules.
Take the compiler out of the equation and write the relevant functions in assembly. Then you can get exactly the generated code you want.
Use volatile, which prevents the compiler from making any assumptions about a certain variable, so for any use of the variable in C the compiler is forced to generate a LOAD or a STORE even if ostensibly unnecessary.
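Applied to the code in the question, the third option is just a matter of declaring the shared variables volatile (sketch):

volatile unsigned char flag = 0;     /* written by the ISR, read by main */
volatile unsigned char ADC_result;   /* same reasoning applies here */

void main(void) {
    while (!flag)
        ;                            /* compiler must reload flag on every test */
    /* ... use ADC_result ... */
}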

STM32 HAL Library simple C coding error

I am using the STM32 HAL Library for a micro controller project. In the ADC section I found the following code:
uint32_t WaitLoopIndex = 0;
/* ...
   ... */
/* Delay for ADC stabilization time. */
/* Delay fixed to worst case: maximum CPU frequency */
while(WaitLoopIndex < ADC_STAB_DELAY_CPU_CYCLES)
{
    WaitLoopIndex++;
}
It is my understanding that this code will most likely get optimized away since WaitLoopIndex isn't used anywhere else in the function and is not declared volatile, right?
Technically yes, though from my experience with compilers for embedded targets, that loop will not get optimised out. If you think about it, a pointless loop is not really a construct you are going to see in a program unless the programmer put it there on purpose, so I doubt many compilers bother to optimise for it.
The fact that you have to make assumptions about how it might be optimised means it most certainly is a bug, and one of the worst kinds at that. More than likely ST wanted to use only C in their library, so they did this instead of the inline-assembler delay that they should have used. But since the problem they are trying to solve is heavily platform dependent, an annoying platform/compiler dependent solution is unavoidable, and all they have done here is try to hide that dependency.
Declaring the variable volatile will help, but you still really have no idea how long that loop takes to execute without making assumptions about how the compiler builds it. This is still very bad practice, though if they added an assert reminding you to check the delay manually, that might be passable.
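For completeness, the volatile variant of the HAL loop would look roughly like this; it can no longer be removed by the optimizer, but as noted above, the number of cycles per iteration still depends on the generated code:

volatile uint32_t WaitLoopIndex = 0;

/* Delay for ADC stabilization time, fixed to the worst case (maximum CPU frequency).
   volatile forces the increment and comparison to actually happen on every iteration. */
while (WaitLoopIndex < ADC_STAB_DELAY_CPU_CYCLES)
{
    WaitLoopIndex++;
}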
This depends on the compiler and the optimization level. To confirm, just enter debug mode and check the disassembly of that piece of code.

When should I use type abstraction in embedded systems

I've worked on a number of different embedded systems. They have all used typedefs (or #defines) for types such as UINT32.
This is a good technique as it drives home the size of the type to the programmer and makes you more conscious of chances for overflow etc.
But on some systems you know that the compiler and processor won't change for the life of the project.
So what should influence your decision to create and enforce project-specific types?
EDIT
I think I managed to lose the gist of my question, and maybe it's really two.
With embedded programming you may need types of specific size for interfaces and also to cope with restricted resources such as RAM. This can't be avoided, but you can choose to use the basic types from the compiler.
For everything else the types have less importance.
You need to be careful not to cause overflow and may need to watch out for register and stack usage, which may lead you to UINT16 or UCHAR.
Using types such as UCHAR can add compiler 'fluff' however. Because registers are typically larger, some compilers may add code to force the result into the type.
i++;
can become
ADD REG, 1
AND REG, 0xFF
which is unnecessary.
So I think my question should have been :-
Given the constraints of embedded software, what is the best policy to set for a project which will have many people working on it, not all of whom will be at the same level of experience?
I use type abstraction very rarely. Here are my arguments, sorted in increasing order of subjectivity:
Local variables are different from struct members and arrays in the sense that you want them to fit in a register. On a 32b/64b target, a local int16_t can make code slower compared to a local int, since the compiler will have to add operations to /force/ overflow according to the semantics of int16_t. While C99 defines an int_fast16_t typedef, AFAIK a plain int will fit in a register just as well, and it sure is a shorter name.
Organizations which like these typedefs almost invariably end up with several of them (INT32, int32_t, INT32_T, ad infinitum). Organizations using built-in types are thus better off, in a way, having just one set of names. I wish people used the typedefs from stdint.h or windows.h or anything existing; and when a target doesn't have that .h file, how hard is it to add one?
The typedefs can theoretically aid portability, but I, for one, never gained a thing from them. Is there a useful system you can port from a 32b target to a 16b one? Is there a 16b system that isn't trivial to port to a 32b target? Moreover, if most vars are ints, you'll actually gain something from the 32 bits on the new target, but if they are int16_t, you won't. And the places which are hard to port tend to require manual inspection anyway; before you try a port, you don't know where they are. Now, if someone thinks it's so easy to port things if you have typedefs all over the place - when time comes to port, which happens to few systems, write a script converting all names in the code base. This should work according to the "no manual inspection required" logic, and it postpones the effort to the point in time where it actually gives benefit.
Now if portability may be a theoretical benefit of the typedefs, readability sure goes down the drain. Just look at stdint.h: {int,uint}{max,fast,least}{8,16,32,64}_t. Lots of types. A program has lots of variables; is it really that easy to understand which need to be int_fast16_t and which need to be uint_least32_t? How many times are we silently converting between them, making them entirely pointless? (I particularly like BOOL/Bool/eBool/boolean/bool/int conversions. Every program written by an orderly organization mandating typedefs is littered with that).
Of course in C++ we could make the type system more strict, by wrapping numbers in template class instantiations with overloaded operators and stuff. This means that you'll now get error messages of the form "class Number<int,Least,32> has no operator+ overload for argument of type class Number<unsigned long long,Fast,64>, candidates are..." I don't call this "readability", either. Your chances of implementing these wrapper classes correctly are microscopic, and most of the time you'll wait for the innumerable template instantiations to compile.
The C99 standard has a number of standard sized-integer types. If you can use a compiler that supports C99 (gcc does), you'll find these in <stdint.h> and you can just use them in your projects.
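For example (illustrative variable names):

#include <stdint.h>

uint8_t  status_flags;     /* exactly 8 bits, unsigned  */
int16_t  temperature_c;    /* exactly 16 bits, signed   */
uint32_t tick_count;       /* exactly 32 bits, unsigned */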
Also, it can be especially important in embedded projects to use types as a sort of "safety net" for things like unit conversions. If you can use C++, I understand that there are some "unit" libraries out there that let you work in physical units defined by the C++ type system (via templates) and compiled down to operations on the underlying scalar types. For example, these libraries won't let you add a distance_t to a mass_t because the units don't line up; you'll actually get a compiler error.
Even if you can't work in C++ or another language that lets you write code that way, you can at least use the C type system to help you catch errors like that by eye. (That was actually the original intent of Simonyi's Hungarian notation.) Just because the compiler won't yell at you for adding a meter_t to a gram_t doesn't mean you shouldn't use types like that. Code reviews will then be much more productive at discovering unit errors.
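A minimal C sketch of that idea, wrapping each unit in a single-member struct so that mixing units becomes a compile error rather than something a reviewer must spot (all names made up):

#include <stdint.h>

typedef struct { int32_t mm; } distance_t;   /* millimetres */
typedef struct { int32_t mg; } mass_t;       /* milligrams  */

static inline distance_t distance_add(distance_t a, distance_t b) {
    return (distance_t){ a.mm + b.mm };
}

void example(void) {
    distance_t d1 = { 100 }, d2 = { 250 };
    mass_t     m  = { 500 };

    distance_t d3 = distance_add(d1, d2);   /* fine */
    /* distance_add(d1, m);                    compile error: incompatible struct type */
    (void)d3; (void)m;
}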
My opinion is, if you are depending on a minimum/maximum/specific size, don't just assume that (say) an unsigned int is 32 bits - use uint32_t instead (assuming your compiler supports C99).
I like using stdint.h types for defining system APIs specifically because they explicitly say how large items are. Back in the old days of Palm OS, the system APIs were defined using a bunch of wishy-washy types like "Word" and "SWord" that were inherited from very classic Mac OS. They did a cleanup to instead say Int16 and it made the API easier for newcomers to understand, especially with the weird 16-bit pointer issues on that system. When they were designing Palm OS Cobalt, they changed those names again to match stdint.h's names, making it even more clear and reducing the amount of typedefs they had to manage.
I believe that MISRA standards suggest (require?) the use of typedefs.
From a personal perspective, using typedefs leaves no confusion as to the size (in bits / bytes) of certain types. I have seen lead developers attempt both ways of developing by using standard types e.g. int and using custom types e.g. UINT32.
If the code isn't portable there is little real benefit in using typedefs. However, if like me you work on both types of software (portable and fixed environment), then it can be useful to keep a standard and use the customised types. At the very least, like you say, the programmer is then very much aware of how much memory they are using. Another factor to consider is how 'sure' you are that the code will not be ported to another environment. I've seen processor-specific code have to be translated because a hardware engineer suddenly had to change a board; this is not a nice situation to be in, but thanks to the custom typedefs it could have been a lot worse!
Consistency, convenience and readability. "UINT32" is much more readable and writeable than "unsigned long", which is the equivalent on some systems.
Also, the compiler and processor may be fixed for the life of a project, but the code from that project may find new life in another project. In this case, having consistent data types is very convenient.
If your embedded system is somehow a safety-critical system (or similar), it's strongly advised (if not required) to use typedefs over plain types.
As TK. has said before, MISRA-C has an (advisory) rule to do so:
Rule 6.3 (advisory): typedefs that indicate size and signedness should be used in place of the basic numerical types.
(from MISRA-C 2004; it's Rule #13 (adv) of MISRA-C 1998)
The same also applies to C++ in this area; e.g. the JSF C++ coding standards:
AV Rule 209: A UniversalTypes file will be created to define all standard types for developers to use. The types include: [uint16, int16, uint32_t etc.]
Using <stdint.h> makes your code more portable for unit testing on a PC.
It can bite you pretty hard when you have tests for everything, but it still breaks on your target system because an int is suddenly only 16 bits long.
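A typical example of that kind of breakage (hypothetical values): code that passes every test on a PC, where int is 32 bits, but overflows on a target where int is 16 bits:

#include <stdint.h>

int      sample_count_bad  = 40000;   /* fine on a 32-bit-int PC, overflows a 16-bit int */
uint32_t sample_count_good = 40000;   /* same size and behaviour on both machines */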
Maybe I'm weird, but I use ub, ui, ul, sb, si, and sl for my integer types. Perhaps the "i" for 16 bits seems a bit dated, but I like the look of ui/si better than uw/sw.
