How to write wrapper function so that compiler efficiently optimize it?

How to write wrapper function so that compiler efficiently optimize it? - c

I have a function which do basic error checking before returning next node of the link list:
Node *next_node(Node *n) {
switch(type(n)){
case A:
case B:
.
.
case N:
return n->next;
default:
exit(1); //This is just a representation of my code handling error scenario.
}
}
There are more wrapper functions on top of it which does certain things depending upon return value of above mentioned function.
These wrapper functions are being used extensively in my whole code base. It is too much that when I run profiler, I found them as time consuming routines.
This do make sense to me because there are thousands of function calls from different code area to these functions and such number of calls will definitely take time due to function call overheading and/or instruction cache misses.
I also know that compilers do some optimization around these, so that they can be inlined while generating assembly code. It seems that it is not working properly because of current way of implementation.
So, my questions are:
What is the common way of writing such wrapper functions so that compiler can optimize them for low runtime?
How do other companies handle such a scenario in their code base?
Note1: code above is just for representation, there are lot of such wrapper function in the whole code base. Therefore, if someone has any idea on improving upon runtime issue due to wrapper functions, should share the idea.
Note2: I'm using gcc as compiler and my code base is entirely in C.

So you seem to be in a correct case for low level optimization because you have profiled your code and found a time consuming function.
Assuming that you already use the higher optimization level of your compiler, the bad part it that there is no general answer to such a question. I can only give you some hints here:
are some redundant operations somewhere? if yes you can try to make sure that you do them only once
are there complex loops that could be unwinded? not much to gain but some tests at the price of a longer code
is assembly code an option? if yes you can try to write all or part of the function in assembly language
But you could (should?) also wonder whether:
the general structure of your application is correct - if you can manage to less call the function you spend less time in it...
you really need the tests in production code. If they are assertions more than tests, they should only be conditionally included with a #ifdef in tests and debug versions.

Related

Functions splitting effect on running time

I am writing a DSP code in C (windows environment). The code should be modified, by another engineer, to run on Cortex-M4. This engineer claims that, for reduction of running time, many of the functions that I have implemented should be united into one function. I prefer to avoid it keeping clarity and testing.
Does his claim make sense? If it is, where I can read about it. Otherwise, can I show that he is wrong without a comparison of running time?

Does his claim make sense?
Depends on context. Modern compilers are perfectly able to inline function calls, but that usually means that those functions must be placed in the same translation unit (essentially the same .c file).
If your functions are in the same .c file then their claim is wrong, if you have the functions scattered across multiple files, then their claim is likely correct.
If it is, where I can read about it.
Function inlining has been around for some 30 years. C even added an inline keyword for it in year 1999 (C++ had one earlier still), though during the 2000s compilers turned smarter than programmers in terms of determining when and what to inline. Nowadays when using modern compilers, inline is mostly considered obsolete.
Otherwise, can I show that he is wrong without a comparison of running time?
By disassembling the optimized code and see if there are any function calls or not. Still, function calls are relatively cheap on Cortex M (unless there's a ton of different parameters), so doing manual optimization to remove them would be very tiny optimization.

As always there's a choice between code size and execution speed.
If you wish to remove the stack overhead of calling a new function but wish to keep your code modular then consider using the inline function attribute suitable for your compiler e.g.
static inline void com_ClearMessageBuffer(uint8_t* pBuffer, uint32_t length)
{
NRF_LOG_DEBUG("com_ClearMessageBuffer");
memset(pBuffer, 0, length);
}
Then at compile time your inline function code will be inserted into the code flow wherever it is called.
This will speed execution, but when called multiple times increase the code size.

Enable mocking for unit testing a library in C

In our environment we're encountering a problem regarding mocking functions for our library unit tests.
The thing is that instead of mocking whole modules (.c files) we would like to mock single functions.
The library is compiled to an archive file and linked statically to the unit test. Without mocking there isn't any issue.
Now when trying to mock single functions of the library we would get multiple definitions obviously.
My approach now is to use the weak function attribute when compiling/linking the library so that the linker takes the mocked (non-weak) function when linking against the unit test. I already tested it and it seems to work as expected.
The downside of this is that we need many attribute declarations in the code.
My final approach would be to pass some compile or link arguments to the compiler, that every function is automatically declared as a weak symbol.
The question now is: Is there anything to do this in a nice way?
btw: We use clang 8 as a compiler.

James Grenning describes several options to solve this problem (http://blog.wingman-sw.com/linker-substitution-in-c-limitations-and-workarounds). The option "function pointer substitution" gives a high degree of freedom. It works as follows: Replace functions by pointers to functions. The function pointers are initialized to point to the original function, but each pointer can be redirected individually to a test double.
This approach allows to have one single test executable where you can still decide for each test case individually for which function you use a test double and for which you use the original function.
It certainly also comes at a price:
One indirection for each call. But, if you use link-time-optimization the optimizer will most likely eliminate that indirection again, so this may not be an issue.
You make it possible to redirect function calls also in production code. This would certainly be a misuse of the concept, however.

I would suggest using VectorCAST
https://www.vector.com/us/en/products/products-a-z/software/vectorcast/
I've used, unity/cmock and others for unit testing C in the past, but after a while its vary tedious to manually create these for a language that isnt really built around that concept and is very much a heres a Hammer and Chissel the world is yours approach.
VectorCAST abstracts majority of the manual work that is required with tools like Unity/Cmock, we can get results across a project/module sooner and quicker than we did in the past with the other tools.
Is vectorCAST expensive and very much an enterprise level tool? yes... but its defiantly worth its weight in gold. And thats coming from someone who is very old school, manual approach to software development... just text editors, terminals and commandline debuggers.
VetorCAST handles function pointers and pointers extremely well, stubbing functions is easy as two clicks away. It saved our team alot of time... allowing us to focus on results and reducing the feedback loop of development.

STM32 HAL Library simple C coding error

I am using the STM32 HAL Library for a micro controller project. In the ADC section I found the following code:
uint32_t WaitLoopIndex = 0;
/...
.../
/* Delay for ADC stabilization time. */
/* Delay fixed to worst case: maximum CPU frequency */
while(WaitLoopIndex < ADC_STAB_DELAY_CPU_CYCLES)
{
WaitLoopIndex++;
}
It is my understanding that this code will most likely get optimized away since WaitLoopIndex isn't used anywhere else in the function and is not declared volatile, right?

Technically yes, though from my experiences with compilers for embedded targets, that loop will not get optimised out. If you think about it, having a pointless loop is not really a construct you are going to see in a program unless the programmer does it on purpose, so I doubt many compilers bothers to optimise for it.
The fact that you have to make assumptions about how it might be optimised though means it most certainly is a bug, and one of the worst types at that. More than likely ST wanted to only use C in their library, so did this instead of the inline assembler delay that they should have used. But since the problem they are trying to solve is heavily platform dependent, an annoying platform/compiler dependent solution is unavoidable, and all they have done here is try to hide that dependency.
Declaring the variable volatile will help, but you still really have no idea how long that loop is taking to execute without making assumptions about how the compiler is building it. This is still very bad practice, though if they added an assert reminding you to check the delay manually that might be passable.

This depends on the compiler and the optimization level. To confirm the result, just enter debug mode and check the disassembly code of the piece of code.

C function call and parameter tracing - test case and mock generation

I have a large code base of quite old C code on an embedded system and unfortunately there are no automated test cases/suites. This makes restructuring and refactoring code a dangerous task.
Manually writing test cases is very time consuming, so I thought that it should be possible to automate at least some part of this process for instance by tracing all the function calls and recording of the input and output values. I could then use these values in the test cases (this would not work for all but at least for some functions). It would probably also be possible to create mock functions based on the gathered data.
Having such test cases would make refactoring a less dangerous activity.
Are there any solutions that already can do this? What would be the easiest way to get this to work if I had to code it myself?
I thought about using ctags to find the function definitions, and wrapping them in a function that records the parameter values. Another possibility would probably be a gcc compiler plugin.

There is a gcc option "-finstrument-functions", which mechanism you can use to define your own callbacks for each funtion's entry/exit.
Google it and you can find many good examples.
[Edit] with this gcc option's call back you can only track the function's entry/exit,not the params. but with some tricks you may also track the params. (walk through the current frame pointer to get the param on the stack).
Here is an article talk about the idea of the implementation:
http://linuxgazette.net/151/melinte.html
Furthermore, depends on your embedded system, on linux you can try something like ltrace to show the params(like the strace way). There are many tools do the function trace work either in userspace or kernelspace on linux, ftrace/ust/ltrace/utrace/strace/systemtap/. Anyway, if you do not add any hard debugging code, it's not possible to display the params in the correct way. If you accept the efforts to add entry/exit debugging infomation, then it's much easier.
Also here is a similar thread talk about this problem.
Tool to trace local function calls in Linux

Find unused functions in a C project by static analysis

I am trying to run static analysis on a C project to identify dead code i.e functions or code lines that are never ever called. I can build this project with Visual Studio .Net for Windows or using gcc for Linux. I have been trying to find some reasonable tool that can do this for me but so far I have not succeeded. I have read related questions on Stack Overflow i.e this and this and I have tried to use -Wunreachable-code with gcc but the output in gcc is not very helpful. It is of the following format
/home/adnan/my_socket.c: In function ‘my_sockNtoH32’:
/home/adnan/my_socket.c:666: warning: will never be executed
but when I look at line 666 in my_socket.c, it's actually inside another function that is being called from function my_sockNtoH32() and will not be executed for this specific instance but will be executed when called from some other functions.
What I need is to find the code which will never be executed. Can someone plz help with this?
PS: I can't convince management to buy a tool for this task, so please stick to free/open source tools.

If GCC isn't cutting it for you, try clang (or more accurately, its static analyzer). It (generally, your mileage may vary of course) has a much better static analysis than GCC (and produces much better output). It's used in Apple's Xcode but it's open-source and can be used seperately.

When GCC says "will never be executed", it means it. You may have a bug that, in fact, does make that dead code. For example, something like:
if (a = 42) {
// some code
} else {
// warning: unreachable code
}
Without seeing the code it's not possible to be specific, of course.
Note that if there is a macro at line 666, it's possible GCC refers to a part of that macro as well.

GCC will help you find dead code within a compilation. I'd be surprised if it can find dead code across multiple compilation units. A file-level declaration of a function or variable in a compilation unit means that some other compilation unit might reference it. So anything declared at the top level of a file, GCC can't eliminate, as it arguably only sees one compilation unit at a time.
The problem gets get harder. Imagine that compilation unit A declares function a, and compilation unit B has a function b that calls a. Is a dead? On the face of it, no. But in fact, it depends; if b is dead, and the only reference to a is in b, then a is dead, too. We get the same problem if b merely takes &a and puts it into an array X. Now to decide if a is dead, we need a points-to analysis across the entire system, to see if that pointer to a is used anywhere.
To get this kind of accurate "dead" information, you need a global view of the entire set of compilation units, and need to compute a points-to analysis, followed by the construction of a call-graph based on that points-to analysis. Function a is dead only if the call graph (as a tree,
with main as the root) doesn't reference it somewhere.
(Some caveats are necessary: whatever the analysis is, as a practical matter it must be conservative, so even a full-points to analysis may not identify a function correctly as dead. You also have to worry about uses of a C artifact from outside the set of C functions, e.g., a call to a from some bit of assembler code).
Threading makes this worse; each thread has some root function which is probably at the top of the call DAG. Since how a thread gets started isn't defined by C compilers, it should be clear that to determine if a multithreaded C application has dead code, somehow the analysis has to be told the thread root functions, or be told how to discover them by looking for thread-initialization primitives.
You aren't getting a lot responses on how to get a correct answer. While it isn't open source, our DMS Software Reengineering Toolkit with its C Front End has all the machinery to do this, including C parsers, control- and dataflow- analysis, local and global points-to analysis, and global call graph construction. DMS is easily customized to include extra information such as external calls from assembler, and/or a list of thread roots or specific source-patterns that are thread initialization calls, and we've actually done that (easily) for some large embedded engine controllers with millions of lines of code. DMS has been applied to systems as large as 26 million lines of code (some 18,000 compilation units) for the purpose of building such calls graphs.
[An interesting aside: in processing individual comilation units, DMS for scaling reasons in effect deletes symbols and related code that aren't used in that compilation unit. Remarkably, this gets rid of about 95% of code by volume when you take into account the iceberg usually hiding in the include file nest. It says C software typically has poorly factored include files. I suspect you all know that already.]
Tools like GCC will remove dead code while compiling. That's helpful, but the dead code is still lying around in your compilation unit source code using up developer's attention (they have to figure out if it is dead, too!). DMS in its program transformation mode can be configured, modulo some preprocessor issues, to actually remove that dead code from the source. On very large software systems, you don't really want to do this by hand.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to write wrapper function so that compiler efficiently optimize it? - c

Related

Functions splitting effect on running time

Enable mocking for unit testing a library in C

STM32 HAL Library simple C coding error

C function call and parameter tracing - test case and mock generation

Find unused functions in a C project by static analysis

Categories

Resources