I have some C function which, among other things, does a modulo operation. So it looks something like
const int M = 641;
void func( ...parameters..) {
int x;
... some operations ...
x %= M;
... some more operations ...
}
Now, what is crucial for me is that the number M here is a constant. If I did not tell the compiler that M is a constant, I would get much slower performance.
Currently, I am very happy with my function func( .. ), and I would like to extend it so that it can work with different moduli. But again, it is crucial that these moduli are fixed. So I would like to be able to do something like
const int array_M[] = {641, 31, 75, 81, 123};
and then have, for each index i in the array of constants, a version of the function func, say func_i, which is a copy of func but with array_M[i] taking the role of M.
In practice, my array of constants array_M[] will consist of around 600 explicit prime numbers, which I will choose in a particular way so that x % array_M[i] compiles to a very fast modulus function (for instance Mersenne primes).
My question is: how do I do this in C without making 600 copies of my function func and changing the variable M in the code each time?
Finally, I would like to ask the same question for CUDA code: if I have a CUDA kernel in which a modulo-M operation is carried out at some point, how do I get different copies of the same kernel, one for each index of array_M?
You may use a define like:
#define F(i,n) void func_##i() { printf("%d\n",n); }
#include <stdio.h>
F(1,641)
F(2,31)
...
int main() {
func_1();
func_2();
}
It is possible to obtain the same effect from a list of constants, but it is much trickier. See recursive macros.
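One less tricky way to drive the generation from a single list is the X-macro pattern, which is a different technique from the recursive macros mentioned above. The sketch below is only an illustration: the names MODULI, DEFINE_FUNC, TABLE_ENTRY and func_table are made up, the three moduli are placeholders, and the one-line body stands in for the real func.

```c
/* Hypothetical list of moduli: one X(index, value) entry per function. */
#define MODULI(X) \
    X(0, 641)     \
    X(1, 31)      \
    X(2, 127)

/* Expands to func_0, func_1, ...; each sees its modulus as a literal. */
#define DEFINE_FUNC(i, m) \
    static int func_##i(int x) { return x % (m); }
MODULI(DEFINE_FUNC)

/* The same list also builds a dispatch table indexed by i. */
#define TABLE_ENTRY(i, m) func_##i,
static int (*const func_table[])(int) = { MODULI(TABLE_ENTRY) };
```

Calling func_table[1](x) then runs the copy specialised for 31. Adding a modulus means adding one line to MODULI.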
Most compilers will do constant propagation, provided the optimisation level is high enough. The only way to be sure, however, is to examine the assembly code, or to write the code out explicitly with the constants folded in, which is ugly and hard to maintain. C++ allows you to pass a scalar as a template parameter.
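A way to lean on constant propagation in plain C, sketched under the assumption that the compiler inlines the shared body: keep the modulus as a parameter of one static inline function and call it from thin wrappers with literal constants. The names func_impl, func_641 and func_31 are hypothetical, and the one-line body is a stand-in for the real computation.

```c
/* Shared body: the modulus is an ordinary parameter here. */
static inline int func_impl(int x, const int m) {
    return x % m;   /* stand-in for the real computation */
}

/* Thin wrappers pass literal constants; after inlining, an optimising
   compiler can usually fold m into the generated code. */
static int func_641(int x) { return func_impl(x, 641); }
static int func_31(int x)  { return func_impl(x, 31); }
```

Whether the fold actually happens still has to be checked in the generated assembly, as noted above.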
In order to speed up the performance of my program, I'd like to introduce a function for calculating the leftover of a floating point division (where the quotient is natural, obviously).
Therefore I have following simple function:
double mfmod(double x,double y) {
double a;
return ((a=x/y)-(int)a)*y;
}
As I've heard I could speed up even more by putting this function within a #define clause, but the variable a makes this quite difficult. At this moment I'm here:
#define mfmod(x,y) { \
double a; \
return ((a=x/y)-(int)a)*y; \
}
But trying to launch this gives problems, due to the variable.
The problems are the following: a bit further I'm trying to launch this function:
double test = mfmod(f, div);
And this can't be compiled due to the error message type name is not allowed.
(for your information, f and div are doubles)
Does anybody know how to do this? (if it's even possible) (I'm working with Windows, more exactly Visual Studio, not with GCC)
As I've heard I could speed up even more by putting this function within a #define clause
I think you must have misunderstood. Surely the advice was to implement the behavior as a (#defined) macro, instead of as a function. Your macro is syntactically valid, but the code resulting from expanding it is not a suitable replacement for calling your function.
Defining this as a macro is basically a way to manually inline the function body, but it has some drawbacks. In particular, in standard C, a macro that must expand to an expression (i.e. one that can be used as a value) cannot contain a code block, and therefore cannot contain variable declarations. That, in turn, may make it impossible to avoid multiple evaluation of the macro arguments, which is a big problem.
Here's one way you could write your macro:
#define mfmod(x,y) ( (((x)/(y)) - (int) ((x)/(y))) * (y) )
This is not a clear win over the function, however: its behavior varies with the argument types, it must evaluate its arguments more than once (which can produce unexpected and even undefined results in some cases), and it must also perform the division twice.
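To make the multiple-evaluation problem concrete, here is a small sketch. MFMOD_EXPR is an expression macro of the shape discussed above, and next_value and count_evaluations are hypothetical helpers that exist only to make the side effect visible.

```c
/* Expression macro with every argument parenthesised. */
#define MFMOD_EXPR(x, y) ((((x) / (y)) - (int)((x) / (y))) * (y))

static int calls = 0;

/* Hypothetical argument with a visible side effect: counts its calls. */
static double next_value(void) {
    calls++;
    return 7.5;
}

/* Returns how many times the macro evaluated its first argument. */
static int count_evaluations(void) {
    calls = 0;
    double r = MFMOD_EXPR(next_value(), 2.0);
    (void)r;
    return calls;   /* x appears twice in the expansion */
}
```

With a real function, next_value() would be called exactly once per call.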
If you were willing to change the usage mode so that the macro sets a result variable instead of expanding to an expression, then you could get around many of the problems. @BnBDim provided a first cut at this, but it suffers from some of the same type and multiple-evaluation problems as the above. Here's how you could do it to obtain the same result as your function:
#define mfmod(x, y, res) do { \
double _div = (y); \
double _quot = (double) (x) / _div; \
res = (_quot - (int) _quot) * _div; \
} while (0)
Note that it takes care to reference the arguments once each, and also inside parentheses but for res, which must be an lvalue. You would use it much like a void function instead of like a value-returning function:
double test;
mfmod(f, div, test);
That still affords a minor, but unavoidable risk of breakage in the event that one of the actual arguments to the macro collides with one of the variables declared inside the code block it provides. Using variable names prefixed with underscores is intended to minimize that risk.
Overall, I'd be inclined to go with the function instead, and to let the compiler handle the inlining. If you want to encourage the compiler to do so then you could consider declaring the function inline, but very likely it will not need such a hint, and it is not obligated to honor one.
Or better, just use fmod() until and unless you determine that doing so constitutes a bottleneck.
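The function-based alternative recommended above might look like the following minimal sketch. It keeps the arithmetic of the original mfmod and, like the original, assumes the quotient fits in an int.

```c
/* Same computation as the original function; the compiler may inline it.
   Each argument is evaluated exactly once, unlike in the macro versions. */
static inline double mfmod(double x, double y) {
    const double q = x / y;
    return (q - (int)q) * y;   /* assumes q fits in an int */
}
```

It is called exactly like the original, for example double test = mfmod(f, div);.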
To answer the stated question: yes, you can declare variables in defined macro 'functions' like the one you are working with, in certain situations. Some of the other answers have shown examples of how to do this.
The problem with your current #define is that you are telling it to return something in the middle of your code, and you are not making a macro that expands in the way you probably want. If I use your macro like this:
...
double y = 1.0;
double x = 1.0;
double z = mfmod(x, y);
int other = (int)z - 1;
...
This is going to expand to:
...
double y = 1.0;
double x = 1.0;
double z = {
double a;
return ((a=x/y)-(int)a)*y;
};
int other = (int)z - 1;
...
The function (if it compiled) would never proceed beyond the initialization of z, because it would return in the middle of the macro. You also are trying to 'assign' a block of code to z.
That being said, this is another example of making assumptions about performance without any (stated) benchmarking. Have you actually measured any performance problem with just using an inline function?
__attribute__((const))
extern inline double mfmod(const double x, const double y) {
const double a = x/y;
return (a - (int)a) * y;
}
Not only is this cleaner, clearer, and easier to debug than the macro, it also has the added benefit of being declared with the const attribute (a GCC extension). The attribute tells the compiler that calls with the same arguments return the same value, so repeated calls can be optimized away entirely, whereas the macro would (conceptually) be evaluated every time. To be honest, even using the local double to cache the division result is probably a premature optimization, since the compiler will probably optimize this away. If this were me, and I absolutely HAD to have a macro to do this, I would write it as follows:
#define mfmod(x, y) ((((x)/(y))-((int)((x)/(y))))*(y))
There will almost certainly not be any noticeable performance hit under optimization for doing the division twice. If there were, I would use the inline function above. I will leave it to you to do the benchmarking.
You could use this workaround:
#define mfmod(x,y,res) \
do { \
double a=(x)/(y); \
res = (a-(int)a)*(y); \
} while(0)
Say I have a program which controls some Christmas lights (this isn't the actual application, only an example). These lights have a few different calculations to determine whether a light, i, will be lit in a given frame, t. Each of i and t is a uint8_t, so it can be assumed that there are 256 lights and t will loop each 256 frames. Some light patterns could be the following:
int flash(uint8_t t, uint8_t i) {
return t&1;}
int alternate(uint8_t t, uint8_t i) {
return (i&1) == (t&1);}
int loop(uint8_t t, uint8_t i) {
return i == t;}
If I then wanted to implement a mode-changing system that would loop through these modes, I could use a function pointer array int (*modes[3])(uint8_t, uint8_t). But, since these are all such short functions, is there any way I could instead force the compiler to place the functions directly after one another in program memory, sort of like an inline array?
The idea would be that to access one of these functions wouldn't require evaluating the pointer, and you could instead tell the processor the correct function is at modes + pitch*mode where pitch is the spacing between functions (at least the length of the longest).
I ask more out of curiosity than requirement, because I doubt this would actually cause much of a speed improvement.
What you are asking for is not directly available in C. Such logic is possible in assembler, though, and C compilers may use different assembler tricks depending on the CPU, optimization level, etc. Try to keep the logic small and compact, mark the different functions as static, use a switch() block in C, and look at the assembler the compiler generates.
You could use a switch statement, like:
#define FLASH 1
#define ALTERNATE 2
#define LOOP 3
int patternexecute(uint8_t t, uint8_t i, int pattern)
{
switch (pattern) {
case FLASH: return t&1;
case ALTERNATE: return (i&1) == (t&1);
case LOOP: return i == t;
}
return 0;
}
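For comparison with the switch, the function-pointer table mentioned in the question could be sketched as below. The helper is_lit is hypothetical, and the mode % 3 guard is an added assumption so that an out-of-range mode cannot index past the table.

```c
#include <stdint.h>

static int flash(uint8_t t, uint8_t i)     { (void)i; return t & 1; }
static int alternate(uint8_t t, uint8_t i) { return (i & 1) == (t & 1); }
static int loop(uint8_t t, uint8_t i)      { return i == t; }

/* Array of three pointers to functions taking (uint8_t, uint8_t). */
static int (*const modes[3])(uint8_t, uint8_t) = { flash, alternate, loop };

/* Hypothetical helper: evaluates pattern number `mode` for light i at frame t. */
static int is_lit(unsigned mode, uint8_t t, uint8_t i) {
    return modes[mode % 3](t, i);
}
```

Comparing the generated assembly of this version and the switch version is the simplest way to see which one the compiler handles better on a given target.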
Let's say I want to use LAPACK to solve a system of linear equations in C (GCC).
I set up the problem as follows:
/* Want to solve Ax=b */
int n = ...; // size
double *A = ...; // nxn matrix
double *b = ...; // length-n vector
int m = 1; // number of columns in b (needs to be in a variable)
int *pivot = ...; // length-n integer array, records pivoting
int info; // return value
Now it seems I can use one of three functions to solve this problem.
The first one is this:
dgesv_( &n, &m, A, &n, pivot, b, &n, &info );
I was surprised to see that this did not require any #includes, which seems... weird.
The second function has almost the same signature, except for the prefix LAPACK_ which I think causes less ambiguity and is possibly less error-prone:
#include <lapack/lapacke.h>
LAPACK_dgesv( &n, &m, A, &n, pivot, b, &n, &info );
Note that this requires me to include lapacke.h.
The third function changes the signature somewhat by returning info and not taking all arguments as pointers:
#include <lapack/lapacke.h>
info = LAPACKE_dgesv( LAPACK_COL_MAJOR, n, m, A, n, pivot, b, n);
Again, this function requires lapacke.h. It also requires linking to an extra library with -llapacke.
All three functions need -llapack.
I'm trying to figure out the differences between these functions.
I did some snooping around and I found the following macros in lapacke.h and related header files:
#define LAPACK_GLOBAL(name,NAME) name##_
#define LAPACK_dgesv LAPACK_GLOBAL(dgesv,DGESV)
So it seems that LAPACK_dgesv() and dgesv_() are different names for the exact same function.
However, it appears LAPACKE_dgesv() is something else with possibly a different implementation, especially considering the fact it needs an extra library.
So my question is: what are the differences between these two functions?
The documentation says LAPACKE is a C interface for LAPACK, but then what about the function dgesv_()?
Clearly I can use it normally without needing LAPACKE and without compiling anything in Fortran, so how is that different?
Thanks.
Update
Curiously, the function dgemm_() (matrix multiplication) does not have any LAPACK_dgemm() equivalent.
What's going on?
Notice that LAPACKE_dgesv() features an additional flag, which can be LAPACK_COL_MAJOR (the usual Fortran order) or LAPACK_ROW_MAJOR (the usual C order). In the case of LAPACK_COL_MAJOR, it just calls LAPACK_dgesv() directly. In the case of LAPACK_ROW_MAJOR, LAPACKE_dgesv() transposes the matrix before calling LAPACK_dgesv(). It is not a new implementation of dgesv_(). Take a look at lapack-3.5.0/lapacke/src/dgesv_work.c. In this file, there are minor additional changes about error handling.
LAPACK_dgesv() is defined in the header lapacke.h as LAPACK_GLOBAL(dgesv,DGESV). The macro LAPACK_GLOBAL is defined in lapacke_mangling.h: it just wraps dgesv_ and takes care of the naming convention if other conventions are used.
So, basically, the function LAPACK_dgesv() just requires the headers of LAPACKE. Compared to dgesv_, some problems related to naming conventions in libraries may be avoided. But LAPACK_dgesv() is exactly the same as dgesv_(). The function LAPACKE_dgesv() extends LAPACK_dgesv() to handle the usual C matrix layout, but it still calls dgesv_ in the end.
The function dgemm() is part of the BLAS library. A wrapped C version, cblas_dgemm(), can be found in CBLAS. Again, an additional flag CBLAS_ORDER is required, with possible values CblasRowMajor and CblasColMajor.
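To illustrate the row-major versus column-major distinction behind these flags, here is a sketch of the kind of layout conversion involved. It is not LAPACKE's actual code, and row_to_col_major is a hypothetical helper name.

```c
#include <stddef.h>

/* Copies an n-by-n row-major matrix into column-major order.
   Element (r, c) sits at src[r*n + c] and ends up at dst[c*n + r]. */
static void row_to_col_major(size_t n, const double *src, double *dst) {
    for (size_t r = 0; r < n; r++)
        for (size_t c = 0; c < n; c++)
            dst[c * n + r] = src[r * n + c];
}
```

A C array written row by row therefore has to be rearranged like this, or passed with the row-major flag, before a Fortran routine sees the matrix it is meant to see.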
#define MAX 7
#define BUFFER 16
#define MODULO 8
typedef struct {
int x;
} BLAH;
if I have:
checkWindow(BLAH *b) {
int mod;
mod = b.MODULO;
}
Specifically can I access MODULO from the BLAH structure?
I think you misunderstand the meaning of preprocessor definitions. #define-d items only look like variables, but they are not variables in the classical sense of the word: they are text substitutions. They are interpreted by the preprocessor, before the compiler gets to see the text of your program. By the time the preprocessor is done, the text of the program has no references to MAX, BUFFER, or MODULO: their occurrences are substituted with 7, 16, and 8. That is why you cannot access #define-d variables: there are no variables to access.
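As a sketch of how the macro would normally be used, assuming the intent was to reduce the struct member x modulo MODULO (the question does not say so explicitly):

```c
#define MODULO 8

typedef struct {
    int x;
} BLAH;

/* Assumed intent: reduce the member x modulo the constant.
   After preprocessing, the return statement reads `return b->x % 8;`. */
static int checkWindow(const BLAH *b) {
    return b->x % MODULO;
}
```

The macro is used on its own, never through b, because it is not a member of the struct.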
All #defines are replaced in plain text by the "values" they define, before compilation. They are not variables, just shorthand syntax that makes programs easier to write. None of your #define lines actually reach the compiler; they are resolved by the preprocessor.
Now, if you simply replace MODULO in your example by 8, does the resulting code make sense to you?
If it does make sense, please take a Computer Programming 101 course.
Greetings and salutations,
I am looking for information regarding design patterns for working with a large number of functions in C99.
Background:
I am working on a complete G-Code interpreter for my pet project, a desktop CNC mill. Currently, commands are sent over a serial interface to an AVR microcontroller. These commands are then parsed and executed to make the milling head move. A typical example of a line might look like
N01 F5.0 G90 M48 G1 X1 Y2 Z3
where G90, M48, and G1 are "action" codes and F5.0, X1, Y2, Z3 are parameters (N01 is the optional line number and is ignored). Currently the parsing is coming along swimmingly, but now it is time to make the machine actually move.
For each of the G and M codes, a specific action needs to be taken. This ranges from controlled motion to coolant activation/deactivation, to performing canned cycles. To this end, my current design features a function that uses a switch to select the proper function and return a pointer to that function which can then be used to call the individual code's function at the proper time.
Questions:
1) Is there a better way to resolve an arbitrary code to its respective function than a switch statement? Note that this is being implemented on a microcontroller and memory is EXTREMELY tight (2K total). I have considered a lookup table but, unfortunately, the code distribution is sparse leading to a lot of wasted space. There are ~100 distinct codes and sub-codes.
2) How does one go about function pointers in C when the names (and possibly signatures) may change? If the function signatures are different, is this even possible?
3) Assuming the functions have the same signature (which is where I am leaning), is there a way to typedef a generic type of that signature to be passed around and called from?
My apologies for the scattered questioning. Thank you in advance for your assistance.
1) Perfect hashing may be used to map the keywords to token numbers (opcodes) , which can be used to index a table of function pointers. The number of required arguments can also be put in this table.
2) You don't want overloaded / heterogeneous functions. Optional arguments might be possible.
3) Your only choice is to use varargs, IMHO.
I'm not an expert on embedded systems, but I have experience with VLSI. So sorry if I'm stating the obvious.
The function-pointer approach is probably the best way. But you'll need to either:
Arrange all your action codes to be consecutive in address.
Implement an action code decoder similar to an opcode decoder in a normal processor.
The first option is probably the better way (simple and small memory footprint). But if you can't control your action codes, you'll need to implement a decoder via another lookup table.
I'm not entirely sure on what you mean by "function signature". Function pointers should just be a number - which the compiler resolves.
EDIT:
Either way, I think two lookup tables (1 for function pointers, and one for decoder) is still going to be much smaller than a large switch statement. For varying parameters, use "dummy" parameters to make them all consistent. I'm not sure what the consequences of force casting everything to void-pointers to structs will be on an embedded processor.
EDIT 2:
Actually, a decoder can't be implemented with just a lookup table if the opcode space is too large. My mistake there. So 1 is really the only viable option.
Is there a better way ... than a switch statement?
Make a list of all valid action codes (a constant in program memory, so it doesn't use any of your scarce RAM), and sequentially compare each one with the received code. Perhaps reserve index "0" to mean "unknown action code".
For example:
// Warning: untested code.
typedef int (*ActionFunctionPointer)( int, int, char * );
struct parse_item{
const char action_letter;
const int action_number; // you might be able to get away with a single byte here, if none of your actions are above 255.
// alas, http://reprap.org/wiki/G-code mentions a "M501" code.
const ActionFunctionPointer action_function_pointer;
};
int m0_handler( int speed, int extrude_rate, char * message ){ // M0: Stop
speed_x = 0; speed_y = 0; speed_z = 0; speed_e = 0;
return 0;
}
int g4_handler ( int dwell_time, int extrude_rate, char * message ){ // G4: Dwell
delay(dwell_time);
return 0;
}
const struct parse_item parse_table[] = {
{ '\0', 0, unrecognized_action }, // special error-handler
{ 'M', 0, m0_handler }, // M0: Stop
// ...
{ 'G', 4, g4_handler }, // G4: Dwell
{ '\0', 0, unrecognized_action } // special error-handler
};
ActionFunctionPointer get_action_function_pointer( char * buffer ){
char letter = get_letter( buffer );
int action_number = get_number( buffer );
int index = 0;
ActionFunctionPointer f = 0;
do{
index++;
if( (letter == parse_table[index].action_letter ) &&
(action_number == parse_table[index].action_number) ){
f = parse_table[index].action_function_pointer;
}
if('\0' == parse_table[index].action_letter ){
index = 0;
f = unrecognized_action;
}
}while(0 == f);
return f;
}
How does one go about function pointers in C when the names (and
possibly signatures) may change? If the function signatures are
different, is this even possible?
It's possible to create a function pointer in C that (at different times) points to functions with more or fewer parameters (different signatures) using varargs.
Alternatively, you can force all the functions that might possibly be pointed to by that function pointer to all have exactly the same parameters and return value (the same signature) by adding "dummy" parameters to the functions that require fewer parameters than the others.
In my experience, the "dummy parameters" approach seems to be easier to understand and use less memory than the varargs approach.
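For comparison, the varargs variant could be sketched as follows. VarHandler, sum_handler and count_handler are hypothetical names, and the sketch assumes every optional argument is an int and that the first parameter says how many follow.

```c
#include <stdarg.h>

/* One pointer type covers handlers that take different numbers of ints. */
typedef int (*VarHandler)(int count, ...);

/* Hypothetical handler: sums `count` int arguments. */
static int sum_handler(int count, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int k = 0; k < count; k++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Hypothetical handler: ignores its arguments and only reports the count. */
static int count_handler(int count, ...) {
    return count;
}
```

The compiler cannot check the optional arguments at the call site, which is part of why the dummy-parameter approach tends to be easier to get right.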
Is there a way to typedef a generic type of that signature
to be passed around and called from?
Yes.
Pretty much all the code I've ever seen that uses function pointers
also creates a typedef to refer to that particular type of function.
(Except, of course, for Obfuscated contest entries).
See the above example and Wikibooks: C programming: pointers to functions for details.
p.s.:
Is there some reason you are re-inventing the wheel?
Could one of the following pre-existing G-code interpreters for the AVR work for you, perhaps with a little tweaking?
FiveD,
Sprinter,
Marlin,
Teacup Firmware,
sjfw,
Makerbot,
or
Grbl?
(See http://reprap.org/wiki/Comparison_of_RepRap_Firmwares ).