Generating C code for functions of different signatures, but same implementation

A situation I run into a lot in writing C code (context is scientific computation) is that I will have functions which have exactly the same body modulo minor type differences. I realize C++ offers the template feature and function overloading which allows one to have only one copy of said function and let the compiler figure out what signature you meant to use when you build.
While this is a great feature in C++, my project is in C and I furthermore do not need the full power of templating. So far what I have tried is m4 macros on a candidate source file, and this spits out respective .c files with appropriate name mangling for the different types I need. The preprocessor could therefore accomplish this as well, but I'm attempting to avoid using it in complicated ways (my code needs to be understandable for reproducibility reasons). I'm not very good with m4, so all the files have been hacks that only work in specific cases and are inapplicable in new situations.
What do other people programming in C do when this is necessary? Manually produce and maintain the different permutations of function signatures? I'm hoping that isn't the best answer, or that a tool exists to automate this dreary and error prone task.
Apologies for the vagueness; let me give a toy example. Suppose I need to add two numbers. The function might look something like this:
float add(float x,float y){
return x+y;
}
OK, that's great for floats, but what if I need it for a wide range of types on which arithmetic is available? OK, I can do this:
float add_f(float x,float y){...}
double add_lf(double x,double y){...}
unsigned int add_ui(unsigned int x, unsigned int y){...}
and so forth. If for some (probably stupid) reason I decide I also need to write the contents of the arguments to a binary file, I now have to add the requisite file I/O code to every single function. Is there a simple way or tool to take an add function and spit out different versions with name mangling, to avoid this annoying situation?
Basically in my m4 cases I would just find/replace a macro TYPE with the requisite type, and have a macro MANGLE() which mangles the functions, then I point the output to an alternate .c file. My m4 skills are lacking though.
Function pointers can help with the ultimate interface of my code, but eventually those pointers have to point to something, and then we're just enumerating all the possibilities again. I'm also unclear on how this might affect potential inlining of short functions.

The only thing I can think of is: make the algorithm itself independent of the type, have the user of your function write their own function to handle the type-specific parts, and make one of the parameters to your function a pointer to that "handler function".
See the definition/implementation of the qsort routine for what I mean. qsort works for all kinds of data but handles the data itself transparently: apart from the array and the element count, the only things you pass to qsort are the size of each entry and a pointer to a function that does the real comparison.
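A minimal sketch of that qsort-style approach, applied to an element-wise version of the add from the question. The names map2, combine_fn, add_double and add_uint are hypothetical; only the pattern (element size plus a caller-supplied function pointer) comes from the answer.

```c
#include <stddef.h>

/* Handler supplied by the caller: combines the elements at a and b into out. */
typedef void (*combine_fn)(const void *a, const void *b, void *out);

/* Type-independent algorithm: it only knows the element size and the
   handler, in the same spirit as qsort's size + comparison parameters. */
void map2(const void *xs, const void *ys, void *out,
          size_t count, size_t size, combine_fn combine)
{
    const unsigned char *px = xs;
    const unsigned char *py = ys;
    unsigned char *po = out;
    for (size_t i = 0; i < count; i++)
        combine(px + i * size, py + i * size, po + i * size);
}

/* Type-specific handlers written by the user of map2. */
void add_double(const void *a, const void *b, void *out)
{
    *(double *)out = *(const double *)a + *(const double *)b;
}

void add_uint(const void *a, const void *b, void *out)
{
    *(unsigned int *)out = *(const unsigned int *)a + *(const unsigned int *)b;
}
```

A call looks like map2(x, y, r, n, sizeof(double), add_double). Because the handler is called through a pointer, a compiler can usually inline it only if it can also inline or specialize map2 at that call site, which is the inlining concern raised in the question.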

You appear to be asking for generic type support. While the macro processing can work in restricted domains, what you are doing is complex.
If the variants are so similar that simple type and name mangling is enough, could you not use regular C #defines before each of multiple inclusions of the same source fragment, to let the preprocessor perform the substitution? This way, at least there is only a single environment to manage.
Alternatively, if the performance hit is not substantial, could you prepare a stub function for each specialisation and map these to a generic version that the stubs call?
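A minimal sketch of the stub approach, assuming a hypothetical type tag (num_kind) and a hypothetical global add_dump_file that turns on the question's "write the arguments to a binary file" behaviour. Only add_generic contains the shared logic; add_f, add_lf and add_ui are thin wrappers with the signatures from the question.

```c
#include <stdio.h>

/* Hypothetical type tag telling the generic code what the pointers hold. */
typedef enum { NUM_FLOAT, NUM_DOUBLE, NUM_UINT } num_kind;

/* Hypothetical switch for the shared extra work: when non-NULL, every
   call dumps its raw arguments to this binary file. */
FILE *add_dump_file = NULL;

/* The single generic implementation. The type-independent work (the dump)
   is written once; only the arithmetic itself needs the type tag. */
static void add_generic(num_kind kind, size_t size,
                        const void *x, const void *y, void *out)
{
    if (add_dump_file != NULL) {
        fwrite(x, size, 1, add_dump_file);
        fwrite(y, size, 1, add_dump_file);
    }
    switch (kind) {
    case NUM_FLOAT:
        *(float *)out = *(const float *)x + *(const float *)y;
        break;
    case NUM_DOUBLE:
        *(double *)out = *(const double *)x + *(const double *)y;
        break;
    case NUM_UINT:
        *(unsigned int *)out = *(const unsigned int *)x + *(const unsigned int *)y;
        break;
    }
}

/* Thin stubs: the only per-type code left. */
float add_f(float x, float y)
{
    float r;
    add_generic(NUM_FLOAT, sizeof x, &x, &y, &r);
    return r;
}

double add_lf(double x, double y)
{
    double r;
    add_generic(NUM_DOUBLE, sizeof x, &x, &y, &r);
    return r;
}

unsigned int add_ui(unsigned int x, unsigned int y)
{
    unsigned int r;
    add_generic(NUM_UINT, sizeof x, &x, &y, &r);
    return r;
}
```

The cost is an extra call plus a switch per addition, which is the kind of performance hit the answer has in mind.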

I use GNU autogen for code generation tasks, which sounds somewhat like your current m4 solution, but might be better organized. For example:
type.def
autogen definitions type;
type = { name="int"; mangle="i"; };
type = { name="double"; mangle="lf"; };
type = { name="float"; mangle="f"; };
type = { name="unsigned int"; mangle="ui"; };
type.tpl
[+ autogen5 template
c=%s.c
(setenv "SHELL" "/bin/sh") +]/*
[+ (dne "* " "* ") +]
*/
[+
FOR type "\n" +][+name+] add_[+mangle+]([+name+] x, [+name+] y) { ... }[+ENDFOR+]
or something like that. This should spit out a function for each of the types in type.def looking something like:
unsigned int add_ui(unsigned int x, unsigned int y) { ... }
You can also have it insert type-specific code in certain places if needed, etc. You could have it output the add functions described above as well as the I/O versions. You'd have to compute the text for mangle instead of what I've got, but that's not a problem. You'd also have some conditional code for the I/O and a way to toggle the condition on and off (again, not a problem).
I'd definitely try and see if there was some way to generalize the algorithm, but this approach might have drawbacks (e.g. performance issues from not having the real underlying type) as well. But it sounds from the comments that this approach might not work for you.

I know that most C developers are afraid of it, but have you thought about using macros?
specific to your example:
// floatstuff.h
float add_f(float x,float y);
double add_lf(double x,double y);
unsigned int add_ui(unsigned int x, unsigned int y);
combined with:
// floatstuff.c
#define MY_CODE \
return x + y
float
add_f (float x, float y)
{
MY_CODE;
}
double
add_lf (double x, double y)
{
MY_CODE;
}
unsigned int
add_ui (unsigned int x, unsigned int y)
{
MY_CODE;
}
If the code you are using per function is truly identical, then this might be the solution you are looking for. It avoids most of the code duplication, maintains some degree of readability and has no impact on your runtime.
Also, if you keep the macro local to your .c file, you are unlikely to break anything, so no worries there either.
Also, you can do even more weird stuff using parameterized macros, which can give you even more reduced code duplication.
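For instance, a parameterized macro can generate the whole function rather than just its body. DEFINE_ADD is a hypothetical name; its expansions produce the same add_f, add_lf and add_ui declared in floatstuff.h, so a change to the body (such as adding the file I/O) is made in one place.

```c
/* Hypothetical generator macro: expands to one complete function named
   add_<SUFFIX> whose parameter and return type is TYPE. */
#define DEFINE_ADD(TYPE, SUFFIX)          \
    TYPE add_##SUFFIX(TYPE x, TYPE y)     \
    {                                     \
        return x + y;                     \
    }

DEFINE_ADD(float, f)
DEFINE_ADD(double, lf)
DEFINE_ADD(unsigned int, ui)   /* multi-token types are fine: TYPE is never pasted */

#undef DEFINE_ADD              /* keep the generator local to this .c file */
```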

Related

Same set of instructions, different types. How to handle?

In a C program I'm making, I will receive as command lines arguments a file path and a letter. The file is where I read data from, and the letter represents the type of data that is held inside that file.
The instructions I need to perform on the data are basically the same, only the type is different: it might be that the file holds ints, doubles or the values of a struct X. Regardless of type, the operations will be identical; how can I avoid repeating code? In C++ I would handle this with templates. How would this be typically handled in C?
In C you would do it through what you're hoping to avoid -- repeating the code. C++ makes this more convenient with templates, as you're aware, however that's just a simple way to repeat the code and base it on a different type.
Something that might be appropriate for you is to provide the different class functions but to not call them directly. Instead, based on your command line, determine once which function(s) will process your data, and assign them to function pointers. Then, your control loop will just generically call the processing function(s) using those pointer(s). This will obviously include whatever you do with the data, but you might also decide to have separate input functions based on data type.
Edit: As Mat says, there are some types which promote well, so one block of code would work fine. I suspect this is why your assignment includes working with some structure type.
The solution to this problem is obvious with modern object-oriented languages: you make an object of each type that implements an interface (or uses inheritance) for the actions you want to perform.
You can't do this directly in C because the language does not natively support object-oriented programming, but you can reproduce the same functionality yourself instead of letting the compiler do it for you. To do so you need a level of indirection; specifically, you will need function pointers.
So (as an example), one of the actions you might take is to read values from the file. One of your variables will be a pointer to a function that takes as parameters the file and a void * (which will point to a different type for each function you write). Write the function for each of your types, and then at run time assign the function to use based on the type of the file.
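A minimal sketch of that idea, assuming the file holds whitespace-separated text values (the question does not say whether it is text or binary) and supporting only the letters 'i' and 'd'. value_ops, ops_for_letter and process_file are hypothetical names; a struct X version would add its own read/print pair and a member in the storage union.

```c
#include <stdio.h>

/* Hypothetical per-type "interface": a table of function pointers. */
typedef struct {
    int  (*read)(FILE *f, void *dst);   /* returns 1 if a value was read */
    void (*print)(const void *value);
} value_ops;

static int read_int(FILE *f, void *dst)    { return fscanf(f, "%d", (int *)dst) == 1; }
static int read_double(FILE *f, void *dst) { return fscanf(f, "%lf", (double *)dst) == 1; }
static void print_int(const void *v)       { printf("%d\n", *(const int *)v); }
static void print_double(const void *v)    { printf("%g\n", *(const double *)v); }

static const value_ops int_ops    = { read_int,    print_int };
static const value_ops double_ops = { read_double, print_double };

/* Decide once, from the command-line letter, which implementation to use. */
const value_ops *ops_for_letter(char letter)
{
    switch (letter) {
    case 'i': return &int_ops;
    case 'd': return &double_ops;
    default:  return NULL;              /* unknown type letter */
    }
}

/* Type-independent loop: it only calls through the pointers.
   ops must not be NULL. Returns the number of values processed. */
long process_file(FILE *f, const value_ops *ops)
{
    union { int i; double d; } slot;    /* large and aligned enough for every supported type */
    long n = 0;
    while (ops->read(f, &slot)) {
        ops->print(&slot);
        n++;
    }
    return n;
}
```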
In the realms of really ugly pre-processor tricks, if you want to replicate the body of a function for different types, but keep the code "structure" identical, you can do something like this:
foo.hc
#define YNAME(X) foo_ ## X
#define XNAME(X) YNAME(X)
#define NAME XNAME(TYPE)
int NAME(FILE* f) {
TYPE myvar;
...
return whatever;
}
foo.c
#define TYPE int
#include "foo.hc"
#undef TYPE
#define TYPE double
#include "foo.hc"
#undef TYPE
This foo.c will pre-process to:
int foo_int(FILE* f) {
int myvar;
...
return whatever;
}
int foo_double(FILE* f) {
double myvar;
...
return whatever;
}
All you need to do in your main processing loop with that is to dispatch to the right function depending on your file type. A plain switch statement can work pretty well, an array of function pointers could work too.
The new C standard, C11, has type generic expressions that you could use for this. There is not yet much compiler support for C11 but for example the latest version of clang has _Generic. You can also use P99 to emulate C11 features on top of similar extensions that are provided by gcc.
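A sketch of how _Generic could provide a single add(x, y) entry point over type-specific functions like the add_f, add_lf and add_ui used earlier (a C11 compiler is assumed). The function is selected at compile time from the type of the first argument, so there is no run-time dispatch cost.

```c
/* Type-specific implementations (these could be produced by any of the
   techniques discussed above). */
float        add_f(float x, float y)                { return x + y; }
double       add_lf(double x, double y)             { return x + y; }
unsigned int add_ui(unsigned int x, unsigned int y) { return x + y; }

/* C11 generic selection: picks the function from the type of x.
   The controlling expression is not evaluated, so x is evaluated once. */
#define add(x, y) _Generic((x),          \
        float:        add_f,             \
        double:       add_lf,            \
        unsigned int: add_ui)((x), (y))
```

A call whose first argument has a type that is not listed, such as add(1, 2) with int arguments, fails to compile unless you add an int: association or a default: branch.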

Why aren't typedefs strongly typed?

What's the reason for typedefs not being strongly typed? Is there any benefit I can't see or is it due to backward compatibility? See this example:
typedef int Velocity;
void foo(Velocity v) {
//do anything;
}
int main() {
int i=4;
foo(i); //Should result in compile error if strongly typed.
return 0;
}
I am not asking for workarounds to get a strongly typed data type; I only want to know why the standard doesn't require typedefs to be strongly typed.
Thank you.
Because C is not strongly typed and typedef has its origin in that thinking
typedef is just for convenience and readability, it doesn't create a new type.
typedef is just a misnomer (like many other keywords). Think of it as typealias.
C, on the contrary, has a whole notion of what compatible types are. This allows, for example, linking compilation units together even if function prototypes are declared only with compatible types rather than identical ones. All this comes from simple practical necessity in everyday life, while still being able to give implementations some guarantees.
Even if Velocity were a distinct type from int, your code would compile and work just fine due to type conversion rules. What would not work is passing an expression of type Velocity * to a function expecting int *, etc. If you want to achieve the latter form of type enforcement, simply make Velocity a structure or union type containing a single integer, and you'll now have a new real type.
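A minimal sketch of the single-member struct approach. make_velocity is a hypothetical helper, and foo is changed to return a value so the behaviour can be checked. With these definitions, calling foo(i) with an int i is a compile error, because C has no implicit conversion from int to a struct type.

```c
/* A single-member struct is a distinct type, unlike a typedef of int. */
typedef struct { int value; } Velocity;

/* Hypothetical helper for constructing a Velocity explicitly. */
Velocity make_velocity(int v)
{
    Velocity result = { v };
    return result;
}

/* Accepts only a Velocity; foo(4) or foo(i) with an int i does not compile. */
int foo(Velocity v)
{
    return v.value * 2;
}
```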

C 'generics' -- double and float

I have a function in C that accepts and returns a double (and uses several doubles internally). Is there a good way to make a second version of the function, just like it except with float in place of double? Also constants like DBL_EPSILON should be updated.
I suppose I could do this with the preprocessor, but that seems awkward (and probably difficult to debug if there's a compile error). What do best practices recommend? I can't imagine I'm the only one who's had to deal with this.
Edit: I forgot, this is stackoverflow so I can't just ask a question, I have to justify myself. I have code which is very sensitive to precision in this case; the cost of using doubles rather than floats is 200% to 300%. Up until now I only needed a double version -- when I needed it I wanted as much precision as possible, regardless of the time needed (in that application it was a tiny percentage). But now I've found a use that is sensitive to speed and doesn't benefit from the extra precision. I cringed at my first thought, which was to copy the entire function and replace the types. Then I thought that a better approach would be known to the experts at SO so I posted here.
don't know about "best practices", but the preprocessor definitely was the first thing to jump to my mind. it's similar to templates in C++.
[edit: and the Jesus Ramos answer mentions the different letters on functions with different types in libraries, and indeed you would probably want to do this]
you create a separate source file with your functions, everywhere you have a double change it to FLOATING_POINT_TYPE (just as an example) and then include your source file twice from another file. (or whatever method you choose you just need to be able to ultimately process the file twice, once with each data type as your define.) [also to determine the character appended to distinguish different versions of the function, define FLOATING_POINT_TYPE_CHAR]
#define FLOATING_POINT_TYPE double
#define FLOATING_POINT_TYPE_CHAR d
#include "my_fp_file.c"
#undef FLOATING_POINT_TYPE_CHAR
#undef FLOATING_POINT_TYPE
#define FLOATING_POINT_TYPE float
#define FLOATING_POINT_TYPE_CHAR f
#include "my_fp_file.c"
#undef FLOATING_POINT_TYPE
#undef FLOATING_POINT_TYPE_CHAR
then you can also use a similar strategy for your prototypes in your headers.
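so in your header file you would need something like this (the second macro adds a level of indirection so that FLOATING_POINT_TYPE_CHAR is expanded to d or f before ## pastes it; pasting it directly would produce DoTheMathFLOATING_POINT_TYPE_CHAR):
#define MY_FP_FUNC_PASTE(funcname, typechar) \
funcname##typechar
#define MY_FP_FUNC(funcname, typechar) \
MY_FP_FUNC_PASTE(funcname, typechar)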
and for your function definitions/prototypes:
FLOATING_POINT_TYPE
MY_FP_FUNC(DoTheMath, FLOATING_POINT_TYPE_CHAR)
(
FLOATING_POINT_TYPE Value1,
FLOATING_POINT_TYPE Value2
);
and so forth.
i'll definitely leave it to someone else to talk about best practices :)
BTW for an example of this kind of strategy in a mature piece of software you can check out FFTW (fftw.org), although it's a bit more complicated than the example i think it uses basically the same strategy.
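A single-file variant of the same idea, sketched under the assumption that the per-type differences are only the type, a name suffix and the epsilon constant: a generator macro is instantiated once per type, so DBL_EPSILON and FLT_EPSILON are picked up automatically. nearly_equal_d, nearly_equal_f and the 4 * epsilon tolerance are illustrative choices, not taken from the question. Because SUFFIX is passed as a literal token (d or f) rather than through another macro, ## can paste it directly here.

```c
#include <float.h>

/* Hypothetical generator: one body, instantiated per floating type with
   the matching name suffix and epsilon constant. */
#define DEFINE_NEARLY_EQUAL(TYPE, SUFFIX, EPSILON)               \
    int nearly_equal_##SUFFIX(TYPE a, TYPE b)                     \
    {                                                             \
        TYPE diff  = a > b ? a - b : b - a;                       \
        TYPE abs_a = a < 0 ? -a : a;                              \
        TYPE abs_b = b < 0 ? -b : b;                              \
        TYPE scale = abs_a > abs_b ? abs_a : abs_b;               \
        /* relative tolerance of a few units in the last place */ \
        return diff <= 4 * EPSILON * scale;                       \
    }

DEFINE_NEARLY_EQUAL(double, d, DBL_EPSILON)
DEFINE_NEARLY_EQUAL(float,  f, FLT_EPSILON)

#undef DEFINE_NEARLY_EQUAL
```

The trade-off mentioned in the question still applies: a compile error inside the body is reported at the instantiation line, so the body should be kept small or moved into an included file as described above.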
Don't bother.
Except for a few specific hardware implementations, there is no advantage to having a float version of a double function. Most IEEE 754 hardware performs all calculations in 64- or 80-bit arithmetic internally, and truncates the results to the desired precision on storing.
It is completely fine to return a double to be used or stored as a float. Creating a float version of the same logic is not likely to run any faster or be more suitable for much of anything. The only exception coming to mind would be GPU-optimized algorithms which do not support 64+ bit operations.
As you can see from most standard libraries, such functions aren't really overloaded; new functions are created instead. For example:
void my_function(double d1, double d2);
void my_functionf(float f1, float f2);
Many of them have a different last letter in the function name to indicate that it is sort of like an overload for a different type. This also applies to return types, as with atoi, atol, atof, etc.
Alternatively, wrap your function in a macro that takes the type as an argument, such as
#define myfunction(arg1, arg2, type) ....
This way it's much easier: you can wrap everything with your type, avoid copy-pasting the function, and still check the type.
In this case I would say the best practice would be writing a custom codegen tool, which will take 'generic' code and create new version of double and float each time before compilation.

C library naming conventions

Introduction
Hello folks, I recently learned to program in C! (This was a huge step for me, since C++ was the first language I had contact with, and it scared me off for nearly 10 years.) Coming from a mostly OO background (Java + C#), this was a very nice paradigm shift.
I love C. It's such a beautiful language. What surprised me the most is the high degree of modularity and code reusability C supports. Of course it's not as high as in an OO language, but it's still far beyond my expectations for an imperative language.
Question
How do I prevent naming conflicts between the client code and my C library code? In Java there are packages, in C# there are namespaces. Imagine I write a C library, which offers the operation "add". It is very likely, that the client already uses an operation called like that - what do I do?
I'm especially looking for a client friendly solution. For example, I wouldn't like to prefix all my api operations like "myuniquelibname_add" at all. What are the common solutions to this in the C world? Do you put all api operations in a struct, so the client can choose its own prefix?
I'm really looking forward to the insights I get through your answers!
EDIT (modified question)
Dear answerers, thank you for your answers! I now see that prefixes are the only way to safely avoid naming conflicts. So I would like to modify my question: what possibilities do I have to let the client choose their own prefix?
The answer Unwind posted is one way. It doesn't use prefixes in the normal sense, but one has to prefix every API call with "api->". What further solutions are there (using a #define, for example)?
EDIT 2 (status update)
It all boils down to one of two approaches:
Using a struct
Using #define (note: There are many ways, how one can use #define to achieve, what I desire)
I will not accept any answer, because I think that there is no correct answer. The solution one chooses depends on the particular case and one's own preferences. I will try out all the approaches you mentioned myself, to find out which suits me best in which situation. Feel free to post arguments for or against certain approaches in the comments of the corresponding answers.
Finally, I would like to especially thank:
Unwind - for his sophisticated answer including a full implementation of the "struct-method"
Christoph - for his good answer and pointing me to Namespaces in C
All others - for Your great input
If someone finds it appropriate to close this question (as no further insights to expect), he/she should feel free to do so - I can not decide this, as I'm no C guru.
I'm no C guru, but from the libraries I have used, it is quite common to use a prefix to separate functions.
For example, SDL will use SDL, OpenGL will use gl, etc...
The struct way that Ken mentions would look something like this:
struct MyCoolApi
{
int (*add)(int x, int y);
};
struct MyCoolApi * my_cool_api_initialize(void);
Then clients would do:
#include <stdio.h>
#include <stdlib.h>
#include "mycoolapi.h"
int main(void)
{
struct MyCoolApi *api;
if((api = my_cool_api_initialize()) != NULL)
{
int sum = api->add(3, 39);
printf("The cool API considers 3 + 39 to be %d\n", sum);
}
return EXIT_SUCCESS;
}
This still has "namespace-issues"; the struct name (called the "struct tag") needs to be unique, and you can't declare nested structs that are useful by themselves. It works well for collecting functions though, and is a technique you see quite often in C.
UPDATE: Here's how the implementation side could look, this was requested in a comment:
#include "mycoolapi.h"
/* Note: This does **not** pollute the global namespace,
* since the function is static.
*/
static int add(int x, int y)
{
return x + y;
}
struct MyCoolApi * my_cool_api_initialize(void)
{
/* Since we don't need to do anything at initialize,
* just keep a static struct ready and return it.
* (It is not const, because the function returns a
* pointer to a non-const struct MyCoolApi.)
*/
static struct MyCoolApi the_api = {
add
};
return &the_api;
}
It's a shame you got scared off by C++, as it has namespaces to deal with precisely this problem. In C, you are pretty much limited to using prefixes - you certainly can't "put api operations in a struct".
Edit: In response to your second question regarding allowing users to specify their own prefix, I would avoid it like the plague. 99.9% of users will be happy with whatever prefix you provide (assuming it isn't too silly) and will be very UNHAPPY at the hoops (macros, structs, whatever) they will have to jump through to satisfy the remaining 0.1%.
As a library user, you can easily define your own shortened namespaces via the preprocessor; the result will look a bit strange, but it works:
#define ns(NAME) my_cool_namespace_ ## NAME
makes it possible to write
ns(foo)(42)
instead of
my_cool_namespace_foo(42)
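A tiny self-contained illustration, with a hypothetical library function standing in for a real prefixed API:

```c
/* Stand-in for a library function with a long prefix (hypothetical). */
int my_cool_namespace_foo(int x) { return x + 1; }

/* Client-side shorthand: ns(foo) expands to my_cool_namespace_foo. */
#define ns(NAME) my_cool_namespace_ ## NAME
```

With this in place, ns(foo)(41) and my_cool_namespace_foo(41) are the same call after preprocessing.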
As a library author, you can provide shortened names as described here.
If you follow unwind's advice and create an API structure, you should make the function pointers compile-time constants to make inlining possible, i.e. in your .h file, use the following code:
// canonical name
extern int my_cool_api_add(int x, int y);
// API structure
struct my_cool_api
{
int (*add)(int x, int y);
};
typedef const struct my_cool_api *MyCoolApi;
// define in header to make inlining possible
static MyCoolApi my_cool_api_initialize(void)
{
static const struct my_cool_api the_api = { my_cool_api_add };
return &the_api;
}
Unfortunately, there's no sure way to avoid name clashes in C. Since it lacks namespaces, you're left with prefixing the names of global functions and variables. Most libraries pick some short and "unique" prefix (unique is in quotes for obvious reasons), and hope that no clashes occur.
One thing to note is that most of the code of a library can be statically declared - meaning that it won't clash with similarly named functions in other files. But exported functions indeed have to be carefully prefixed.
Since you are exposing functions with common names, clients cannot include your library's header files together with other header files that declare the same names. In that case, add the following to your header file before the function prototype; it does not affect client usage either.
#define add myuniquelibname_add
Please note this is a quick fix solution and should be the last option.
For a really huge example of the struct method, take a look at the Linux kernel; 30-odd million lines of C in that style.
Prefixes are only choice on C level.
On some platforms (that support separate namespaces for linkers, like Windows, OS X and some commercial unices, but not Linux and FreeBSD) you can workaround conflicts by stuffing code in a library, and only export the symbols from the library you really need. (and e.g. aliasing in the importlib in case there are conflicts in exported symbols)

Automatically deleting unused local variables from C source code

I want to delete unused local variables from C file.
Example:
int fun(int a , int b)
{
int c,sum=0;
sum=a + b;
return sum;
}
Here the unused variable is 'c'.
I will have an external list of all unused local variables. Using that list, I need to find those local variables in the source code and delete them.
In the above example, "c" is the unused variable. I will know that (I have code for it).
Here I have to find c and delete it.
EDIT
The point is not to find unused local variables with an external tool. The point is to remove them from code given a list of them.
Turn up your compiler warning level, and it should tell you.
Putting your source fragment in "f.c":
% gcc -c -Wall f.c
f.c: In function 'fun':
f.c:1: warning: unused variable 'c'
Tricky - you will have to parse C code for this. How close does the result have to be?
Example of what I mean:
int a, /* foo */
b, /* << the unused one */
c; /* bar */
Now, it's obvious to humans that the second comment has to go.
Slight variation:
void test(/* in */ int a, /* unused */ int b, /* out */ int* c);
Again, the second comment has to go, the one before b this time.
In general, you want to parse your input, filter it, and emit everything that's not the declaration of an unused variable. Your parser would have to preserve comments and #include statements, but if you don't #include headers it may be impossible to recognize declarations (even more so if macros are used to hide the declaration). After all, you need headers to decide if A * B(); is a function declaration (when A is a type) or a multiplication (when A is a variable).
[edit] Furthermore:
Even if you know that a variable is unused, the proper way to remove it depends a lot on remote context. For instance, assume
int foo(int a, int b, int c) { return a + b; }
Clearly, c is unused. Can you change it to this?
int foo(int a, int b) { return a + b; }
Perhaps, but not if &foo is stored in an int(*)(int,int,int). And that may happen somewhere else. If (and only if) that happens, you should change it to
int foo(int a, int b, int /*unused*/ ) { return a + b; }
Why do you want to do this? Assuming you have a decent optimizing compiler (GCC, Visual Studio et al.), the binary output will not be any different whether you remove the 'int c' in your original example or not.
If this is just about code cleanup, any recent IDE will give you quick links to the source code for each warning, just click and delete :)
My answer is more of an elaborate comment to MSalters' very thorough answer.
I would go beyond 'tricky' and say that such a tool is both impossible and inadvisable.
If you are looking to simply remove the references to the variable, then you could write a code parser of your own, but it would need to distinguish which function context it is in, such as
int foo(double a, double b)
{
b = 10.0;
return (int) b;
}
int bar(double a, double b)
{
a = 5.00;
return (int) a;
}
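Here 'a' is unused in foo while 'b' is unused in bar, so any simple parser that does not track the enclosing function would have trouble with both 'a' and 'b' being unused variables.
Secondly, if you consider comments as MSalters has, you'll discover that people do not comment consistently: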
double a;
/*a is designed as a dummy variable*/
double b;
/*a is designed as a dummy variable*/
double a;
double b;
double a; /*a is designed as a dummy variable*/
double b;
etc.
So simply removing the unused variables will create orphaned comments, which are arguably more dangerous than not commenting at all.
Ultimately, it is an obscenely difficult task to do elegantly, and you would be mangling code regardless. By automating the process, you would be making the code worse.
Lastly, you should be considering why the variables were in the code in the first place, and if they are deprecated, why they were not deleted when all their references were.
Use static code analysis tools in addition to raising the warning level, as Paul correctly stated.
As well as being able to reveal these through warnings, the compiler will normally optimise these away if any optimisations are turned on. Checking if a variable is never referenced is quite trivial in terms of implementation in the compiler.
You will need a good parser that preserves original character position of tokens (even in presence of preprocessor!). There are some tools for automated refactoring of C/C++, but they are far from mainstream.
I recommend you to check out Taras' Blog. The guy is doing some large automated refactorings of Mozilla codebase, like replacing out-params with return values. His main tool for code rewriting is Pork:
Pork is a C++ parsing and rewriting tool chain. The core of Pork is a C++ parser that provides exact character positions for the start and end of every AST node, as well as the set of macro expansions that contain any location. This information allows C++ to be automatically rewritten in a precise way.
From the blog:
So far pork has been used for “minor” things like renaming classes&functions, rotating outparameters and correcting prbool bugs. Additionally, Pork proved itself in an experiment which involved rewriting almost every function (ie generating a 3+MB patch) in Mozilla to use garbage collection instead of reference-counting.
It is for C++, but it may suit your needs.
One of the posters above says "impossible and inadvisable". Another says "tricky", which is the right answer.
You need 1) a full C (or whatever language of interest) parser, 2) inference procedures that understand the language's identifier references and data flows to determine that a variable is indeed "dead", and 3) the ability to actually modify the source code.
What's hard about all this is the huge energy needed to build 1), 2) and 3). You can't justify that for any individual cleanup task. What one can do is build such infrastructure specifically with the goal of amortizing it across lots of different program analysis and transformation tasks.
My company offers such a tool: The DMS Software Reengineering Toolkit. See http://www.semdesigns.com/Products/DMS/DMSToolkit.html
DMS has production quality front ends for many languages, including C, C++, Java and COBOL.
We have in fact built an automated "find useless declarations" tool for Java that does two things:
a) lists them all (thus producing the list!)
b) makes a copy of the code with the useless declarations removed.
You choose which answer you want to keep :-)
To do the same for C would not be difficult. We already have a tool that identifies such dead variables/functions.
One case we did not address is the "useless parameter" case, because to remove a useless parameter, you have to find all the calls from other modules, verify that setting up the argument doesn't have a side effect, and rip out the useless argument. We in fact have full graphs of the entire software system of interest, so this would also be possible.
So, it's just tricky, and not even very tricky if you have the right infrastructure.
You can solve the problem as a text processing problem. There must be a small number of regexp patterns how unused local variables are defined in the source code.
Using a list of unused variable names and the line numbers where they are, You can process the C source code line-by-line. On each line You can iterate over the variable names. On each variable name You can match the patterns one-by-one. After a successful match You know the syntax of the definition, so You know how to delete the unused variable from it.
For example, if the source line is "int a, unused, b;" and the compiler reported "unused" as an unused variable on that line, then the pattern "/, unused,/" will match and You can replace that substring with a single ",".
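A minimal C sketch of that one pattern, under the assumption that the declaration looks exactly like "type a, NAME, b;" with ", " separators. remove_middle_declarator is a hypothetical helper; declarations where the unused name is first, last, alone, initialized, or written with different spacing would each need their own pattern, as the answer says.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite ", NAME," to "," in place.
   Returns 1 if the line was changed, 0 otherwise. The surrounding commas
   act as word boundaries, so a name like "unused2" is not touched. */
int remove_middle_declarator(char *line, const char *name)
{
    char pattern[128];
    int len = snprintf(pattern, sizeof pattern, ", %s,", name);
    if (len < 0 || (size_t)len >= sizeof pattern)
        return 0;                       /* name too long for the buffer */

    char *hit = strstr(line, pattern);
    if (hit == NULL)
        return 0;

    /* Keep the leading ',' and shift the rest of the line (including the
       terminating '\0') left over the removed " NAME," part. */
    memmove(hit + 1, hit + len, strlen(hit + len) + 1);
    return 1;
}
```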
Also: splint.
Splint is a tool for statically checking C programs for security vulnerabilities and coding mistakes. With minimal effort, Splint can be used as a better lint. If additional effort is invested adding annotations to programs, Splint can perform stronger checking than can be done by any standard lint.
