Is using C preprocessor macros for function naming idiomatic? - c

I'm writing a Scheme interpreter. For each built-in type (integer, character, string, etc) I want to have the read and print functions named consistently:
READ_ERROR Scheme_read_integer(FILE *in, Value *val);
READ_ERROR Scheme_read_character(FILE *in, Value *val);
To keep the naming of these functions consistent, I've defined these macros:
#define SCHEME_READ(type_) Scheme_read_##type_
#define DEF_READER(type_, in_strm_, val_) READ_ERROR SCHEME_READ(type_)(FILE *in_strm_, Value *val_)
So that now, instead of the above, in code I can write
DEF_READER(integer, in, val)
{
// Code here ...
}
DEF_READER(character, in, val)
{
// Code here ...
}
and
if (SOME_ERROR != SCHEME_READ(integer)(stdin, my_value)) do_stuff(); // etc.
Now is this considered an unidiomatic use of the preprocessor? Am I shooting myself in the foot somewhere unknowingly? Should I instead just go ahead and use the explicit names of the functions?
If not, are there examples in the wild of this sort of thing done well?

I've seen this done extensively in a project, and there's a severe danger of foot-shooting going on.
The problem happens when you try to maintain the code. Even though your macro-ized function definitions are all neat and tidy, under the covers you get function names like Scheme_read_integer. Where this can become an issue is when something like Scheme_read_integer appears on a crash stack. If someone does a search of the source tree for Scheme_read_integer, they won't find it. This can cause great pain and gnashing of teeth ;)
If you're the only developer, and the code base isn't that big, and you remember using this technique years down the road and/or it's well documented, you may not have an issue. In my case it was a very large code base, poorly documented, with none of the original developers around. The result was much tooth-gnashing.
I'd go out on a limb and suggest using a C++ template, but I'm guessing that's not an option since you specifically mentioned C.
Hope this helps.

I'm usually a big fan of macros, but you should probably consider inlined wrapper functions instead. They will add negligible runtime overhead and will appear in stack backtraces, etc., when you're debugging.
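One way to read that suggestion, as a sketch only: keep the explicit, greppable function name, and use a thin static inline wrapper where you want a shorter uniform spelling at call sites instead of a SCHEME_READ() macro. READ_ERROR and Value are the question's types, the header name and the short wrapper name are made up here.

#include <stdio.h>
#include "scheme.h"   /* hypothetical header declaring READ_ERROR and Value */

/* The reader keeps its explicit, greppable name... */
READ_ERROR Scheme_read_integer(FILE *in, Value *val);

/* ...and a thin wrapper provides the short spelling at call sites.
 * Unlike a macro, it is visible to the debugger and in backtraces,
 * and costs nothing once inlined. */
static inline READ_ERROR read_integer(FILE *in, Value *val)
{
    return Scheme_read_integer(in, val);
}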

Related

Naming convention for allocated/non-allocated strings

Are there any common naming conventions to differentiate between strings that are allocated and not allocated? What I am looking for is hopefully something similar to us/s from
Making Wrong Code Look Wrong, but I'd rather use something common than make up my own.
The us / s convention in the article is somewhat useful, but it misses a greater point. The point is that variable naming conventions are only as good at protecting from missteps as documentation.
For example:
/* Version 1 */
char* sHello = "Hello";
printf("%s\n", sHello);
which gets modified eventually to
/* Version 2 */
char* sHello = "Hello";
... 100 lines of code ...
sHello = realloc(sHello, strlen(sHello)+7);
strcpy(sHello+strlen(sHello), " World");
... 100 lines of code ...
printf("%s\n", sHello);
When you really want to make a coding convention work, you can't rely on some sort of arbitrary (and yes, they are all arbitrary) naming conventions. You have to back up that naming convention with a means of failing the build. In the above case, judicious use of the keyword const would have done the trick, but the coding convention will lull people into complacency that it's right because it's named a certain way.
It's the documentation problem all over again: eventually in-code documentation gets out of sync with the code, and your in-code naming convention will eventually get out of sync with the usage of those names.
Now, if you want to add typing information to C (which is really what you're getting at), do it via a system that actually adds types to C: typedef. Then put in place a system that can verify that the type is not being used where it is required, and fail the build. Anything else will cost you too much after-the-fact searching / maintenance / cleanup, and half of an implemented coding style is just so much worse than even a bad coding style.
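For instance, a minimal sketch of the "fail the build" idea applied to the realloc example above, using const. The offending lines from Version 2 now draw compiler diagnostics (hard errors under -Werror, since they discard the const qualifier) instead of compiling into undefined behaviour:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    const char *sHello = "Hello";   /* a string literal: not ours to grow or free */

    /* Both lines from "Version 2" are now rejected by the compiler
     * rather than silently corrupting memory:
     *
     *     sHello = realloc(sHello, strlen(sHello) + 7);
     *     strcpy(sHello + strlen(sHello), " World");
     */

    printf("%s\n", sHello);
    return 0;
}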

Managing without Objects in C - And, why can I declare variables anywhere in a function in C?

Hello, everyone. I actually have two questions, somewhat related.
Question #1: Why is gcc letting me declare variables after action statements? I thought the C89 standard did not allow this. (GCC Version: 4.4.3) It even happens when I explicitly use --std=c89 on the compile line. I know that most compilers implement things that are non-standard, e.g. C compilers allowing // comments when the standard does not specify that. I'd like to learn just the standard, so that if I ever need to use just the standard, I don't get snagged on things like this.
Question #2: How do you cope without objects in C? I program as a hobby, and I have not yet used a language that does not have objects (a.k.a. OO concepts?) -- I already know some C++, and I'd like to learn how to use C on its own. Supposedly, one way is to make a POD struct and make functions similar to StructName_constructor(), StructName_doSomething(), etc. and pass the struct instance to each function - is this the 'proper' way, or am I totally off?
EDIT: Due to some minor confusion, I am defining what my second question is more clearly: I am not asking How do I use Objects in C? I am asking How do you manage without objects in C?, a.k.a. how do you accomplish things without objects, where you'd normally use objects?
In advance, thanks a lot. I've never used a language without OOP! :)
EDIT: As per request, here is an example of the variable declaration issue:
#include <stdio.h> /* includes, or whatever */

int main(int argc, char *argv[]) {
    int myInt = 5;
    printf("myInt is %d\n", myInt);
    int test = 4; /* This does not result in a compile error */
    printf("Test is %d\n", test);
    return 0;
}
c89 doesn't allow this, but c99 does. (gcc accepts mixed declarations even with -std=c89, as a GNU extension; add -pedantic, or -Wdeclaration-after-statement, and it will warn about them.) Although it's taken a long time to catch on, some compilers (including gcc) are finally starting to implement c99 features.
IMO, if you want to use OOP, you should probably stick to C++ or try out Objective C. Trying to reinvent OOP built on top of C again just doesn't make much sense.
If you insist on doing it anyway, yes, you can pass a pointer to a struct as an imitation of this -- but it's still not a good idea.
It does often make sense to pass (pointers to) structs around when you need to operate on a data structure. I would not, however, advise working very hard at grouping functions together and having them all take a pointer to a struct as their first parameter, just because that's how other languages happen to implement things.
If you happen to have a number of functions that all operate on/with a particular struct, and it really makes sense for them to all receive a pointer to that struct as their first parameter, that's great -- but don't feel obliged to force it just because C++ happens to do things that way.
Edit: As far as how you manage without objects: well, at least when I'm writing C, I tend to operate on individual characters more often. For what it's worth, in C++ I typically end up with a few relatively long lines of code; in C, I tend toward a lot of short lines instead.
There is more separation between the code and data, but to some extent they're still coupled anyway -- a binary tree (for example) still needs code to insert nodes, delete nodes, walk the tree, etc. Likewise, the code for those operations needs to know about the layout of the structure, and the names given to the pointers and such.
Personally, I tend more toward using a common naming convention in my C code, so (for a few examples) the pointers to subtrees in a binary tree are always just named left and right. If I use a linked list (rare) the pointer to the next node is always named next (and if it's doubly-linked, the other is prev). This helps a lot with being able to write code without having to spend a lot of time looking up a structure definition to figure out what name I used for something this time.
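As a sketch of that convention (hypothetical node types, nothing project-specific):

/* Hypothetical node types illustrating the naming convention described above. */
struct tree_node {
    struct tree_node *left;    /* always "left"  */
    struct tree_node *right;   /* always "right" */
    int key;
};

struct list_node {
    struct list_node *next;    /* always "next" */
    struct list_node *prev;    /* always "prev" (doubly-linked) */
    int value;
};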
Question #1: I don't know why there is no error, but you are right, variables have to be declared at the beginning of a block. Good thing is you can declare blocks anywhere you like :). E.g.:
{
    int some_local_var;
}
Question #2: Actually, programming C without inheritance is sometimes quite annoying, but there are possibilities to have OOP to some degree. For example, look at the GTK source code and you will find some examples.
You are right, functions like the ones you have shown are common, but the constructor is commonly divided into an allocation function and an initialization function. E.g.:
someStruct* someStruct_alloc() { return (someStruct*)malloc(sizeof(someStruct)); }
void someStruct_init(someStruct* this, int arg1, int arg2) {...}
In some libraries, I have even seen some sort of polymorphism, where function pointers are stored within the struct (which have to be set in the initializing function, of course). This results in a C++ like API:
someStruct* str = someStruct_alloc();
someStruct_init(str, 1, 2);
str->someFunc(10, 20, 30);
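A minimal sketch of that function-pointer pattern, with names taken from the example above; the wiring is just one possible way to do it, and note that, unlike C++, the object pointer has to be passed to the "method" explicitly:

#include <stdlib.h>

typedef struct someStruct someStruct;

struct someStruct {
    int base;
    /* "method" slot, filled in by the init function */
    int (*someFunc)(someStruct *self, int a, int b, int c);
};

static int someFunc_impl(someStruct *self, int a, int b, int c)
{
    return self->base + a + b + c;
}

someStruct *someStruct_alloc(void)
{
    return malloc(sizeof(someStruct));
}

void someStruct_init(someStruct *self, int base)
{
    self->base = base;
    self->someFunc = someFunc_impl;   /* poor man's virtual method */
}

/* Usage:
 *     someStruct *str = someStruct_alloc();
 *     someStruct_init(str, 1);
 *     int r = str->someFunc(str, 10, 20, 30);
 */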
Regarding OOP in C, have you looked at some of the topics on SO? For instance, Can you write object oriented code in C?.
I can't put my finger on an example, but I think they enforce an OO-like discipline in Linux kernel programming as well.
In terms of learning how C works, as opposed to OO in C++, you might find it easier to take a short course in some other language that doesn't have an OO derivative -- say, Modula-2 (one of my favorites) or even BASIC (if you can still find a real BASIC implementation -- last time I wrote BASIC code it was with the QBASIC that came with DOS 5.0, later compiled in full Quick BASIC).
The methods you use to get things done in Modula-2 or Pascal (barring the strong typing, which protects against certain types of errors but makes it more complicated to do certain things) are exactly those used in non-OO C, and working in a language with different syntax might (probably will, IMO) make it easier to learn the concepts without your "programming reflexes" kicking in and trying to do OO operations in a nearly-familiar language.

C99 mixed declarations and code in open source projects?

Why are C99 mixed declarations and code still not used in open source C projects like the Linux kernel or GNOME?
I really like mixed declarations and code, since it makes the code more readable and prevents hard-to-see bugs by restricting the scope of the variables to the narrowest possible. This is recommended by Google for C++.
For example, Linux requires at least GCC 3.2, and GCC 3.1 already has support for C99 mixed declarations and code.
You don't need mixed declaration and code to limit scope. You can do:
{
    int c;
    c = 1;
    {
        int d = c + 1;
    }
}
in C89. As for why these projects haven't used mixed declarations (assuming this is true), it's most likely a case of "If it ain't broke don't fix it."
This is an old question but I'm going to suggest that inertia is the reason that most of these projects still use ANSI C declarations rules.
However there are a number of other possibilities, ranging from valid to ridiculous:
Portability. Many open source projects work under the assumption that pedantic ANSI C is the most portable way to write software.
Age. Many of these projects predate the C99 spec and the authors may prefer a consistent coding style.
Ignorance. The programmers submitting code predate C99 and are unaware of the benefits of mixed declarations and code. (Alternate interpretation: developers are fully aware of the potential tradeoffs and decide that mixed declarations and statements are not worth the effort. I highly disagree, but it's rare that two programmers will agree on anything.)
FUD. Programmers view mixed declarations and code as a "C++ism" and dislike it for that reason.
There is little reason to rewrite the Linux kernel to make cosmetic changes that offer no performance gains.
If the code base is working, so why change it for cosmetic reasons?
There is no benefit. Declaring all variables at the beginning of the function (Pascal-like) is much clearer; in C89 you can also declare variables at the beginning of each scope (inside loop bodies, for example), which is both practical and concise.
I don't remember any interdictions against this in the style guide for kernel code. However, it does say that functions should be as small as possible, and only do one thing. This would explain why a mixture of declarations and code is rare.
In a small function, declaring variables at the start of scope acts as a sort of Introit, telling you something about what's coming soon after. In this case the movement of the variable declaration is so limited that it would likely either have no effect, or serve to hide some information about the functionality by pushing the barker into the crowd, so to speak. There is a reason that the arrival of a king was declared before he entered a room.
OTOH, a function which must mix variables and code to be readable is probably too big. This is one of the signs (along with too-nested blocks, inline comments and other things) that some sections of a function need to be abstracted into separate functions (and declared static, so the optimizer can inline them).
Another reason to keep declarations at the beginning of the functions: should you need to reorder the execution of statements in the code, you may move a variable out of its scope without realizing it, since the scope of a variable declared in the middle of code is not evident in the indentation (unless you use a block to show the scope). This is easily fixed, so it's just an annoyance, but new code often undergoes this kind of transformation, and annoyance can be cumulative.
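A small sketch of that hazard, with hypothetical helper functions:

/* Hypothetical helpers, only to make the fragment self-contained. */
void setup(void);
int  count_items(void);
void process(int n);

void example(void)
{
    setup();
    int count = count_items();   /* declared mid-function (C99) */
    process(count);
}

/* If a later refactor hoists process() above setup(), the use ends up
 * above the declaration and the build breaks until the declaration is
 * moved as well:
 *
 *     process(count);              // error: 'count' undeclared
 *     setup();
 *     int count = count_items();
 */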
And another reason: you might be tempted to declare a variable to take the error return code from a function, like so:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret); }
Perfectly reasonable thing to do. But:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret); }
....
int ret = another_func_may_fail();
if (ret) { handle_other_fail(ret); }
Oops! ret is defined twice. "So? Remove the second declaration," you say. But this makes the code asymmetric, and you end up with more refactoring limitations.
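The declare-at-the-top style this answer argues for avoids the clash; a sketch reusing the hypothetical functions from the example above:

int ret;   /* declared once, at the top of the function */

void_func();
ret = func_may_fail();
if (ret) { handle_fail(ret); }
/* .... */
ret = another_func_may_fail();
if (ret) { handle_other_fail(ret); }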
Of course, I mix declarations and code myself; no reason to be dogmatic about it (or else your karma may run over your dogma :-). But you should know what the concomitant problems are.
Maybe it's not needed, maybe the separation is good? I do it in C++, which has this feature as well.
There is no reason to change the code like this, and C99 is still not widely supported by compilers. It is mostly about portability.

C library naming conventions

Introduction
Hello folks, I recently learned to program in C! (This was a huge step for me, since C++ was the first language I had contact with, and it scared me off for nearly 10 years.) Coming from a mostly OO background (Java + C#), this was a very nice paradigm shift.
I love C. It's such a beautiful language. What surprised me the most is the high degree of modularity and code reusability C supports - of course it's not as high as in an OO language, but still far beyond my expectations for an imperative language.
Question
How do I prevent naming conflicts between the client code and my C library code? In Java there are packages, in C# there are namespaces. Imagine I write a C library which offers the operation "add". It is very likely that the client already uses an operation with that name - what do I do?
I'm especially looking for a client-friendly solution. For example, I wouldn't like to prefix all my API operations like "myuniquelibname_add". What are the common solutions to this in the C world? Do you put all API operations in a struct, so the client can choose its own prefix?
I'm very looking forward to the insights I get through your answers!
EDIT (modified question)
Dear Answerers, thank You for Your answers! I now see that prefixes are the only way to safely avoid naming conflicts. So, I would like to modify my question: What possibilities do I have to let the client choose his own prefix?
The answer Unwind posted is one way. It doesn't use prefixes in the normal sense, but one has to prefix every API call with "api->". What further solutions are there (like using a #define, for example)?
EDIT 2 (status update)
It all boils down to one of two approaches:
Using a struct
Using #define (note: there are many ways one can use #define to achieve what I desire)
I will not accept any answer, because I think that there is no correct answer. The solution one chooses rather depends on the particular case and one's own preferences. I, by myself, will try out all the approaches You mentioned to find out which suits me best in which situation. Feel free to post arguments for or against certain approaches in the comments of the corresponding answers.
Finally, I would like to especially thank:
Unwind - for his sophisticated answer including a full implementation of the "struct-method"
Christoph - for his good answer and pointing me to Namespaces in C
All others - for Your great input
If someone finds it appropriate to close this question (as no further insights to expect), he/she should feel free to do so - I can not decide this, as I'm no C guru.
I'm no C guru, but from the libraries I have used, it is quite common to use a prefix to separate functions.
For example, SDL uses an SDL prefix, OpenGL uses gl, etc...
The struct way that Ken mentions would look something like this:
struct MyCoolApi
{
    int (*add)(int x, int y);
};

struct MyCoolApi *my_cool_api_initialize(void);
Then clients would do:
#include <stdio.h>
#include <stdlib.h>
#include "mycoolapi.h"
int main(void)
{
    struct MyCoolApi *api;

    if((api = my_cool_api_initialize()) != NULL)
    {
        int sum = api->add(3, 39);
        printf("The cool API considers 3 + 39 to be %d\n", sum);
    }
    return EXIT_SUCCESS;
}
This still has "namespace-issues"; the struct name (called the "struct tag") needs to be unique, and you can't declare nested structs that are useful by themselves. It works well for collecting functions though, and is a technique you see quite often in C.
UPDATE: Here's how the implementation side could look; this was requested in a comment:
#include "mycoolapi.h"
/* Note: This does **not** pollute the global namespace,
* since the function is static.
*/
static int add(int x, int y)
{
    return x + y;
}
struct MyCoolApi *my_cool_api_initialize(void)
{
    /* Since we don't need to do anything at initialize,
     * just keep a struct with static storage duration ready
     * and return its address.
     */
    static struct MyCoolApi the_api = {
        add
    };

    return &the_api;
}
It's a shame you got scared off by C++, as it has namespaces to deal with precisely this problem. In C, you are pretty much limited to using prefixes - you certainly can't "put api operations in a struct".
Edit: In response to your second question regarding allowing users to specify their own prefix, I would avoid it like the plague. 99.9% of users will be happy with whatever prefix you provide (assuming it isn't too silly) and will be very UNHAPPY at the hoops (macros, structs, whatever) they will have to jump through to satisfy the remaining 0.1%.
As a library user, you can easily define your own shortened namespaces via the preprocessor; the result will look a bit strange, but it works:
#define ns(NAME) my_cool_namespace_ ## NAME
makes it possible to write
ns(foo)(42)
instead of
my_cool_namespace_foo(42)
As a library author, you can provide shortened names as described here.
If you follow unwind's advice and create an API structure, you should make the function pointers compile-time constants to make inlining possible, i.e. in your .h file, use the following code:
// canonical name
extern int my_cool_api_add(int x, int y);
// API structure
struct my_cool_api
{
    int (*add)(int x, int y);
};

typedef const struct my_cool_api *MyCoolApi;

// define in header to make inlining possible
static MyCoolApi my_cool_api_initialize(void)
{
    static const struct my_cool_api the_api = { my_cool_api_add };
    return &the_api;
}
Unfortunately, there's no sure way to avoid name clashes in C. Since it lacks namespaces, you're left with prefixing the names of global functions and variables. Most libraries pick some short and "unique" prefix (unique is in quotes for obvious reasons), and hope that no clashes occur.
One thing to note is that most of the code of a library can be statically declared - meaning that it won't clash with similarly named functions in other files. But exported functions indeed have to be carefully prefixed.
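A sketch of that split, using a made-up library name: internal helpers stay static and unprefixed, while anything that crosses the library boundary carries the prefix.

/* mylib.c -- sketch of a hypothetical library source file */

/* Internal helper: static, so its name never reaches the linker and
 * cannot clash with anything in client code. */
static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Exported entry point: prefixed, because this name is globally visible. */
int mylib_add_clamped(int x, int y)
{
    return clamp(x + y, 0, 100);
}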
Since you are exposing functions with the same name, the client cannot include your library header files along with other header files that have a name collision. In this case you can add the following in the header file before the function prototype, and it won't affect client usage either.
#define add myuniquelibname_add
Please note this is a quick fix solution and should be the last option.
For a really huge example of the struct method, take a look at the Linux kernel; 30-odd million lines of C in that style.
Prefixes are the only choice at the C level.
On some platforms (that support separate namespaces for linkers, like Windows, OS X and some commercial unices, but not Linux and FreeBSD) you can work around conflicts by stuffing code in a library and only exporting the symbols from the library you really need (and, e.g., aliasing in the import library in case there are conflicts in exported symbols).

When to use function-like macros in C

I was reading some code written in C this evening, and at the top of
the file was the function-like macro HASH:
#define HASH(fp) (((unsigned long)fp)%NHASH)
This left me wondering, why would somebody choose to implement a
function this way using a function-like macro instead of implementing
it as a regular vanilla C function? What are the advantages and
disadvantages of each implementation?
Thanks a bunch!
Macros like that avoid the overhead of a function call.
It might not seem like much. But in your example, the macro turns into 1-2 machine language instructions, depending on your CPU:
Get the value of fp out of memory and put it in a register
Take the value in the register, do a modulus (%) calculation by a fixed value, and leave that in the same register
whereas the function equivalent would involve a lot more machine language instructions, generally something like:
Stick the value of fp on the stack
Call the function, which also puts the next (return) address on the stack
Maybe build a stack frame inside the function, depending on the CPU architecture and ABI convention
Get the value of fp off the stack and put it in a register
Take the value in the register, do a modulus (%) calculation by a fixed value, and leave that in the same register
Maybe take the value from the register and put it back on the stack, depending on CPU and ABI
If a stack frame was built, unwind it
Pop the return address off the stack and resume executing instructions there
A lot more code, eh? If you're doing something like rendering every one of the tens of thousands of pixels in a window in a GUI, things run an awful lot faster if you use the macro.
Personally, I prefer using C++ inline as being more readable and less error-prone, but inlines are also really more of a hint to the compiler which it doesn't have to take. Preprocessor macros are a sledge hammer the compiler can't argue with.
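C99 offers the same escape hatch. A sketch of an inline equivalent of HASH follows; the real value of NHASH isn't shown in the question, so a placeholder is used here:

#define NHASH 101   /* placeholder: the original value isn't shown */

static inline unsigned long hash(const void *fp)
{
    /* Same computation as HASH(fp), but type-checked, debuggable,
     * and still a candidate for inlining by the compiler. */
    return (unsigned long)fp % NHASH;
}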
One important advantage of macro-based implementation is that it is not tied to any concrete parameter type. A function-like macro in C acts, in many respects, as a template function in C++ (templates in C++ were born as "more civilized" macros, BTW). In this particular case the argument of the macro has no concrete type. It might be absolutely anything that is convertible to type unsigned long. For example, if the user so pleases (and if they are willing to accept the implementation-defined consequences), they can pass pointer types to this macro.
Anyway, I have to admit that this macro is not the best example of type-independent flexibility of macros, but in general that flexibility comes handy quite often. Again, when certain functionality is implemented by a function, it is restricted to specific parameter types. In many cases in order to apply similar operation to different types it is necessary to provide several functions with different types of parameters (and different names, since this is C), while the same can be done by just one function-like macro. For example, macro
#define ABS(x) ((x) >= 0 ? (x) : -(x))
works with all arithmetic types, while a function-based implementation has to provide quite a few of them (I mean the standard abs, labs, llabs and fabs). (And yes, I'm aware of the traditionally mentioned dangers of such a macro.)
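One of those traditionally mentioned dangers, for completeness, is double evaluation of the macro argument:

#include <stdio.h>

#define ABS(x) ((x) >= 0 ? (x) : -(x))

int main(void)
{
    int i = -3;
    int a = ABS(i++);   /* expands to ((i++) >= 0 ? (i++) : -(i++)) */

    /* i was incremented twice: here a == 2 and i == -1,
     * not the a == 3, i == -2 a reader might expect. */
    printf("a = %d, i = %d\n", a, i);
    return 0;
}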
Macros are not perfect, but the popular maxim about "function-like macros being no longer necessary because of inline functions" is just plain nonsense. In order to fully replace function-like macros, C would need function templates (as in C++) or at least function overloading (as in C++ again). Without that, function-like macros are and will remain an extremely useful mainstream tool in C.
On one hand, macros are bad because they're handled by the preprocessor, which doesn't understand anything about the language and just does text replacement. They usually have plenty of limitations. I can't see any in the example above, but usually macros are ugly solutions.
On the other hand, they are at times even faster than a static inline method. I was heavily optimizing a short program and found that calling a static inline method takes about twice as much time (just overhead, not actual function body) as compared with a macro.
The most common (and most often wrong) reason people give for using macros (in "plain old C") is the efficiency argument. Using them for efficiency is fine if you have actually profiled your code and are optimizing a true bottleneck (or are writing a library function that might be a bottleneck for somebody someday). But most people who insist on using them have not actually analyzed anything and are just creating confusion where it adds no benefit.
Macros can also be used for some handy search-and-replace type substitutions which the regular C language is not capable of.
Some problems I have had in maintaining code written by macro abusers is that the macros can look quite like functions but do not show up in the symbol table, so it can be very annoying trying to trace them back to their origins in sprawling codesets (where is this thing defined?!). Writing macros in ALL CAPS is obviously helpful to future readers.
If they are more than fairly simple substitutions, they can also create some confusion if you have to step-trace through them with a debugger.
Your example is not really a function at all,
#define HASH(fp) (((unsigned long)fp)%NHASH)
// this is a cast ^^^^^^^^^^^^^^^
// this is your value 'fp' ^^
// this is a MOD operation ^^^^^^
I'd think this was just a way of writing more readable code, with the cast and mod operation wrapped into a single macro, HASH(fp).
Now, if you decide to write a function for this, it would probably look like:
int hashThis(int fp)
{
    return ((fp) % NHASH);
}
Quite an overkill for a function, as it:
introduces a call point
introduces call-stack setup and restore
The C Preprocessor can be used to create inline functions. In your example, the code will appear to call the function HASH, but instead is just inline code.
The benefits of doing macro functions were eliminated when C++ introduced inline functions. Many older APIs like MFC and ATL still use macro functions to do preprocessor tricks, but it just leaves the code convoluted and harder to read.
