What methods are there to modularize C code?

What methods, practices and conventions do you know of to modularize C code as a project grows in size?

Create header files which contain ONLY what is necessary to use a module. In the corresponding .c file(s), make anything not meant to be visible outside (e.g. helper functions) static. Use prefixes on the names of everything externally visible to help avoid namespace collisions. (If a module spans multiple files, things become harder, as you may need to expose internal things and not be able to hide them with "static".)
(If I were to try to improve C, one thing I would do is make "static" the default scoping of functions. If you wanted something visible outside, you'd have to mark it with "export" or "global" or something similar.)
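A minimal sketch of the pattern from the first paragraph (the mymod names are invented for illustration):

/* mymod.h -- contains ONLY what is needed to use the module */
#ifndef MYMOD_H
#define MYMOD_H

int mymod_frobnicate(int x);   /* prefixed, externally visible */

#endif /* MYMOD_H */

/* mymod.c */
#include "mymod.h"

static int clamp_nonnegative(int x)   /* helper: invisible outside this file */
{
    return x < 0 ? 0 : x;
}

int mymod_frobnicate(int x)
{
    return clamp_nonnegative(x) + 1;
}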

OO techniques can be applied to C code, they just require more discipline.
Use opaque handles to operate on objects. One good example of how this is done is the stdio library -- everything is organised around the opaque FILE* handle. Many successful libraries are organised around this principle (e.g. zlib, apr)
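For instance, a minimal sketch of an opaque handle (the xyz_stack names are invented, not from any real library):

/* xyz_stack.h -- callers see only an incomplete type and functions */
typedef struct xyz_stack xyz_stack;   /* opaque: layout not exposed */

xyz_stack *xyz_stack_create(void);
void xyz_stack_push(xyz_stack *s, int value);
int xyz_stack_pop(xyz_stack *s);
void xyz_stack_destroy(xyz_stack *s);

/* xyz_stack.c -- the only place that knows the layout */
#include <stdlib.h>

struct xyz_stack {
    int data[64];
    int top;
};

xyz_stack *xyz_stack_create(void)
{
    return calloc(1, sizeof(xyz_stack));
}

void xyz_stack_push(xyz_stack *s, int value)
{
    s->data[s->top++] = value;   /* bounds checking omitted in this sketch */
}

int xyz_stack_pop(xyz_stack *s)
{
    return s->data[--s->top];
}

void xyz_stack_destroy(xyz_stack *s)
{
    free(s);
}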
Because all members of structs are implicitly public in C, you need a convention + programmer discipline to enforce the useful technique of information hiding. Pick a simple, automatically checkable convention such as "private members end with '_'".
Interfaces can be implemented using arrays of pointers to functions. Certainly this requires more work than in languages like C++ that provide in-language support, but it can nevertheless be done in C.
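A minimal sketch of such an interface, here built as a struct of function pointers (a hand-rolled vtable) rather than a bare array; all names are invented:

#include <stdio.h>
#include <stddef.h>

/* The "interface": an operations table plus a context pointer */
typedef struct writer {
    void *ctx;
    int (*write)(void *ctx, const char *buf, size_t len);
    int (*close)(void *ctx);
} writer;

/* One implementation of the interface, backed by a FILE* */
static int file_write(void *ctx, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, ctx) == len ? 0 : -1;
}

static int file_close(void *ctx)
{
    return fclose(ctx);
}

writer file_writer(FILE *fp)
{
    writer w = { fp, file_write, file_close };
    return w;
}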

The High and Low-Level C article contains a lot of good tips. Especially, take a look at the "Classes and objects" section.
Standards and Style for Coding in ANSI C also contains good advice of which you can pick and choose.

1. Don't define variables in header files; instead, define the variable in the source file and add an extern declaration in the header (see the sketch after this list). This ties into #2 and #3.
2. Use an include guard on every header. This will save so many headaches.
3. Assuming you've done #1 and #2, include everything you need (but only what you need) for a given file in that file. Don't depend on the order in which the compiler expands your include directives.
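For example, item 1 looks like this (file names invented):

/* config.h */
#ifndef CONFIG_H
#define CONFIG_H

extern int g_log_level;   /* declaration only: no storage allocated here */

#endif /* CONFIG_H */

/* config.c */
#include "config.h"

int g_log_level = 0;      /* the single definition lives in one source file */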

The approach that Pidgin (formerly Gaim) uses is they created a Plugin struct. Each plugin populates a struct with callbacks for initialization and teardown, along with a bunch of other descriptive information. Pretty much everything except the struct is declared as static, so only the Plugin struct is exposed for linking.
Then, to handle loose coupling of the plugin communicating with the rest of the app (since it'd be nice if it did something between setup and teardown), they have a signaling system. Plugins can register callbacks to be called when specific signals (not standard C signals, but a custom extensible kind [identified by string, rather than set codes]) are issued by any part of the app (including another plugin). They can also issue signals themselves.
This seems to work well in practice - different plugins can build upon each other, but the coupling is fairly loose - no direct invocation of functions, everything's through the signaling system.
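A rough sketch of the shape of the Plugin struct described above (not Pidgin's actual API; all names are invented):

typedef struct plugin {
    const char *name;
    const char *description;
    int  (*load)(void);    /* initialization callback */
    void (*unload)(void);  /* teardown callback */
} plugin;

static int example_load(void)
{
    /* register signal callbacks here */
    return 0;
}

static void example_unload(void)
{
    /* disconnect signals, free resources */
}

/* the only non-static symbol the plugin exposes for linking */
const plugin example_plugin = {
    "example",
    "demonstrates the plugin-struct pattern",
    example_load,
    example_unload,
};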

A function should do one thing and do this one thing well.
Lots of little functions used by bigger wrapper functions help to structure code from small, easy to understand (and test!) building blocks.
Create small modules with a couple of functions each. Only expose what you must, keep anything else static inside of the module. Link small modules together with their .h interface files.
Provide getter and setter functions for access to static file-scope variables in your module. That way, the variables are only actually written to in one place. This also helps with tracing access to these static variables: set a breakpoint in the accessor and inspect the call stack.
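For example (a sketch; the names are invented):

/* counter.c */
static int counter;        /* file scope, invisible outside this module */

int counter_get(void)
{
    return counter;
}

void counter_set(int value)
{
    counter = value;       /* the only write: one breakpoint here catches
                              every modification, with a full call stack */
}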
One important rule when designing modular code is: Don't try to optimize unless you have to. Lots of small functions usually yield cleaner, well structured code and the additional function call overhead might be worth it.
I always try to keep variables at their narrowest scope, also within functions. For example, indices of for loops usually can be kept at block scope and don't need to be exposed at the entire function level. C is not as flexible as C++ with the "define it where you use it" but it's workable.

Breaking the code up into libraries of related functions is one way of keeping things organized. To avoid name conflicts you can also use prefixes to allow you to reuse function names, though with good names I've never really found this to be much of a problem. For example, if you wanted to develop your own math routines but still use some from the standard math library, you could prefix yours with some string: xyz_sin(), xyz_cos().
Generally I prefer the one function (or set of closely related functions) per file and one header file per source file convention. Breaking files into directories, where each directory represents a separate library is also a good idea. You'd generally have a system of makefiles or build files that would allow you to build all or part of the entire system following the hierarchy representing the various libraries/programs.

There are directories and files, but no namespaces or encapsulation. You can compile each module to a separate obj file, and link them together (as libraries).

Related

Prevent external calls to functions inside lib file

Is there a reliable way to prevent external code from calling inner functions of a lib that was compiled from C code?
I would like to deliver a static library with an API header file. The library has different modules, consisting of .c and .h files. I would like to prevent the recipients from using functions declared in the inner .h files.
Is this possible?
Thanks!
Is there a reliable way to prevent external code from calling inner functions of a lib ?
No, there cannot be (read about Rice's theorem; statically detecting such non-trivial properties is undecidable). The library or the code using it might use function pointers. A malicious user could play with function pointers and pointer arithmetic to call some private function (perhaps after having reverse-engineered your code), even if it is static.
On Linux you might play with visibility tricks.
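For example, with GCC or Clang on ELF platforms (the function names here are illustrative):

/* Not static (it is shared between the library's own files), but kept
   out of a shared library's dynamic symbol table: */
__attribute__((visibility("hidden")))
int internal_helper(int x);

/* Or compile with -fvisibility=hidden and mark only the public API: */
__attribute__((visibility("default")))
int mylib_public(int x);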
Or you could organize your library as a single translation unit (a bit like sqlite is doing its amalgamation) and have all internal functions be static ...
In general, the library should have naming conventions about its internal functions (e.g. suffix all of them with _). This could be practically helpful (but not against malicious users).
Most importantly, a library should be well documented (with naming conventions being also documented), and a serious user will only use documented functions the way they are documented to be useful.
(so I don't think you should care about internal functions being called; you do need to document which public functions can be called, and how, and when...; a user calling anything else should expect undefined behavior, that is very bad things)
I would like to deliver a static library with an API header file, and would like to prevent the recipients from using the structs I define and the inner functions.
I am not sure that (at least on Linux) delivering a static library is wise. I would recommend delivering a shared library, read Drepper's How to Write Shared Libraries.
And you can't prevent the recipient (assuming a malicious, clever, and determined one) to use inner functions and internal struct-s. You should just discourage them in the documentation, and document well your public functions and data types.
I would like to prevent the recipients from using functions declared in the inner .h files. Is this possible?
No, that is impossible.
It looks like you seek a technical solution to a social issue. You need to trust your users (and they need to trust you), so you should document what functions can be used (and you could even add in your documentation some sentence saying that using directly any undocumented function yields undefined behavior). You can't do much more. Perhaps (in particular if you are selling your library as a proprietary software) you need a lawyer to write a good contract.
You might consider writing your own GCC plugin (or GCC MELT extension) to detect such calls. That could take you weeks of work and is not worth the trouble (and will remain imperfect).
I am not able to guess your motivations and use case (is it some life-critical software driving a nuclear reactor, a medical drug injector, an autonomous vehicle, a missile?). Please explain what would happen to you if some (malicious but clever) user called an internal undocumented function. And what could happen to that user?

Lesser of two evils when using globals via extern

I'm working with some old code that uses many global variables. I'm fully aware of many of the disadvantages of using global variables, so my question is not about whether I should be using global variables or not.
After reviewing much of the code I've noticed two patterns and I'm trying to decide which one is worse and why.
A similarity between the two patterns is that the global variables are exposed using "extern".
The main difference between the two patterns is:
Some globals are extern'ed/exposed in header files, which are in turn included in many source files with a #include
Other globals are extern'ed/exposed directly in the source file itself
Which of these two would you believe is worse than the other? And Why?
Would you consider them equally bad? And Why?
1) Hide what you can. If they do not need to be visible, don't give people access to them (by not providing their declarations).
2) Use static if extern is not necessary and… hide what you can.
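A minimal sketch of both points (all names invented):

/* module.c */
static int local_state;   /* needed only in this file: static, invisible */

/* shared.h */
extern int g_shared;      /* genuinely needed elsewhere: declared in a header
                             that every user includes */

/* shared.c */
#include "shared.h"
int g_shared = 0;         /* defined exactly once */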
Which of these two would you believe is worse than the other? And Why?
The first, because it is unnecessarily visible to other translation units. The second can cause linker errors, but it takes insider knowledge to use it in another source file/translation unit. The linker issue can then be resolved by making the variable static (again, if its declaration is visible to only one translation unit).
Would you consider them equally bad? And Why?
No. If you can hide the globals' implementation and restrict their access, you have done your codebase a favor.
I tend to think of C header files as falling into one of three categories: public, private, and protected (not to be confused with the C++ keywords or the same names). Public are for anything that is meant to be accessed by anyone. Private are for everything that is meant only for internal implementation of a module (if split into multiple files); these are never visible outside of the module. Protected are for those items that are not generally expected to be accessed by another module but for some reason or another it needs to be (module coupling can occur here).
To me a symbol (such as a global variable) extern'ed in the C source file instead of a header file is a violation of these "rules" and is interpreted as a code smell.
Hope this helps.

Is this a reasonable hack to inline functions across translation units?

I'm writing performance-sensitive code that really requires me to force certain function calls to be inlined.
For inline functions that are shared between translation units via a header, one would normally have to put the function definition in the header file. I don't want to do that. Some of these functions operate on complex data structures that should not be exposed in the header.
I've gotten around this by simply #including all the .h and .c files once each into a single .c file, so that there is only one translation unit. (That slows down re-compiles, but not by enough to matter.)
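The single combined translation unit looks something like this (file names invented):

/* unity.c -- the only file handed to the compiler in the optimized build */
#include "parser.h"
#include "eval.h"

#include "parser.c"   /* the compiler now sees every definition at once */
#include "eval.c"     /* and can inline across former file boundaries  */
#include "main.c"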
This would be "problem solved," but it eliminates getting an error when a function in one C file calls a function in another C file that is supposed to be private, and I want to get an error in that case. So, I have a separate Makefile entry that does a "normal" build, just to check for this case.
In order to force functions declared inline to play nicely in the "normal" build, I actually define a macro, may_inline, which is used where the inline attribute normally would be. It is defined as empty for a normal build and is defined as "inline" for an optimized build.
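Based on that description, the macro would look something like the following (the UNITY_BUILD flag and file names are assumptions; static inline is used here instead of bare inline to sidestep C99 inline-linkage pitfalls, since everything lives in one translation unit anyway):

/* may_inline.h */
#ifdef UNITY_BUILD
#define may_inline static inline   /* single translation unit: inlinable */
#else
#define may_inline                 /* normal build: plain external function */
#endif

/* widget.h */
may_inline int widget_scale(int x);

/* widget.c */
may_inline int widget_scale(int x)
{
    return 3 * x;
}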
This seems like an acceptable solution. The only downside I can see is that I can't have private functions in different .c files that have the same prototype, but so far, that hasn't been much of an issue for me.
Another potential solution is to use GCC's Link-Time Optimization, which is supposed to allow inlining across translation units. It's a new feature, though, and I don't trust it to always inline things the way I would want. Furthermore, I can only get it working on trivial problems, not my actual code.
Is this an acceptable hack, or am I doing something incredibly stupid? The fact that I've never seen this done before makes me a bit nervous.
Unity build is an absolutely valid approach and has been widely used in industry since forever (see e.g. this post). Recent versions of Visual Studio even provide builtin support for them.
LTO has a downside of not being portable even across compilers for the same platform.

Overcoming C limitations for large projects

One aspect where C shows its age is the encapsulation of code. Many modern languages have classes, namespaces, packages... much more convenient ways to organize code than a simple "include".
Since C is still the main language for many huge projects, how do you overcome its limitations?
I suppose that one main factor should be lots of discipline. I would like to know what you do to handle large quantities of C code, and which authors or books you can recommend.
Separate the code into functional units.
Build those units of code into individual libraries.
Use hidden symbols within libraries to reduce namespace conflicts.
Think of open source code. There is a huge amount of C code in the Linux kernel, the GNU C library, the X Window system and the Gnome desktop project. Yet, it all works together. This is because most of the code does not see any of the other code. It only communicates by well-defined interfaces. Do the same in any large project.
Some people don't like it, but I am an advocate of organizing my structs and associated functions together as if they were a class where the this pointer is passed explicitly, combined with a consistent naming convention to make the namespace explicit. A header would be something like:
typedef struct foo {
    int x;
    double y;
} FOO_T;

FOO_T *foo_new(void);
int foo_set_x(FOO_T *self, int arg1);
int foo_do_bar(FOO_T *self, int arg1);
FOO_T *foo_delete(FOO_T *self);
In the implementation, all the "private" functions would be static. The downside of this is that you can't actually enforce that the user not go and muck with the members of the struct. That's just life in C. I find, though, that this style makes for nicely reusable C types.
A good way to achieve some encapsulation is to declare internal functions or variables of a module as static.
As Andres says, static is your friend. But speaking of friends... if you want to be able to separate a library into two files, then some symbols from one file that need to be seen in the other cannot be static.
Decide on some naming conventions: all non-static symbols from library foo start with foo_. And make sure they are always followed: it is precisely the symbols for which this seems constraining ("I need to call it foo_max?! But it is just max!") that there will be clashes.
As Zan says, a typical Linux distribution can be seen as a huge project written mostly in C. It works. There are interfaces, and large subprojects are implemented as separate processes. An implementation in separate processes helps with debugging, testing and code reuse, and it provides a second hierarchy in addition to the only one that exists at link level. When your project becomes large enough, it may start to make sense to put some of the functionality in separate processes. Something already as specialized as a C compiler is typically implemented as three processes: pre-processor, compiler, assembler.
If you can control the project (e.g. it's in-house or you pay someone else to do it), you can simply set rules and use reviews and tools to enforce them. There is no real need for the language to do this; you can for instance demand that all functions usable outside a module (= a set of files, which don't even need to be kept separate) must be marked as such. In effect, you would force the developers to think about the interfaces and stick with them.
If you really want to make the point, you could define macros to show this as well, e.g.
#define PUBLIC
#define PRIVATE static
or some such.
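Used like this (illustrative):

PRIVATE int helper(int x)        /* expands to static: file-local */
{
    return x + 1;
}

PUBLIC int module_entry(int x)   /* expands to nothing: externally visible */
{
    return helper(x);
}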
So you are right, discipline is the key here. It involves setting the rules AND making sure that they are followed.

Code Ordering in Source Files - Forward Declarations vs "Don't Repeat Yourself"? [closed]

If you code in C and configure your compiler to insist that all functions are declared before they are used (or if you code in C++), then you can end up with one of (at least) two organizations for your source files.
Either:
Headers
Forward declarations of (static) functions in this file
External functions (primary entry points)
Static - non-public - functions
Or:
Headers
Static - non-public - functions
External functions (primary entry points)
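As a sketch, the second organization looks like this (names invented):

/* widget.c -- organization 2, top to bottom */
#include "widget.h"        /* headers */

static int helper(int x)   /* static (non-public) functions, defined
                              before first use: no forward declarations */
{
    return 2 * x;
}

int widget_process(int x)  /* external entry point, declared in widget.h */
{
    return helper(x) + 1;
}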
I recognize that in C++, the term 'static' is not preferred, but I'm primarily a C programmer and the equivalent concept exists in C++, namely functions in an anonymous namespace within the file.
Question:
Which organization do you use, and why do you prefer it?
For reference, my own code uses the second format so that the static functions are defined before they are used, so that there is no need to both declare them and define them, which saves on having the information about the function interfaces written out twice - which, in turn, reduces (marginally) the overhead when an internal interface needs to change. The downside to that is that the first functions defined in the file are the lowest-level routines - the ones that are called by functions defined later in the file - so rather than having the most important code at the top, it is nearer the bottom of the file. How much does it matter to you?
I assume that all externally accessible functions are declared in headers, and that this form of repetition is necessary - I don't think that should be controversial.
I've always used method #1, the reason being that I like to be able to quickly tell which functions are defined in a particular file and see their signatures all in one place. I don't find the argument of having to change the prototypes along with the function definitions particularly convincing, since you usually wind up changing all the code that calls the changed functions anyway; changing the function prototypes while you are at it seems relatively trivial.
In C code I use a simple rule:
Every C file with non-static members will have a corresponding header file declaring those members.
This has worked really well for me in the past - makes it easy enough to find the definition of a function because it's in the same-named .h file if I need to look it up. It also works well with doxygen (my preferred tool) because all the cruft is kept in the header where I don't spend most of my time - the C file is full of code.
For static members in a file I insist on ordering the definitions so that each function is defined before it is used. And I avoid circular dependencies in function calls almost all of the time.
For C++ code I tried the following:
All code defined in the header file. Use #pragma interface/#pragma implementation to inform the compiler of that; kind of the same way templates put all the code in the header.
That's worked really well for me in C++. It means you end up with HUGE header files which can increase compile time in some cases. You also end up with a C++ body file where you simply include the header and compile. You can instantiate your static member variables here. It also became a nightmare because it was far too easy to change your method params and break your code.
I moved to
Header file with doxygen comments (except for templates, where code must be included in the header) and full body file, except for short methods which I know I'd prefer to have inlined when used.
Separating out implementation from definition has the distinct plus that it's harder to change your method/function signatures so you're less likely to do it and break things. It also means that I can have huge doxygen blocks in the header file documenting how things work and work in the code relatively interruption free except for useful comments like "declare a variable called i" (tongue in cheek).
Ada forces the convention and the file naming scheme on you. Most dynamic languages like Ruby, Python, etc don't generally care where/if you declare things.
Number 2: because I write many short functions and refactor them freely, it'd be a significant nuisance to maintain forward declarations. If there's an Emacs extension that does that for you with no fuss, I'd be interested, since the top-down organization is a bit more readable. (I prefer top-down in e.g. Python.)
Actually not quite your Number 2, because I generally group related functions together in the .c regardless of whether they're public or private. If I want to see all the public declarations I'll look in the header.
Number 2 for me.
I think using static or other methods to make your module functions and variables private to the module is a good practice.
I prefer to have my api functions at the bottom of the module. Conversely, I put the api functions at the top of my classes, as classes are generally reusable. Putting the api functions at the top makes it easier to find them quickly. Most IDEs can take you to any function pretty directly.
(Talking about C code)
Number 2 for me because I always forget to update forward decls to reflect static functions changes.
But I think that the best practice should be
headers
forward declarations + comment on function behaviour for each one
exported functions + eventual comments about implementation details when code is not clear enough
static functions + eventual comments about implementation details
How much does it matter to you?
Not much.
It is important that all local functions be marked static, but in my opinion, prescribing how to group functions within the file is going too far. There is no strong reasoning for either version, and I don't find any strong disadvantage either.
In general, coding conventions are very important and we try to define as much as possible, but in this case my feeling is that this would be unjustified overhead.
After reading all the posts again, it seems like I should have simply upvoted (which I did) Darius' answer instead of writing all of this ...
