Overcoming C limitations for large projects - c

One aspect where C shows its age is the encapsulation of code. Many modern languages has classes, namespaces, packages... a much more convenient to organize code than just a simple "include".
Since C is still the main language for many huge projects. How do you to overcome its limitations?
I suppose that one main factor should be lots of discipline. I would like to know what you do to handle large quantity of C code, which authors or books you can recommend.

Separate the code into functional units.
Build those units of code into individual libraries.
Use hidden symbols within libraries to reduce namespace conflicts.
Think of open source code. There is a huge amount of C code in the Linux kernel, the GNU C library, the X Window system and the Gnome desktop project. Yet, it all works together. This is because most of the code does not see any of the other code. It only communicates by well-defined interfaces. Do the same in any large project.

Some people don't like it but I am an advocate of organzing my structs and associated functions together as if they are a class where the this pointer is passed explicitly. For instance, combined with a consistent naming convention to make the namespace explicit. A header would be something like:
typedef struct foo {
int x;
double y;
} FOO_T
FOO_T * foo_new();
int foo_set_x(FOO_T * self, int arg1);
int foo_do_bar(FOO_T * self, int arg1);
FOO_T * foo_delete(FOO_T * self);
In the implementation, all the "private" functions would be static. The downside of this is that you can't actually enforce that the user not go and muck with the members of the struct. That's just life in c. I find this style though makes for nicely reusable C types.

A good way you can achieve some encapsulation is to declare internal methods or variables of a module as static

As Andres says, static is your friend. But speaking of friends... if you want to be able to separate a library in two files, then some symbols from one file that need to be seen in the other can not be static.
Decide of some naming conventions: all non-static symbols from library foo start with foo_. And make sure they are always followed: it is precisely the symbols for which it seems constraining ("I need to call it foo_max?! But it is just max!") that there will be clashes.
As Zan says, a typical Linux distribution can be seen as a huge project written mostly in C. It works. There are interfaces, and large-subprojects are implemented as separate processes. An implementation in separate processes helps for debugging, for testing, for code reuse, and it provides a second hierarchy in addition to the only one that exists at link level. When your project becomes large enough, it may start to make sense to put some of the functionalities in separate processes. Something already as specialized as a C compiler is typically implemented as three processes: pre-processor, compiler, assembler.

If you can control the project (e.g. in-house or you pay someone else to do it) you can simply set rules and use reviews and tools to enforce them. There is no real need for the language to do this, you can for instance demand that all functions usable outside a module (=a set of files, don't even need to be a separate) must be marked thus. In effect, you would force the developers to think about the interfaces and stick with them.
If you really want to make the point, you could define macros to show this as well, e.g.
#define PUBLIC
#define PRIVATE static
or some such.
So you are right, discipline is the key here. It involves setting the rules AND making sure that they are followed.

Related

Why do some C projects like nginx and LuaJIT prefix all their code files, functions and data types with the project's name?

Looking at nginx's code I see that pretty much everything is prefixed with ngx_.
Files:
ngx_list.c
ngx_list.h
ngx_log.c
ngx_log.h
Code:
ngx_log_t *ngx_log_init(u_char *prefix);
void ngx_cdecl ngx_log_abort(ngx_err_t err, const char *fmt, ...);
void ngx_cdecl ngx_log_stderr(ngx_err_t err, const char *fmt, ...);
LuaJIT is pretty much the same thing but with lj_.
Files:
lj_alloc.c
lj_alloc.h
lj_api.c
lj_arch.h
Code:
LJ_ASMF void LJ_FASTCALL lj_vm_ffi_call(CCallState *cc);
LJ_FUNC CTypeID lj_ccall_ctid_vararg(CTState *cts, cTValue *o);
LJ_FUNC int lj_ccall_func(lua_State *L, GCcdata *cd);
Other project's do the same thing, these are just two that come to mind. Why do they do this? If it was the project's public API I would get it as it will be exposed to third party code. But the code I copied is part of the (private) implementation so why namespace it?
I suspect there's no hard-and-fast reason. I suspect it's just something that people (some people) feel more comfortable with. I wouldn't do it myself, but I guess I can see the appeal.
For example, I have a multiprecision arithmetic library I wrote for fun one day. It has functions like mp_add(), mt_sub(), etc. And the source for these functions lives in files add.c and sub.c.
Now, since all the source code for this library lives in a subdirectory named mp, I have never been tempted to give the files names like mp_add.c or mp_sub.c. That would just be redundant: the names are already in a very real sense mp/add.c, mp/sub.c, etc.
But I have to admit that it does feel a teensy bit weird going to a file named add.c to check on my multiprecision addition code. It's not integer addition code, or fixed-point or rational-number addition code, or general-purpose addition code. It's very specifically multiprecision addition code, and the functions defined within do all have that mp_ prefix. So shouldn't the filename have that prefix, also?
As I said, no, in the end I wouldn't (I didn't) give it that prefix. But as I also said, I guess I can see the appeal.
Addendum: Above I answered about filenames, but you also asked about internal -- "private" -- function names. And those are different; those definitely need a prefix, at least in C.
The issue is that C does not really have any namespace mechanisms. So you almost always have to fake it with project-specific prefixes on all global symbols.
Consider the function ngx_log_abort(). It's private to nginx; client code wouldn't be calling it. But it's a global function, so if it were just named log_abort, there would be a pretty high chance of a collision with a completely different function in the the client code (or in some other library code) also named log_abort.
You may ask, then why is ngx_log_abort a global function? And of course the answer is that any of the functions making up the nginx library might need to call it, so it pretty much has to be global.
You may ask, then why isn't ngx_log_abort a file-scope static function? And the answer there is, that would work if all the source code for the entire nginx library were confined to a single C source file nginx.c. But the authors probably didn't want to confine themselves that way.
If you want to write a well-encapsulated library in C, you have two choices for your "private" functions:
Make them file-scope static, and limit yourself to using a single source file for most or all of your library.
Make them truly global, but with uniqueifying prefixes. Also don't put declarations for them in public header files. (That way clients can't call them without cheating.)
In other languages you have other mechanisms for hiding private symbols, but not in C.

Preferred method to use two names to call the same function in C

I know there are at least three popular methods to call the same function with multiple names. I haven't actually heard of someone using the fourth method for this purpose.
1). Could use #defines:
int my_function (int);
#define my_func my_function
OR
#define my_func(int (a)) my_function(int (a))
2). Embedded function calls are another possibility:
int my_func(int a) {
return my_function(a);
}
3). Use a weak alias in the linker:
int my_func(int a) __attribute__((weak, alias("my_function")));
4). Function pointers:
int (* const my_func)(int) = my_function;
The reason I need multiple names is for a mathematical library that has multiple implementations of the same method.
For example, I need an efficient method to calculate the square root of a scalar floating point number. So I could just use math.h's sqrt(). This is not very efficient. So I write one or two other methods, such as one using Newton's Method. The problem is each technique is better on certain processors (in my case microcontrollers). So I want the compilation process to choose the best method.
I think this means it would be best to use either the macros or the weak alias since those techniques could easily be grouped in a few #ifdef statements in the header files. This simplifies maintenance (relatively). It is also possible to do using the function pointers, but it would have to be in the source file with extern declarations of the general functions in the header file.
Which do you think is the better method?
Edit:
From the proposed solutions, there appears to be two important questions that I did not address.
Q. Are the users working primarily in C/C++?
A. All known development will be in C/C++ or assembly. I am designing this library for my own personal use, mostly for work on bare metal projects. There will be either no or minimal operating system features. There is a remote possibility of using this in full blown operating systems, which would require consideration of language bindings. Since this is for personal growth, it would be advantageous to learn library development on popular embedded operating systems.
Q. Are the users going to need/want an exposed library?
A. So far, yes. Since it is just me, I want to make direct modifications for each processor I use after testing. This is where the test suite would be useful. So an exposed library would help somewhat. Additionally, each "optimal implementation" for particular function may have a failing conditions. At this point, it has to be decided who fixes the problem: the user or the library designer. A user would need an exposed library to work around failing conditions. I am both the "user" and "library designer". It would almost be better to allow for both. Then non-realtime applications could let the library solve all of stability problems as they come up, but real-time applications would be empowered to consider algorithm speed/space vs. algorithm stability.
Another alternative would be to move the functionality into a separately compiled library optimised for each different architecture and then just link to this library during compilation. This would allow the project code to remain unchanged.
Depending on the intended audience for your library, I suggest you chose between 2 alternatives:
If the consumer of your library is guaranteed to be Cish, use #define sqrt newton_sqrt for optimal readability
If some consumers of your library are not of the C variety (think bindings to Dephi, .NET, whatever) try to avoid consumer-visible #defines. This is a major PITA for bindings, as macros are not visible on the binary - embedded function calls are the most binding-friendly.
What you can do is this. In header file (.h):
int function(void);
In the source file (.c):
static int function_implementation_a(void);
static int function_implementation_b(void);
static int function_implementation_c(void);
#if ARCH == ARCH_A
int function(void)
{
return function_implementation_a();
}
#elif ARCH == ARCH_B
int function(void)
{
return function_implementation_b();
}
#else
int function(void)
{
return function_implementation_c();
}
#endif // ARCH
Static functions called once are often inlined by the implementation. This is the case for example with gcc by default : -finline-functions-called-once is enabled even in -O0. The static functions that are not called are also usually not included in the final binary.
Note that I don't put the #if and #else in a single function body because I find the code more readable when #if directives are outside the functions body.
Note this way works better with embedded code where libraries are usually distributed in their source form.
I usually like to solve this with a single declaration in a header file with a different source file for each architecture/processor-type. Then I just have the build system (usually GNU make) choose the right source file.
I usually split the source tree into separate directories for common code and for target-specific code. For instance, my current project has a toplevel directory Project1 and underneath it are include, common, arm, and host directories. For arm and host, the Makefile looks for source in the proper directory based on the target.
I think this makes it easier to navigate the code since I don't have to look up weak symbols or preprocessor definitions to see what functions are actually getting called. It also avoids the ugliness of function wrappers and the potential performance hit of function pointers.
You might you create a test suite for all algorithms and run it on the target to determine which are the best performing, then have the test suite automatically generate the necessary linker aliases (method 3).
Beyond that a simple #define (method 1) probably the simplest, and will not and any potential overhead. It does however expose to the library user that there might be multiple implementations, which may be undesirable.
Personally, since only one implementation of each function is likley to be optimal on any specific target, I'd use the test suite to determine the required versions for each target and build a separate library for each target with only those one version of each function the correct function name directly.

What methods are there to modularize C code?

What methods, practices and conventions do you know of to modularize C code as a project grows in size?
Create header files which contain ONLY what is necessary to use a module. In the corresponding .c file(s), make anything not meant to be visible outside (e.g. helper functions) static. Use prefixes on the names of everything externally visible to help avoid namespace collisions. (If a module spans multiple files, things become harder., as you may need to expose internal things and not be able hide them with "static")
(If I were to try to improve C, one thing I would do is make "static" the default scoping of functions. If you wanted something visible outside, you'd have to mark it with "export" or "global" or something similar.)
OO techniques can be applied to C code, they just require more discipline.
Use opaque handles to operate on objects. One good example of how this is done is the stdio library -- everything is organised around the opaque FILE* handle. Many successful libraries are organised around this principle (e.g. zlib, apr)
Because all members of structs are implicitly public in C, you need a convention + programmer discipline to enforce the useful technique of information hiding. Pick a simple, automatically checkable convention such as "private members end with '_'".
Interfaces can be implemented using arrays of pointers to functions. Certainly this requires more work than in languages like C++ that provide in-language support, but it can nevertheless be done in C.
The High and Low-Level C article contains a lot of good tips. Especially, take a look at the "Classes and objects" section.
Standards and Style for Coding in ANSI C also contains good advice of which you can pick and choose.
Don't define variables in header files; instead, define the variable in the source file and add an extern statement (declaration) in the header. This will tie into #2 and #3.
Use an include guard on every header. This will save so many headaches.
Assuming you've done #1 and #2, include everything you need (but only what you need) for a certain file in that file. Don't depend on the order of how the compiler expands your include directives.
The approach that Pidgin (formerly Gaim) uses is they created a Plugin struct. Each plugin populates a struct with callbacks for initialization and teardown, along with a bunch of other descriptive information. Pretty much everything except the struct is declared as static, so only the Plugin struct is exposed for linking.
Then, to handle loose coupling of the plugin communicating with the rest of the app (since it'd be nice if it did something between setup and teardown), they have a signaling system. Plugins can register callbacks to be called when specific signals (not standard C signals, but a custom extensible kind [identified by string, rather than set codes]) are issued by any part of the app (including another plugin). They can also issue signals themselves.
This seems to work well in practice - different plugins can build upon each other, but the coupling is fairly loose - no direct invocation of functions, everything's through the signaling stystem.
A function should do one thing and do this one thing well.
Lots of little function used by bigger wrapper functions help to structure code from small, easy to understand (and test!) building blocks.
Create small modules with a couple of functions each. Only expose what you must, keep anything else static inside of the module. Link small modules together with their .h interface files.
Provide Getter and Setter functions for access to static file scope variables in your module. That way, the variables are only actually written to in one place. This helps also tracing access to these static variables using a breakpoint in the function and the call stack.
One important rule when designing modular code is: Don't try to optimize unless you have to. Lots of small functions usually yield cleaner, well structured code and the additional function call overhead might be worth it.
I always try to keep variables at their narrowest scope, also within functions. For example, indices of for loops usually can be kept at block scope and don't need to be exposed at the entire function level. C is not as flexible as C++ with the "define it where you use it" but it's workable.
Breaking the code up into libraries of related functions is one way of keeping things organized. To avoid name conflicts you can also use prefixes to allow you to reuse function names, though with good names I've never really found this to be much of a problem. For example, if you wanted to develop your own math routines but still use some from the standard math library, you could prefix yours with some string: xyz_sin(), xyz_cos().
Generally I prefer the one function (or set of closely related functions) per file and one header file per source file convention. Breaking files into directories, where each directory represents a separate library is also a good idea. You'd generally have a system of makefiles or build files that would allow you to build all or part of the entire system following the hierarchy representing the various libraries/programs.
There are directories and files, but no namespaces or encapsulation. You can compile each module to a separate obj file, and link them together (as libraries).

Reflection support in C

I know it is not supported, but I am wondering if there are any tricks around it. Any tips?
Reflection in general is a means for a program to analyze the structure of some code.
This analysis is used to change the effective behavior of the code.
Reflection as analysis is generally very weak; usually it can only provide access to function and field names. This weakness comes from the language implementers essentially not wanting to make the full source code available at runtime, along with the appropriate analysis routines to extract what one wants from the source code.
Another approach is tackle program analysis head on, by using a strong program analysis tool, e.g., one that can parse the source text exactly the way the compiler does it.
(Often people propose to abuse the compiler itself to do this, but that usually doesn't work; the compiler machinery wants to be a compiler and it is darn hard to bend it to other purposes).
What is needed is a tool that:
Parses language source text
Builds abstract syntax trees representing every detail of the program.
(It is helpful if the ASTs retain comments and other details of the source
code layout such as column numbers, literal radix values, etc.)
Builds symbol tables showing the scope and meaning of every identifier
Can extract control flows from functions
Can extact data flow from the code
Can construct a call graph for the system
Can determine what each pointer points-to
Enables the construction of custom analyzers using the above facts
Can transform the code according to such custom analyses
(usually by revising the ASTs that represent the parsed code)
Can regenerate source text (including layout and comments) from
the revised ASTs.
Using such machinery, one implements analysis at whatever level of detail is needed, and then transforms the code to achieve the effect that runtime reflection would accomplish.
There are several major benefits:
The detail level or amount of analysis is a matter of ambition (e.g., it isn't
limited by what runtime reflection can only do)
There isn't any runtime overhead to achieve the reflected change in behavior
The machinery involved can be general and applied across many languages, rather
than be limited to what a specific language implementation provides.
This is compatible with the C/C++ idea that you don't pay for what you don't use.
If you don't need reflection, you don't need this machinery. And your language
doesn't need to have the intellectual baggage of weak reflection built in.
See our DMS Software Reengineering Toolkit for a system that can do all of the above for C, Java, and COBOL, and most of it for C++.
[EDIT August 2017: Now handles C11 and C++2017]
Tips and tricks always exists. Take a look at Metaresc library https://github.com/alexanderchuranov/Metaresc
It provides interface for types declaration that will also generate meta-data for the type. Based on meta-data you can easily serialize/deserialize objects of any complexity. Out of the box you can serialize/deserialize XML, JSON, XDR, Lisp-like notation, C-init notation.
Here is a simple example:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "metaresc.h"
TYPEDEF_STRUCT (point_t,
double x,
double y
);
int main (int argc, char * argv[])
{
point_t point = {
.x = M_PI,
.y = M_E,
};
char * str = MR_SAVE_XML (point_t, &point);
if (str)
{
printf ("%s\n", str);
free (str);
}
return (EXIT_SUCCESS);
}
This program will output
$ ./point
<?xml version="1.0"?>
<point>
<x>3.1415926535897931</x>
<y>2.7182818284590451</y>
</point>
Library works fine for latest gcc and clang on Linux, MacOs, FreeBSD and Windows.
Custom macro language is one of the options. User could do declaration as usual and generate types descriptors from DWARF debug info. This moves complexity to the build process, but makes adoption much easier.
any tricks around it? Any tips?
The compiler will probably optionally generate 'debug symbol file', which a debugger can use to help debug the code. The linker may also generate a 'map file'.
A trick/tip might be to generate and then read these files.
Based on the responses to How can I add reflection to a C++ application? (Stack Overflow) and the fact that C++ is considered a "superset" of C, I would say you're out of luck.
There's also a nice long answer about why C++ doesn't have reflection (Stack Overflow).
I needed reflection in a bunch of structs in a C++ project.
I created a xml file with the description of all those structs - fortunately the fields types were primitive types.
I used a template (not C++ template) to auto generate a class for each struct along with setter/getter methods.
In each class I used a map to associate string names and class members (pointers to members).
I didn't regret using reflection because it opened new ways to design my core functionality that I couldn't even imagine without reflection.
(BTW, it was an external report generator for a program that uses a raw database)
So, I used code generation, function pointers and maps to simulate reflection.
I know of the following options, but all come at cost and a lot of limitations:
Use libdl (#include <dfcln.h>)
Call a tool like objdump or nm
Parse the object files yourself (using a corresponding library)
Involve a parser and generate the necessary information at compile time.
"Abuse" the linker to generate symbol arrays.
I'll use a bit of unit test frameworks as examples further down, because automatic test discovery for unit test frameworks is a typical example where reflection comes in very handy, and it's something that most unit test frameworks for C fall short of.
Using libdl (#include <dfcln.h>) (POSIX)
If you're on a POSIX environment, a little bit of reflection can be done using libdl. Plugins are developed that way.
Use
#include <dfcln.h>
in your source code and link with -ldl.
Then you have access to functions dlopen(), dlerror(), dlsym() and dlclose() with which you could load and access / run shared objects at runtime. However, it does not give you easy access to the symbol table.
Another disadvantage of this approach is that you basically restrict reflection to objects loaded as dynamic library (shared object loaded at runtime via dlopen()).
Running nm or objdump
You could run nm or objdump to show the symbol table and parse the output.
For me, nm -P --defined-only -g xyz.o gives good results, and parsing the output is trivial.
You'd be interested in the first word of each line only, which is the symbol name, and maybe the second one, which is the section type.
If you do not know the object name in some static way, i.e. the object is actually a shared object, at least on Linux you then might want to skip symbol names starting with '_'.
objdump, nm or similar tools are also often available outside POSIX environments.
Parsing the object files yourself
You could parse the object files yourself. You probably don't want to implement that from scratch but use an existing library for that. This is how nm, objdump and even libdl are implemented. You could peek at the source code of nm, objdump and libdl and the libraries they use in order to find out how they do what they do.
Involving a Parser
You could write a parser and code generator which generates the necessary reflective information at compile time and stores it in the object file. Then you have a lot of freedom and could even implement primitive forms of annotations. That's what some unit test frameworks like AceUnit do.
I found that writing a parser which covers straight-forward C syntax is fairly trivial. Writing a parser which really understands C and could deal with all cases is NOT trivial.
So, this has limitations which depend on how exotic the C syntax is that you want to reflect upon.
"Abusing" the linker to generate symbol arrays
You could put references to symbols which you want to reflect upon in a special section and use a linker configuration to emit the section boundaries so you can access them in C.
I've described here N-Dependency injection in C - better way than linker-defined arrays? how this works.
But beware, this is depending on a lot of things and not very portable. I have only tried this with GCC/ld, and I know it doesn't work with all compilers / linkers. Also, it's almost guaranteed that dead code elimination will not detect how you call this stuff, so if you use dead code elimination, you will have to add all the reflected symbols as entry points.
Pitfalls
For some of the mechanisms, dead code elimination can be a problem, in particular when you "abuse" the linker to generate a symbol arrays. It can be worked around by telling the reflected symbols as entry points to the linker, and depending on the amount of symbols this might be neither nice nor convenient.
Conclusion
Combining nm and libdl can actually give quite good results. The combination can be almost as powerful as the level of Reflection used by JUnit 3.x in Java. The level of reflection given is sufficient to implement a JUnit 3.x-style unit test framework for C, including test-case discovery by naming convention.
Involving a parser is more work and limited to objects that you compile yourself, but gives you most power and freedom. The level of reflection given can be sufficient to implement a JUnit 4.x-style unit test framework for C, including test-case discovery by annotations. AceUnit is a unit test framework for C that does exactly this.
Combining parsing and the linker to generate symbol arrays can give very nice results - if your environment is so much under your control that you can ensure that working with the linker that way works for you.
And of course you can combine all approaches to stitch together the bits and pieces until they fit your needs.
You would need to implement it from yourself from the ground up. In straight C, there is no runtime information whatsoever kept on structure and composite types. Metadata simply does not exist in the standard.
Implementing reflection for C would be much simpler... because C is simple language.
There is some basic options for analazing program, like detect if function exists by calling dlopen/dlsym -- depends on your needs.
There are tools for creating code that can modify/extend itselfusing tcc.
You may use the above tool in order to create your own code analizers.
For similar reasons to the author of the question, I have been working on a C-type-reflection-API along with a C reflection graph database format and a clang plug-in that writes reflection metadata.
The intent is to use the C reflection API for writing serialization and deserialization routines, such as mappers for ASN.1, function argument printers, function proxies, fuzzers, etc. Clang and GCC both have plugin APIs that allow access to the AST but there currently is no standard graph format for C reflection metadata.
The proposed C reflection API is called Crefl:
https://github.com/michaeljclark/crefl
The Crefl API provides runtime access to reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field (member), array, constant, variable.
The Crefl reflection graph database format for portable reflection metadata.
The Crefl clang plug-in outputs C reflection metadata used by the library.
The Crefl API provides task-oriented query access to C reflection metadata
A C reflection API provides access to runtime reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field, array, constant, variable. The Crefl C reflection data model is essentially a transcription of the C data types in ISO/IEC 9899:9999.
C intrinsic data types.
integer types.
floating-point types.
complex number types.
boolean type.
nested struct, union, field, and bitfield
arrays and pointers
typedef type aliases
enum and enum constants
functions and function parameters
const, volatile and restrict qualifiers
GNU-C style attributes using (__attribute__).
The library is still a work in progress. The hope is to find others who are interested in reflection support in C.
Parsers and Debug Symbols are great ideas. However, the gotcha is that C does not really have arrays. Just pointers to stuff.
For example, there is no way by reading the source code to know whether a char * points to a character, a string, or a fixed array of bytes based on some "nearby" length field. This is a problem for human readers let alone any automated tool.
Why not use a modern language, like Java or .Net? Can be faster than C as well.

How should I structure complex projects in C? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have little more than beginner-level C skills and would like to know if there are any de facto "standards" to structure a somewhat complex application in C. Even GUI based ones.
I have been always using the OO paradigm in Java and PHP and now that I want to learn C I'm afraid that I might structure my applications in the wrong way. I'm at a loss on which guidelines to follow to have modularity, decoupling and dryness with a procedural language.
Do you have any readings to suggest? I couldn't find any application framework for C, even if I don't use frameworks I've always found nice ideas by browsing their code.
The key is modularity. This is easier to design, implement, compile and maintain.
Identify modules in your app, like classes in an OO app.
Separate interface and implementation for each module, put in interface only what is needed by other modules. Remember that there is no namespace in C, so you have to make everything in your interfaces unique (e.g., with a prefix).
Hide global variables in implementation and use accessor functions for read/write.
Don't think in terms of inheritance, but in terms of composition. As a general rule, don't try to mimic C++ in C, this would be very difficult to read and maintain.
If you have time for learning, take a look at how an Ada app is structured, with its mandatory package (module interface) and package body (module implementation).
This is for coding.
For maintaining (remember that you code once, but you maintain several times) I suggest to document your code; Doxygen is a nice choice for me. I suggest also to build a strong regression test suite, which allows you to refactor.
It's a common misconception that OO techniques can't be applied in C. Most can -- it's just that they are slightly more unwieldy than in languages with syntax dedicated to the job.
One of the foundations of robust system design is the encapsulation of an implementation behind an interface. FILE* and the functions that work with it (fopen(), fread() etc.) is a good example of how encapsulation can be applied in C to establish interfaces. (Of course, since C lacks access specifiers you can't enforce that no-one peeks inside a struct FILE, but only a masochist would do so.)
If necessary, polymorphic behaviour can be had in C using tables of function pointers. Yes, the syntax is ugly but the effect is the same as virtual functions:
struct IAnimal {
int (*eat)(int food);
int (*sleep)(int secs);
};
/* "Subclass"/"implement" IAnimal, relying on C's guaranteed equivalence
* of memory layouts */
struct Cat {
struct IAnimal _base;
int (*meow)(void);
};
int cat_eat(int food) { ... }
int cat_sleep(int secs) { ... }
int cat_meow(void) { ... }
/* "Constructor" */
struct Cat* CreateACat(void) {
struct Cat* x = (struct Cat*) malloc(sizeof (struct Cat));
x->_base.eat = cat_eat;
x->_base.sleep = cat_sleep;
x->meow = cat_meow;
return x;
}
struct IAnimal* pa = CreateACat();
pa->eat(42); /* Calls cat_eat() */
((struct Cat*) pa)->meow(); /* "Downcast" */
All good answers.
I would only add "minimize data structure". This might even be easier in C, because if C++ is "C with classes", OOP is trying to encourage you to take every noun / verb in your head and turn it into a class / method. That can be very wasteful.
For example, suppose you have an array of temperature readings at points in time, and you want to display them as a line-chart in Windows. Windows has a PAINT message, and when you receive it, you can loop through the array doing LineTo functions, scaling the data as you go to convert it to pixel coordinates.
What I have seen entirely too many times is, since the chart consists of points and lines, people will build up a data structure consisting of point objects and line objects, each capable of DrawMyself, and then make that persistent, on the theory that that is somehow "more efficient", or that they might, just maybe, have to be able to mouse over parts of the chart and display the data numerically, so they build methods into the objects to deal with that, and that, of course, involves creating and deleting even more objects.
So you end up with a huge amount of code that is oh-so-readable and merely spends 90% of it's time managing objects.
All of this gets done in the name of "good programming practice" and "efficiency".
At least in C the simple, efficient way will be more obvious, and the temptation to build pyramids less strong.
The GNU coding standards have evolved over a couple of decades. It'd be a good idea to read them, even if you don't follow them to the letter. Thinking about the points raised in them gives you a firmer basis on how to structure your own code.
If you know how to structure your code in Java or C++, then you can follow the same principles with C code. The only difference is that you don't have the compiler at your side and you need to do everything extra carefully manually.
Since there are no packages and classes, you need to start by carefully designing your modules. The most common approach is to create a separate source folder for each module. You need to rely on naming conventions for differentiating code between different modules. For example prefix all functions with the name of the module.
You can't have classes with C, but you can easily implement "Abstract Data Types". You create a .C and .H file for every abstract data type. If you prefer you can have two header files, one public and one private. The idea is that all structures, constants and functions that need to be exported go to the public header file.
Your tools are also very important. A useful tool for C is lint, which can help you find bad smells in your code. Another tool you can use is Doxygen, which can help you generate documentation.
Encapsulation is always key to a successful development, regardless of the development language.
A trick I've used to help encapsulate "private" methods in C is to not include their prototypes in the ".h" file.
I'd suggets you to check out the code of any popular open source C project, like... hmm... Linux kernel, or Git; and see how they organize it.
The number rule for complex application: it should be easy to read.
To make complex application simplier, I employ Divide and conquer.
I would suggest reading a C/C++ textbook as a first step. For example, C Primer Plus is a good reference. Looking through the examples would give you and idea on how to map your java OO to a more procedural language like C.

Resources