Lesser of two evils when using globals via extern - c

I'm working with some old code that uses many global variables. I'm fully aware of many of the disadvantages of using global variables, so my question is not about whether I should be using global variables or not.
After reviewing much of the code I've noticed two patterns and I'm trying to decide which one is worse and why.
A similarity between the two patterns is that the global variables are exposed using "extern".
The main difference between the two patterns is:
Some globals are extern'ed/exposed in header files, which are in turn included in many source files with a #include
Other globals are extern'ed/exposed directly in the source file itself
Which of these two would you believe is worse than the other? And Why?
Would you consider them equally bad? And Why?
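For concreteness, the two patterns might look roughly like the sketch below. The three "files" are shown in one listing so that it compiles as-is, and every name (config.h, g_verbose, g_retries, worker_budget) is made up for illustration rather than taken from the code in question.

```c
/* ---- config.h (pattern 1: the declaration lives in a shared header) ---- */
#ifndef CONFIG_H
#define CONFIG_H
extern int g_verbose;   /* every file that includes config.h can see this */
#endif

/* ---- config.c (the single definition of both globals) ---- */
int g_verbose = 0;
int g_retries = 3;

/* ---- worker.c (pattern 2: the declaration is written locally) ---- */
extern int g_retries;   /* hand-written copy; nothing checks it against config.c */

int worker_budget(void)
{
    /* Reads one global through the header and one through the local extern. */
    return g_verbose ? g_retries * 2 : g_retries;
}
```

In pattern 2 the compiler never sees the definition and the local declaration together, so a type mismatch between the two can go unnoticed.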

1) Hide what you can. If they do not need to be visible, don't allow people to use them (by providing their declarations).
2) Use a static if extern is not necessary and… hide what you can.
Which of these two would you believe is worse than the other? And Why?
The first, because it is unnecessarily visible to other translation units. The second can cause linker errors, but using it correctly from another source file takes insider knowledge. The linker issue can then be resolved by making the variable static (again, if its declaration only needs to be visible to one translation unit).
Would you consider them equally bad? And Why?
No. If you can hide the globals' implementation and restrict their access, you have done your codebase a favor.
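A minimal sketch of "hide what you can", assuming a hypothetical counter.c module (the names hits, counter_record and counter_total are illustrative only):

```c
/* ---- counter.c ---- */
static int hits;   /* internal linkage: no other file can refer to it by name */

/* The only two ways the rest of the program can touch the counter. */
void counter_record(void) { hits++; }
int  counter_total(void)  { return hits; }
```

With this in place, an `extern int hits;` written in some other source file would still compile, but the program would fail to link because no externally visible `hits` exists.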

I tend to think of C header files as falling into one of three categories: public, private, and protected (not to be confused with the C++ keywords or the same names). Public are for anything that is meant to be accessed by anyone. Private are for everything that is meant only for internal implementation of a module (if split into multiple files); these are never visible outside of the module. Protected are for those items that are not generally expected to be accessed by another module but for some reason or another it needs to be (module coupling can occur here).
To me a symbol (such as a global variable) extern'ed in the C source file instead of a header file is a violation of these "rules" and is interpreted as a code smell.
Hope this helps.

Related

Do I really need accessor functions to access a global variable from another file?

In my code (game engine code) there are multiple source (.c) files which maintain the status of the game, status like
START
CONFIGURE
STOP
END
DEFAULT
RUNNING
To maintain the state, one global variable, gameStatus, is shared between multiple source files using the extern keyword. I have read that global variables are bad to use: they allow outside modules to change them, and as the number of components using a global variable increases, the complexity of the interactions can also increase.
So I limited the scope of that variable to one file using the static keyword and added accessor functions (get and set APIs) in the same file. Other files now access the variable only through those accessor APIs.
Removing the global variable is good, but now every file that used it has to call the accessor APIs, which seems to add function-call overhead.
So now I am confused about which is better. Does the C standard say anything about how to share data efficiently between different source files?
The fact that global variables are "bad practice" is entirely opinion based and 100% dependent on the context. It is impossible to say whether you are applying such "bad practice" or not without looking at your code. Global variables are not bad practice per se, using them in the wrong way is. Global variables are often necessary in C. Take as an example the C standard library: errno is a global variable that is used basically everywhere in both library code and user code to check for errors. Is that bad practice? Could they have defined a function get_errno() instead (well to be honest they actually did it's just hidden... but that's for complex concurrency reasons)? I'll let you decide.
In your specific case, changing a globally visible variable to static and then creating two functions only to get and set its value is totally unnecessary. Any part of the code can still modify the variable, but now it's just more annoying to do so, and it could also lead to slower code if not optimized correctly. All in all, by creating those functions you just stripped the variable of the static qualifier.
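For reference, the arrangement described in the question might look like the sketch below. The file names, the GAME_ prefix on the enumerators and the accessor names are assumptions; the question only names the variable gameStatus and the six states.

```c
/* ---- game_state.h (hypothetical) ---- */
typedef enum {
    GAME_START, GAME_CONFIGURE, GAME_STOP,
    GAME_END, GAME_DEFAULT, GAME_RUNNING
} GameStatus;

GameStatus game_get_status(void);
void       game_set_status(GameStatus s);

/* ---- game_state.c ---- */
static GameStatus gameStatus = GAME_DEFAULT;   /* no longer visible to other files */

GameStatus game_get_status(void)         { return gameStatus; }
void       game_set_status(GameStatus s) { gameStatus = s; }
```

As written, the setter accepts any value from any caller, which is the point the answer makes: every file can still change the state, only now through a function call.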

Is this a reasonable hack to inline functions across translation units?

I'm writing performance-sensitive code that really requires me to force certain function calls to be inlined.
For inline functions that are shared between translation units via a header, one would normally have to put the function definition in the header file. I don't want to do that. Some of these functions operate on complex data structures that should not be exposed in the header.
I've gotten around this by simply #including all the .h and .c files once each into a single .c file, so that there is only one translation unit. (That slows down re-compiles, but not by enough to matter.)
This would be "problem solved," but it eliminates getting an error when a function in one C file calls a function in another C file that is supposed to be private, and I want to get an error in that case. So, I have a separate Makefile entry that does a "normal" build, just to check for this case.
In order to force functions declared inline to play nicely in the "normal" build, I actually define a macro, may_inline, which is used where the inline attribute normally would be. It is defined as empty for a normal build and is defined as "inline" for an optimized build.
This seems like an acceptable solution. The only downside I can see is that I can't have private functions in different .c files that have the same prototype, but so far, that hasn't been much of an issue for me.
Another potential solution is to use GCC's Link-Time Optimization, which is supposed to allow inlining across translation units. It's a new feature, though, and I don't trust it to always inline things the way I would want. Furthermore, I can only get it working on trivial problems, not my actual code.
Is this an acceptable hack, or am I doing something incredibly stupid? The fact that I've never seen this done before makes me a bit nervous.
Unity build is an absolutely valid approach and has been widely used in industry since forever (see e.g. this post). Recent versions of Visual Studio even provide builtin support for them.
LTO has a downside of not being portable even across compilers for the same platform.
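A rough sketch of the may_inline macro from the question. UNITY_BUILD is a hypothetical flag that the optimized build would pass (for example -DUNITY_BUILD), and clamp_to_byte is an invented example function. Note one deliberate deviation: the question defines the macro as plain "inline", while this sketch uses "static inline", because in C99 and later a plain inline definition does not by itself provide an external definition for the linker.

```c
#ifdef UNITY_BUILD
#define may_inline static inline   /* single translation unit: allow inlining */
#else
#define may_inline                 /* normal build: ordinary external function */
#endif

may_inline int clamp_to_byte(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}
```

In the normal build the function has external linkage, so a call from a file that was never given its prototype can still be caught by the usual compiler warnings.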

What methods are there to modularize C code?

What methods, practices and conventions do you know of to modularize C code as a project grows in size?
Create header files which contain ONLY what is necessary to use a module. In the corresponding .c file(s), make anything not meant to be visible outside (e.g. helper functions) static. Use prefixes on the names of everything externally visible to help avoid namespace collisions. (If a module spans multiple files, things become harder, as you may need to expose internal things and not be able to hide them with "static".)
(If I were to try to improve C, one thing I would do is make "static" the default scoping of functions. If you wanted something visible outside, you'd have to mark it with "export" or "global" or something similar.)
OO techniques can be applied to C code, they just require more discipline.
Use opaque handles to operate on objects. One good example of how this is done is the stdio library -- everything is organised around the opaque FILE* handle. Many successful libraries are organised around this principle (e.g. zlib, apr)
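A small sketch of the opaque-handle idea, in the spirit of FILE* but not taken from any real library. Stack and the stack_* functions are hypothetical, and the header and source are shown in one listing so it compiles as-is.

```c
#include <stdlib.h>

/* ---- stack.h: callers only ever see an incomplete type ---- */
typedef struct Stack Stack;

Stack *stack_create(void);
int    stack_push(Stack *s, int value);   /* 0 on success, -1 if full  */
int    stack_pop(Stack *s, int *out);     /* 0 on success, -1 if empty */
void   stack_destroy(Stack *s);

/* ---- stack.c: the layout is private to this file ---- */
struct Stack {
    int items[16];
    int count;
};

Stack *stack_create(void) { return calloc(1, sizeof(Stack)); }

int stack_push(Stack *s, int value)
{
    if (s->count == 16) return -1;
    s->items[s->count++] = value;
    return 0;
}

int stack_pop(Stack *s, int *out)
{
    if (s->count == 0) return -1;
    *out = s->items[--s->count];
    return 0;
}

void stack_destroy(Stack *s) { free(s); }
```

Code that includes only the header cannot read or write `count` or `items`, because the struct definition is not visible there.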
Because all members of structs are implicitly public in C, you need a convention + programmer discipline to enforce the useful technique of information hiding. Pick a simple, automatically checkable convention such as "private members end with '_'".
Interfaces can be implemented using arrays of pointers to functions. Certainly this requires more work than in languages like C++ that provide in-language support, but it can nevertheless be done in C.
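One possible shape for such an interface, as a sketch: a table of structs holding function pointers, indexed by kind. ShapeOps, shape_table and shape_area are invented names.

```c
#include <stddef.h>

/* Hypothetical "interface": one slot per operation. */
typedef struct {
    const char *name;
    int (*area)(int w, int h);
} ShapeOps;

static int rect_area(int w, int h)     { return w * h; }
static int triangle_area(int w, int h) { return (w * h) / 2; }

/* Each entry is one implementation of the interface. */
static const ShapeOps shape_table[] = {
    { "rect",     rect_area },
    { "triangle", triangle_area },
};

/* Dispatch through the table; returns -1 for an out-of-range kind. */
int shape_area(size_t kind, int w, int h)
{
    if (kind >= sizeof shape_table / sizeof shape_table[0]) return -1;
    return shape_table[kind].area(w, h);
}
```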
The High and Low-Level C article contains a lot of good tips. Especially, take a look at the "Classes and objects" section.
Standards and Style for Coding in ANSI C also contains good advice of which you can pick and choose.
Don't define variables in header files; instead, define the variable in the source file and add an extern statement (declaration) in the header. This will tie into #2 and #3.
Use an include guard on every header. This will save so many headaches.
Assuming you've done #1 and #2, include everything you need (but only what you need) for a certain file in that file. Don't depend on the order of how the compiler expands your include directives.
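The three points above can be sketched as follows, assuming a hypothetical settings module (the header and source are shown in one listing so it compiles as-is):

```c
/* ---- settings.h ---- */
#ifndef SETTINGS_H
#define SETTINGS_H

struct Settings { int log_level; };   /* seen twice, this would not compile */
extern struct Settings g_settings;    /* declaration: allocates no storage */
void settings_raise_log_level(void);

#endif /* SETTINGS_H */

/* ---- settings.c ---- */
/* #include "settings.h" would go here, in every file that needs it */
struct Settings g_settings = { 1 };   /* the single definition */

void settings_raise_log_level(void) { g_settings.log_level++; }
```

The guard is what makes it safe for two headers to both include settings.h without the struct being defined twice in one translation unit.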
The approach that Pidgin (formerly Gaim) uses is they created a Plugin struct. Each plugin populates a struct with callbacks for initialization and teardown, along with a bunch of other descriptive information. Pretty much everything except the struct is declared as static, so only the Plugin struct is exposed for linking.
Then, to handle loose coupling of the plugin communicating with the rest of the app (since it'd be nice if it did something between setup and teardown), they have a signaling system. Plugins can register callbacks to be called when specific signals (not standard C signals, but a custom extensible kind [identified by string, rather than set codes]) are issued by any part of the app (including another plugin). They can also issue signals themselves.
This seems to work well in practice - different plugins can build upon each other, but the coupling is fairly loose - no direct invocation of functions, everything's through the signaling system.
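A toy version of that design, to show the shape of it. This is not Pidgin's actual API: Plugin, signal_connect, signal_emit and the "greeting" signal are all invented for the sketch.

```c
#include <string.h>

typedef void (*SignalHandler)(const char *name, void *data);

#define MAX_HANDLERS 8
static struct { const char *name; SignalHandler fn; } handlers[MAX_HANDLERS];
static int handler_count;

/* Register fn for a string-named signal; returns -1 when the table is full. */
int signal_connect(const char *name, SignalHandler fn)
{
    if (handler_count == MAX_HANDLERS) return -1;
    handlers[handler_count].name = name;
    handlers[handler_count].fn = fn;
    handler_count++;
    return 0;
}

/* Call every handler registered for the signal; returns how many ran. */
int signal_emit(const char *name, void *data)
{
    int called = 0;
    for (int i = 0; i < handler_count; i++) {
        if (strcmp(handlers[i].name, name) == 0) {
            handlers[i].fn(name, data);
            called++;
        }
    }
    return called;
}

/* ---- a plugin: everything is static except the descriptor ---- */
typedef struct {
    const char *name;
    int  (*init)(void);
    void (*teardown)(void);
} Plugin;

static int greetings_seen;

static void on_greeting(const char *name, void *data)
{
    (void)name;
    (void)data;
    greetings_seen++;
}

static int  counter_init(void)     { return signal_connect("greeting", on_greeting); }
static void counter_teardown(void) { greetings_seen = 0; }   /* resets the count only */

const Plugin counter_plugin = { "counter", counter_init, counter_teardown };
```

The host only needs the exported `counter_plugin` symbol. Whoever emits "greeting" never calls the plugin's functions by name.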
A function should do one thing and do this one thing well.
Lots of little functions used by bigger wrapper functions help to structure code from small, easy to understand (and test!) building blocks.
Create small modules with a couple of functions each. Only expose what you must, keep anything else static inside of the module. Link small modules together with their .h interface files.
Provide Getter and Setter functions for access to static file scope variables in your module. That way, the variables are only actually written to in one place. This helps also tracing access to these static variables using a breakpoint in the function and the call stack.
One important rule when designing modular code is: Don't try to optimize unless you have to. Lots of small functions usually yield cleaner, well structured code and the additional function call overhead might be worth it.
I always try to keep variables at their narrowest scope, also within functions. For example, indices of for loops usually can be kept at block scope and don't need to be exposed at the entire function level. C is not as flexible as C++ with the "define it where you use it" but it's workable.
Breaking the code up into libraries of related functions is one way of keeping things organized. To avoid name conflicts you can also use prefixes to allow you to reuse function names, though with good names I've never really found this to be much of a problem. For example, if you wanted to develop your own math routines but still use some from the standard math library, you could prefix yours with some string: xyz_sin(), xyz_cos().
Generally I prefer the one function (or set of closely related functions) per file and one header file per source file convention. Breaking files into directories, where each directory represents a separate library is also a good idea. You'd generally have a system of makefiles or build files that would allow you to build all or part of the entire system following the hierarchy representing the various libraries/programs.
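To illustrate the prefixing suggestion, here is a sketch in which a home-grown xyz_sin() coexists with the declarations from the standard header. The implementation is a throwaway four-term Taylor series, chosen only to keep the example short; it is reasonable near zero and nothing more.

```c
#include <math.h>   /* declares sin(); no clash with the prefixed name below */

/* Hypothetical in-house routine: x - x^3/3! + x^5/5! - x^7/7! */
double xyz_sin(double x)
{
    double x2 = x * x;
    double term = x;
    double sum = x;

    term *= -x2 / (2.0 * 3.0);   /* -x^3/3! */
    sum += term;
    term *= -x2 / (4.0 * 5.0);   /* +x^5/5! */
    sum += term;
    term *= -x2 / (6.0 * 7.0);   /* -x^7/7! */
    sum += term;

    return sum;
}
```

Without the prefix, a function named sin would conflict with the library declaration in <math.h>.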
There are directories and files, but no namespaces or encapsulation. You can compile each module to a separate obj file, and link them together (as libraries).

Code Ordering in Source Files - Forward Declarations vs "Don't Repeat Yourself"? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
If you code in C and configure your compiler to insist that all functions are declared before they are used (or if you code in C++), then you can end up with one of (at least) two organizations for your source files.
Either:
Headers
Forward declarations of (static) functions in this file
External functions (primary entry points)
Static - non-public - functions
Or:
Headers
Static - non-public - functions
External functions (primary entry points)
I recognize that in C++, the term 'static' is not preferred, but I'm primarily a C programmer and the equivalent concept exists in C++, namely functions in an anonymous namespace within the file.
Question:
Which organization do you use, and why do you prefer it?
For reference, my own code uses the second format so that the static functions are defined before they are used, so that there is no need to both declare them and define them, which saves on having the information about the function interfaces written out twice - which, in turn, reduces (marginally) the overhead when an internal interface needs to change. The downside to that is that the first functions defined in the file are the lowest-level routines - the ones that are called by functions defined later in the file - so rather than having the most important code at the top, it is nearer the bottom of the file. How much does it matter to you?
I assume that all externally accessible functions are declared in headers, and that this form of repetition is necessary - I don't think that should be controversial.
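To make the two layouts concrete, here is the first organization as a sketch (the function names are invented):

```c
/* Organization 1: forward-declare the static helpers, entry point first. */
static int square(int x);
static int sum_of_squares(const int *v, int n);

/* External function (primary entry point). */
int vector_norm_squared(const int *v, int n)
{
    return sum_of_squares(v, n);
}

/* Static, non-public functions follow. */
static int sum_of_squares(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}

static int square(int x) { return x * x; }
```

The second organization would drop the two declarations at the top and list the definitions in the order square, sum_of_squares, vector_norm_squared, so that each function is defined before its first use.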
I've always used method #1, the reason being that I like to be able to quickly tell which functions are defined in a particular file and see their signatures all in one place. I don't find the argument of having to change the prototypes along with the function definition particularly convincing, since you usually wind up changing all the code that calls the changed functions anyway; changing the function prototypes while you are at it seems relatively trivial.
In C code I use a simple rule:
Every C file with non-static members will have a corresponding header file defining those members.
This has worked really well for me in the past - makes it easy enough to find the definition of a function because it's in the same-named .h file if I need to look it up. It also works well with doxygen (my preferred tool) because all the cruft is kept in the header where I don't spend most of my time - the C file is full of code.
For static members in a file I insist on ordering them so that each one is defined before it is used anyway. And I avoid circular dependencies in function calls almost all of the time.
For C++ code I tried the following:
All code defined in the header file. Use #pragma interface/#pragma implementation to inform the compiler of that; kind of the same way templates put all the code in the header.
That's worked really well for me in C++. It means you end up with HUGE header files which can increase compile time in some cases. You also end up with a C++ body file where you simply include the header and compile. You can instantiate your static member variables here. It also became a nightmare because it was far too easy to change your method params and break your code.
I moved to
Header file with doxygen comments (except for templates, where code must be included in the header) and full body file, except for short methods which I know I'd prefer be inlined when used.
Separating out implementation from definition has the distinct plus that it's harder to change your method/function signatures so you're less likely to do it and break things. It also means that I can have huge doxygen blocks in the header file documenting how things work and work in the code relatively interruption free except for useful comments like "declare a variable called i" (tongue in cheek).
Ada forces the convention and the file naming scheme on you. Most dynamic languages like Ruby, Python, etc don't generally care where/if you declare things.
Number 2: because I write many short functions and refactor them freely, it'd be a significant nuisance to maintain forward declarations. If there's an Emacs extension that does that for you with no fuss, I'd be interested, since the top-down organization is a bit more readable. (I prefer top-down in e.g. Python.)
Actually not quite your Number 2, because I generally group related functions together in the .c regardless of whether they're public or private. If I want to see all the public declarations I'll look in the header.
Number 2 for me.
I think using static or other methods to make your module functions and variables private to the module is a good practice.
I prefer to have my api functions at the bottom of the module. Conversely I put the api functions at the top of my classes as classes are generally reusable. Putting the api functions at the top makes it easier to find them quickly. Most IDEs can take you to any function pretty directly.
(Talking about C code)
Number 2 for me because I always forget to update forward decls to reflect static functions changes.
But I think that the best practice should be
headers
forward declarations + comment on function behaviour for each one
exported functions + eventual comments about implementation details when code is not clear enough
static functions + eventual comments about implementation details
How much does it matter to you?
It doesn't.
It is important that all local functions are marked static, but in my opinion prescribing how to group functions in a file goes too far. There is no strong argument for either version, and I have never found a strong disadvantage to either.
In general, coding conventions are very important and we try to define as much as possible, but in this case my feeling is that it is unjustified overhead.
After reading all posts again it seems like i should simply upvote (which i did) Darius answer, instead writing all of these ...

Static functions in Linux device driver?

Is there a reason why most functions in Linux device driver code are defined as static?
I was told this is for scoping and to prevent namespace pollution, could anyone explain it in detail why static definition is used in this context?
Functions declared static are not visible outside the translation unit they are defined in (a translation unit is basically a .c file). If a function does not need to be called from outside the file, then it should be made static so as to not pollute the global namespace. This makes name conflicts less likely to happen. Exported symbols are usually identified with some sort of subsystem tag, which further reduces the scope for conflict.
Often, pointers to these functions end up in structs, so they are actually called from outside the file they are defined in, but not by their function name.
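A sketch of that pattern in plain C. The struct device_ops type and the mydev_* names are hypothetical stand-ins, not real kernel interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel-style operations table. */
struct device_ops {
    int (*open)(void);
    int (*read)(char *buf, size_t len);
};

/* Static: these names are invisible outside this file. */
static int mydev_open(void) { return 0; }

static int mydev_read(char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = 'x';            /* fill the caller's buffer with dummy data */
    return (int)len;
}

/* The only symbol with external linkage. */
const struct device_ops mydev_ops = {
    .open = mydev_open,
    .read = mydev_read,
};
```

Other code reaches the functions only through the pointers in `mydev_ops`, so another file can define its own static `mydev_read` without any conflict at link time.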
For the same reasons you use static in any code. You should only 'publish' your API calls, anything else opens you up to abuse, such as being able to call internal functions from outside the driver, something that would almost certainly be catastrophic.
It's good programming practice to only make visible to the outside world what's necessary. That's what encapsulation is all about.
I concur. This is common and wise practice in any C code - not just kernel code! Don't go thinking this is only appropriate for low level stuff, any C code that stretches past one .c file should have thought given to this.
