Static functions in Linux device driver? - c

Is there a reason why most function definitions in Linux device driver code are declared static?
I was told this is for scoping and to prevent namespace pollution. Could anyone explain in detail why static is used in this context?

Functions declared static are not visible outside the translation unit they are defined in (a translation unit is basically a .c file). If a function does not need to be called from outside the file, then it should be made static so as not to pollute the global namespace. This makes name conflicts less likely. Exported symbols are usually identified with some sort of subsystem tag, which further reduces the scope for conflict.
Often, pointers to these functions end up in structs, so they are actually called from outside the file they are defined in, but not by their function name.
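To make that last point concrete, here is a minimal sketch of static functions that are reachable only through a struct of function pointers. The names (demo_ops, demo_open, demo_read, demo_driver_ops) are hypothetical and are not taken from any real kernel interface; the layout is only loosely modelled on how drivers hand callbacks to a framework.

```c
#include <stddef.h>

/* Hypothetical operations table; all names here are made up. */
struct demo_ops {
    int (*open)(void);
    int (*read)(char *buf, size_t len);
};

/* Internal linkage: these names are invisible to other .c files. */
static int demo_open(void)
{
    return 0;
}

static int demo_read(char *buf, size_t len)
{
    if (len == 0)
        return 0;
    buf[0] = 'x';   /* pretend one byte of data arrived */
    return 1;
}

/* The only symbol with external linkage in this file. */
const struct demo_ops demo_driver_ops = {
    .open = demo_open,
    .read = demo_read,
};
```

Code in another file can call demo_driver_ops.read(buf, len), but it cannot refer to demo_read by name, so a second file is free to define its own static demo_read without any clash.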

For the same reasons you use static in any code. You should only 'publish' your API calls, anything else opens you up to abuse, such as being able to call internal functions from outside the driver, something that would almost certainly be catastrophic.
It's good programming practice to only make visible to the outside world what's necessary. That's what encapsulation is all about.

I concur. This is common and wise practice in any C code - not just kernel code! Don't go thinking this is only appropriate for low level stuff, any C code that stretches past one .c file should have thought given to this.

Related

Do I really need accessor functions to access a global variable from another file?

In my code (game engine code) there are multiple source (.c) files which maintain the status of the game, status like
START
CONFIGURE
STOP
END
DEFAULT
RUNNING
To maintain the state, one global variable, gameStatus, is shared between multiple source files using the extern keyword. I have read that global variables are bad to use: any outside module can change them, and as the number of components using a global variable increases, the complexity of the interactions also increases.
So I have limited the scope of that variable to one file using the static keyword and added accessor functions (get and set APIs) in the same file. Other files now access the variable only through those accessor APIs.
Removing the global variable is good, but now every file that used it has to call the accessor APIs, which seems to add function-call overhead.
So now I am confused: which is better? Is there any standard C practice for efficiently sharing data between different source files?
The fact that global variables are "bad practice" is entirely opinion based and 100% dependent on the context. It is impossible to say whether you are applying such "bad practice" or not without looking at your code. Global variables are not bad practice per se, using them in the wrong way is. Global variables are often necessary in C. Take as an example the C standard library: errno is a global variable that is used basically everywhere in both library code and user code to check for errors. Is that bad practice? Could they have defined a function get_errno() instead (well to be honest they actually did it's just hidden... but that's for complex concurrency reasons)? I'll let you decide.
In your specific case, changing a globally visible variable to static and then creating two functions only to get and set its value is totally unnecessary. Any part of the code can still modify the variable, but now it's just more annoying to do so, and it could also lead to slower code if the calls are not optimized away. All in all, by creating those functions you have effectively cancelled out the static qualifier.
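For reference, a minimal sketch of the accessor arrangement under discussion. The state names come from the question; the function names (game_get_status, game_set_status) and the variable name are assumptions. The setter here also checks its argument, which is one thing a bare extern variable cannot do; whether that is worth the function call depends on the code base.

```c
#include <stdbool.h>

/* State names taken from the question. */
enum game_status { START, CONFIGURE, STOP, END, DEFAULT, RUNNING };

/* File scope only: no other .c file can name this variable. */
static enum game_status current_status = DEFAULT;

enum game_status game_get_status(void)
{
    return current_status;
}

/* Rejects values outside the enum; returns true when the value was stored. */
bool game_set_status(enum game_status next)
{
    int v = (int)next;

    if (v < START || v > RUNNING)
        return false;
    current_status = next;
    return true;
}
```

If the setter did nothing but assign, the objection above would apply in full: every caller could still change the state, just through one extra call.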

Why is it a convention to use static functions in an LKM?

I've been researching this a lot lately and have looked into various articles and stackoverflow posts but I can't seem to find a straight answer. When creating a kernel module I have seen most code look like this:
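#include <linux/init.h>
#include <linux/module.h>   /* module_init(), module_exit(), MODULE_LICENSE() */
/* Called when the module is loaded; returning 0 signals success. */
static int __init test_init(void) { return 0; }
/* Called when the module is unloaded. */
static void __exit test_exit(void) { }
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");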
One possible reason I have found is that doing this increases the difficulty of injecting malicious code into a running module.
Another is less cluttering of the namespace but wouldn't that only be an issue in the context of the kernel module you are linking and compiling and nothing else? If insmod actually links the code into the kernel like ld would then I can see how name clashes would mess up the system. Is this the reason?
I cannot think of any other reasons and I would like this to be clarified before I blindly start using conventions.
Thank you in advance
If a function isn't needed outside of a .c file, it should be declared static within that .c file.
That's just good encapsulation.
It avoids name collisions and lets the reader know your intent.
If the compiler decides to inline all called instances of a static function then the compiler doesn't need to output object code for the function because it knows all the instances were inlined. However, if you don't declare it as static then the compiler can't be sure that it isn't called from somewhere else.
Also, declaring something as static prevents it from entering the global namespace. This is important in C, which doesn't have name mangling, so there can only be one externally visible function with a given name (even if it acts on different types). So you get to use short function names for static functions, knowing that they won't clash with anyone else's.
Nothing specific to the kernel or operating systems. Just good programming practice.

Lesser of two evils when using globals via extern

I'm working with some old code that uses many global variables. I'm fully aware of many of the disadvantages of using global variables, so my question is not about whether I should be using global variables or not.
After reviewing much of the code I've noticed two patterns and I'm trying to decide which one is worse and why.
A similarity between the two patterns is that the global variables are exposed using "extern".
The main difference between the two patterns is:
Some globals are extern'ed/exposed in header files, which are in turn included in many source files with a #include
Other globals are extern'ed/exposed directly in the source file itself
Which of these two would you believe is worse than the other? And Why?
Would you consider them equally bad? And Why?
1) Hide what you can. If they do not need to be visible, don't allow people to use them (by providing their declarations).
2) Use a static if extern is not necessary and… hide what you can.
Which of these two would you believe is worse than the other? And Why?
The first, because it is unnecessarily visible to other translation units. The second can cause linker errors, but it takes insider knowledge to use the variable correctly from another source file/translation unit. The linker issue can then be resolved by making the variable static (again, if it is only needed in one translation unit).
Would you consider them equally bad? And Why?
No. If you can hide the globals' implementation and restrict their access, you have done your codebase a favor.
I tend to think of C header files as falling into one of three categories: public, private, and protected (not to be confused with the C++ keywords of the same names). Public are for anything that is meant to be accessed by anyone. Private are for everything that is meant only for the internal implementation of a module (if split into multiple files); these are never visible outside of the module. Protected are for those items that are not generally expected to be accessed by another module but, for some reason or another, need to be (module coupling can occur here).
To me a symbol (such as a global variable) extern'ed in the C source file instead of a header file is a violation of these "rules" and is interpreted as a code smell.
Hope this helps.

A Java programmer has questions regarding C header files

I have a fair amount of practice with Java as a programming language, but I am completely new to C. I understand that a header file contains forward declarations for methods and variables. How is this different from an abstract class in Java?
The short answer:
Abstract classes are a concept of object-oriented programming. Header files are a necessity due to the way that the C language is constructed. The two cannot be compared in any way.
The long answer
To understand the header file, and the need for header files, you must understand the concepts of "declaration" and "definition". In C and C++, a declaration states that something exists somewhere, for example a function.
void Test(int i);
We have now declared that somewhere in the program there exists a function Test that takes a single int parameter. When you have a definition, you define what it is:
void Test(int i)
{
...
}
Here we have defined what the function void Test(int) actually is.
Global variables are declared using the extern keyword
extern int i;
They are defined without the extern keyword
int i;
When you compile a C program, you compile each source file (.c file) into an .obj file (a .o file on Unix-like systems). Definitions will be compiled into the .obj file as actual code. When all these have been compiled, they are linked into the final executable. Therefore, a function should only be defined in one .c file; otherwise the linker sees the same symbol defined more than once and will normally reject the program with a duplicate-definition error, even if the definitions are identical. The same rule applies to a global variable defined in more than one .c file: depending on the toolchain, the linker either reports an error or silently merges the definitions, and neither outcome should be relied on.
But code in one .c file cannot see functions defined in another .c file. So if from file1.c you need to call the function Test(int) defined in file2.c, you need to have a declaration of Test(int) present when compiling file1.c. When file1.c is compiled into file1.obj, the resulting .obj file will contain information that it needs Test(int) to be defined somewhere. When the program is linked, the linker will identify that file2.obj contains the function that file1.obj depends on.
If there is no .obj file containing the definition for this function, you will get a linker error, not a compiler error (linker errors are considerably more difficult to find and correct than compiler errors, because you typically get no source filename and line number).
So you use the header file to store declarations for the definitions stored in the corresponding source file.
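The split described above can be sketched in a single listing, with comments marking which file each part would live in. The file names and the names test_double, call_count and use_test are made up for illustration.

```c
/* --- what file2.h would contain: declarations only --- */
extern int call_count;      /* declares a variable defined elsewhere */
int test_double(int i);     /* declares a function defined elsewhere */

/* --- what file1.c would contain: a caller that has only seen the declarations --- */
int use_test(void)
{
    return test_double(21);
}

/* --- what file2.c would contain: the single definition of each --- */
int call_count = 0;

int test_double(int i)
{
    call_count++;
    return 2 * i;
}
```

When the three parts really are separate files, file1.c compiles on its own as long as it includes file2.h; the reference to test_double is only resolved when file1.obj and file2.obj are linked together.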
IMO it's mainly because many C programmers seem to think that Java programmers don't know how to program “for real”, e.g. handling pointers, memory and so on.
I would rather compare headers to Java interfaces, in the sense that they generally define how the API must be used.
Headers are basically just a way to avoid copy-pasting: the preprocessor simply includes the content of the header in the source file when it encounters an #include directive.
You put in a header every declaration that the user will commonly use.
Here are the answers:
Java has had a bad reputation among some hardcore C programmers mainly because they think:
it's "too easy" (no memory-management, segfaults)
"can't be used for serious work"
"just for the web" or,
"slow".
Java is hardly the easiest language in the world these days, compared to languages like Python.
It is used in many desktop apps - applets aren't even used that often. Finally, Java is often slower than C, because it runs on a virtual machine rather than being compiled ahead of time to machine code, although JIT compilation narrows the gap. Sometimes, though, extreme speed isn't needed. Anyway, the JVM isn't the slowest language VM ever.
When you're working in C, there aren't abstract classes.
All a header file does is contain code which is pasted into other files. The main reason you put it in a header file is so that it is at the top of the file - this way, you don't need to care where you put your functions in the actual implementation file.
While you can kind-of use OO concepts in C, it doesn't have built-in support for classes and similar fundamentals of OO. It is nigh-impossible to implement inheritance in plain C, so you can never really have OO, or abstract classes for that matter. I would suggest sticking to plain old structs.
If it makes it easier for you to learn, by all means think of them as abstract classes (with the implementation file being the inheriting class) - but IMHO it is a difficult mindset to use when working in a language without explicit support for said features.
I'm not sure if Java has them, but I think a closer analogue could be partial classes in C#.
If you forward declare something, you have to actually deliver and implement it, else the compiler will complain. The header allows you to display a "module"'s public API and make the declarations available (for type checking and so) to other parts of the program.
Comprehensive reading: Learning C from Java. Recommended reading for developers who are coming from Java to C.
I think that there is much derision (mockery, laughter, contempt, ridicule) for Java simply because it's popular.
Abstract classes and interfaces specify a contract or a set of functions that can be invoked on an object of a certain type. Function prototypes in C only really do compile time type checking of function arguments/return values.
While your first question seems subjective to me, I will answer to the second one:
A header file contains the declarations which are then made available to other files via #inclusion by the preprocessor.
For instance, you will declare a function in a header and implement it in a .c file. Other files will be able to use the function so long as they can see the declaration (by including the header file).
At linking time the linker will look among the object files, or the various libraries linked, for some object which provides the code for the function.
A typical pattern is: you distribute the header files for your library, and a dll (for instance) which contains the object code. Then in your application you include the header, and the compiler will be able to compile because it will find the declaration in the header. No need to provide the actual implementation of the code, which will be available for the linker through the dll.
C programs run directly on the machine, while Java programs run inside the JVM, so a common belief is that Java programs are slow. Also, in Java you are shielded from some low-level constructs (pointers, direct memory access), memory management, etc.
In C the declaration and the definition of a function are separate. The declaration "declares" that there exists a function which, when called with those arguments, returns something. The definition "defines" what the function actually does. The former is done in header files, the latter in the actual code. When you are compiling your code, you must use the header files to tell your compiler that there is such a function, and link in a binary that contains the binary code for the function.
In Java, the binary code itself also contains the declaration of the functions, so it is enough for the compiler to look at the class files to get both the definition and declaration of the available functions.

What methods are there to modularize C code?

What methods, practices and conventions do you know of to modularize C code as a project grows in size?
Create header files which contain ONLY what is necessary to use a module. In the corresponding .c file(s), make anything not meant to be visible outside (e.g. helper functions) static. Use prefixes on the names of everything externally visible to help avoid namespace collisions. (If a module spans multiple files, things become harder, as you may need to expose internal things and not be able to hide them with "static".)
(If I were to try to improve C, one thing I would do is make "static" the default scoping of functions. If you wanted something visible outside, you'd have to mark it with "export" or "global" or something similar.)
OO techniques can be applied to C code, they just require more discipline.
Use opaque handles to operate on objects. One good example of how this is done is the stdio library -- everything is organised around the opaque FILE* handle. Many successful libraries are organised around this principle (e.g. zlib, apr)
Because all members of structs are implicitly public in C, you need a convention + programmer discipline to enforce the useful technique of information hiding. Pick a simple, automatically checkable convention such as "private members end with '_'".
Interfaces can be implemented using arrays of pointers to functions. Certainly this requires more work than in languages like C++ that provide in-language support, but it can nevertheless be done in C.
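As an illustration of the opaque-handle idea mentioned above, here is a minimal sketch in the spirit of FILE*. The counter type and its functions are hypothetical; the comments mark which part would go in the header and which in the .c file.

```c
#include <stdlib.h>

/* --- public part (would go in counter.h): an incomplete type only --- */
struct counter;
struct counter *counter_create(int start);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_destroy(struct counter *c);

/* --- private part (would go in counter.c): the real layout --- */
struct counter {
    int value;
};

struct counter *counter_create(int start)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;
}

void counter_increment(struct counter *c)
{
    c->value++;
}

int counter_value(const struct counter *c)
{
    return c->value;
}

void counter_destroy(struct counter *c)
{
    free(c);
}
```

Because callers only ever see the incomplete struct counter, they cannot read or write value directly, so the hiding is enforced by the compiler rather than by a naming convention.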
The High and Low-Level C article contains a lot of good tips. Especially, take a look at the "Classes and objects" section.
Standards and Style for Coding in ANSI C also contains good advice of which you can pick and choose.
1) Don't define variables in header files; instead, define the variable in the source file and add an extern statement (declaration) in the header. This will tie into #2 and #3.
2) Use an include guard on every header. This will save so many headaches.
3) Assuming you've done #1 and #2, include everything you need (but only what you need) for a certain file in that file. Don't depend on the order of how the compiler expands your include directives.
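A small sketch of the first two rules, shown as one listing with comments marking the file boundaries. The file names and the names point, point_count and point_sum are assumptions made for the example.

```c
/* --- point.h (hypothetical header) --- */
#ifndef POINT_H
#define POINT_H

struct point { int x; int y; };   /* a second inclusion would otherwise redefine this */
extern int point_count;           /* declaration only, no storage reserved */
int point_sum(struct point p);

#endif /* POINT_H */

/* --- point.c (hypothetical source file) --- */
int point_count = 0;              /* the one definition */

int point_sum(struct point p)
{
    point_count++;
    return p.x + p.y;
}
```

With the guard in place, a source file can include point.h directly and again indirectly through another header without the struct being defined twice.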
The approach that Pidgin (formerly Gaim) uses is they created a Plugin struct. Each plugin populates a struct with callbacks for initialization and teardown, along with a bunch of other descriptive information. Pretty much everything except the struct is declared as static, so only the Plugin struct is exposed for linking.
Then, to handle loose coupling of the plugin communicating with the rest of the app (since it'd be nice if it did something between setup and teardown), they have a signaling system. Plugins can register callbacks to be called when specific signals (not standard C signals, but a custom extensible kind [identified by string, rather than set codes]) are issued by any part of the app (including another plugin). They can also issue signals themselves.
This seems to work well in practice - different plugins can build upon each other, but the coupling is fairly loose - no direct invocation of functions, everything's through the signaling system.
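A rough sketch of that arrangement follows. It is not Pidgin's actual API: every name here (plugin_info, signal_connect, signal_emit, the "message-received" signal) is invented for the example, and the fixed-size handler table is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

typedef void (*signal_handler)(void *data);

/* Descriptor each plugin fills in; the only thing a plugin exports. */
struct plugin_info {
    const char *name;
    int (*init)(void);
};

struct handler_slot {
    const char *signal;       /* signal name, compared with strcmp */
    signal_handler handler;
};

static struct handler_slot slots[MAX_HANDLERS];
static size_t slot_count;

/* Returns 0 on success, -1 when the table is full. */
int signal_connect(const char *signal, signal_handler handler)
{
    if (slot_count == MAX_HANDLERS)
        return -1;
    slots[slot_count].signal = signal;
    slots[slot_count].handler = handler;
    slot_count++;
    return 0;
}

/* Calls every handler registered under this name; returns how many ran. */
int signal_emit(const char *signal, void *data)
{
    int called = 0;

    for (size_t i = 0; i < slot_count; i++) {
        if (strcmp(slots[i].signal, signal) == 0) {
            slots[i].handler(data);
            called++;
        }
    }
    return called;
}

/* --- plugin side: everything static except the descriptor --- */
static void on_message(void *data)
{
    int *counter = data;
    (*counter)++;
}

static int demo_plugin_init(void)
{
    return signal_connect("message-received", on_message);
}

const struct plugin_info demo_plugin = { "demo", demo_plugin_init };
```

The application only needs the demo_plugin symbol; it calls init through the struct and later emits signals by name, never touching on_message or demo_plugin_init directly.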
A function should do one thing and do this one thing well.
Lots of little functions used by bigger wrapper functions help to structure code from small, easy to understand (and test!) building blocks.
Create small modules with a couple of functions each. Only expose what you must, keep anything else static inside of the module. Link small modules together with their .h interface files.
Provide getter and setter functions for access to static file-scope variables in your module. That way, the variables are only actually written to in one place. This also helps with tracing access to these static variables, using a breakpoint in the function and the call stack.
One important rule when designing modular code is: Don't try to optimize unless you have to. Lots of small functions usually yield cleaner, well structured code and the additional function call overhead might be worth it.
I always try to keep variables at their narrowest scope, also within functions. For example, indices of for loops usually can be kept at block scope and don't need to be exposed at the entire function level. C is not as flexible as C++ with the "define it where you use it" but it's workable.
Breaking the code up into libraries of related functions is one way of keeping things organized. To avoid name conflicts you can also use prefixes to allow you to reuse function names, though with good names I've never really found this to be much of a problem. For example, if you wanted to develop your own math routines but still use some from the standard math library, you could prefix yours with some string: xyz_sin(), xyz_cos().
Generally I prefer the one function (or set of closely related functions) per file and one header file per source file convention. Breaking files into directories, where each directory represents a separate library is also a good idea. You'd generally have a system of makefiles or build files that would allow you to build all or part of the entire system following the hierarchy representing the various libraries/programs.
There are directories and files, but no namespaces or encapsulation. You can compile each module to a separate obj file, and link them together (as libraries).
