I've been researching this a lot lately and have looked into various articles and Stack Overflow posts, but I can't seem to find a straight answer. When creating a kernel module I have seen most code look like this, with the init and exit functions declared static:
#include <linux/init.h>
#include <linux/module.h>
static int test_init(void) {return 0;}
static void test_exit(void) {;}
module_init(test_init);
module_exit(test_exit);
One possible reason I have found for declaring these functions static is that doing so increases the difficulty of injecting malicious code into a running module.
Another is less cluttering of the namespace but wouldn't that only be an issue in the context of the kernel module you are linking and compiling and nothing else? If insmod actually links the code into the kernel like ld would then I can see how name clashes would mess up the system. Is this the reason?
I cannot think of any other reasons and I would like this to be clarified before I blindly start using conventions.
Thank you in advance
If a function isn't needed outside of a .c file, it should be declared static within that .c file.
That's just good encapsulation.
It avoids name collisions and lets the reader know your intent.
If the compiler decides to inline all called instances of a static function then the compiler doesn't need to output object code for the function because it knows all the instances were inlined. However, if you don't declare it as static then the compiler can't be sure that it isn't called from somewhere else.
Also, declaring something as static prevents it from entering the global name space. This is important in C which doesn't have name mangling so there can only be one function with one name (even if it acts on different types). So you get to use short function names for static functions knowing that they won't clash with anyone else.
Nothing specific to the kernel or operating systems. Just good programming practice.
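As a minimal sketch of the points above (all names here are hypothetical and not taken from any real module): the helper below has internal linkage, so it can use a short name and the compiler is free to inline it and drop the standalone copy, while the one exported function carries a prefix.

```c
#include <assert.h>

/* Internal linkage: invisible to other .c files, so the short name
 * cannot collide with a `clamp` defined elsewhere in the program. */
static int clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* External linkage: the only symbol this file exports, so it carries
 * a (hypothetical) project prefix. */
int demo_percent(int raw)
{
    return clamp(raw, 0, 100);
}
```

Another .c file in the same program could define its own static clamp with a different signature and the two would never meet at link time.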
Looking at nginx's code I see that pretty much everything is prefixed with ngx_.
Files:
ngx_list.c
ngx_list.h
ngx_log.c
ngx_log.h
Code:
ngx_log_t *ngx_log_init(u_char *prefix);
void ngx_cdecl ngx_log_abort(ngx_err_t err, const char *fmt, ...);
void ngx_cdecl ngx_log_stderr(ngx_err_t err, const char *fmt, ...);
LuaJIT is pretty much the same thing but with lj_.
Files:
lj_alloc.c
lj_alloc.h
lj_api.c
lj_arch.h
Code:
LJ_ASMF void LJ_FASTCALL lj_vm_ffi_call(CCallState *cc);
LJ_FUNC CTypeID lj_ccall_ctid_vararg(CTState *cts, cTValue *o);
LJ_FUNC int lj_ccall_func(lua_State *L, GCcdata *cd);
Other projects do the same thing; these are just two that come to mind. Why do they do this? If it was the project's public API I would get it, as it will be exposed to third-party code. But the code I copied is part of the (private) implementation, so why namespace it?
I suspect there's no hard-and-fast reason. I suspect it's just something that people (some people) feel more comfortable with. I wouldn't do it myself, but I guess I can see the appeal.
For example, I have a multiprecision arithmetic library I wrote for fun one day. It has functions like mp_add(), mp_sub(), etc. And the source for these functions lives in files add.c and sub.c.
Now, since all the source code for this library lives in a subdirectory named mp, I have never been tempted to give the files names like mp_add.c or mp_sub.c. That would just be redundant: the names are already in a very real sense mp/add.c, mp/sub.c, etc.
But I have to admit that it does feel a teensy bit weird going to a file named add.c to check on my multiprecision addition code. It's not integer addition code, or fixed-point or rational-number addition code, or general-purpose addition code. It's very specifically multiprecision addition code, and the functions defined within do all have that mp_ prefix. So shouldn't the filename have that prefix, also?
As I said, no, in the end I wouldn't (I didn't) give it that prefix. But as I also said, I guess I can see the appeal.
Addendum: Above I answered about filenames, but you also asked about internal -- "private" -- function names. And those are different; those definitely need a prefix, at least in C.
The issue is that C does not really have any namespace mechanisms. So you almost always have to fake it with project-specific prefixes on all global symbols.
Consider the function ngx_log_abort(). It's private to nginx; client code wouldn't be calling it. But it's a global function, so if it were just named log_abort, there would be a pretty high chance of a collision with a completely different function in the client code (or in some other library code) also named log_abort.
You may ask, then why is ngx_log_abort a global function? And of course the answer is that any of the functions making up the nginx library might need to call it, so it pretty much has to be global.
You may ask, then why isn't ngx_log_abort a file-scope static function? And the answer there is, that would work if all the source code for the entire nginx library were confined to a single C source file nginx.c. But the authors probably didn't want to confine themselves that way.
If you want to write a well-encapsulated library in C, you have two choices for your "private" functions:
1. Make them file-scope static, and limit yourself to using a single source file for most or all of your library.
2. Make them truly global, but with uniqueifying prefixes. Also don't put declarations for them in public header files. (That way clients can't call them without cheating.)
In other languages you have other mechanisms for hiding private symbols, but not in C.
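Here is a rough sketch of the second choice, with several files collapsed into one listing. The mylib_ prefix, the file names and both functions are made up for illustration. The helper is global so that any source file in the library can call it, but it is declared only in an internal header that is never shipped to clients.

```c
#include <assert.h>
#include <stddef.h>

/* ---- mylib_internal.h (hypothetical; never installed for clients) ---- */
size_t mylib_priv_count_char(const char *s, char c);

/* ---- mylib.h (hypothetical public header) ---- */
size_t mylib_word_count(const char *s);

/* ---- util.c ---- */
/* Global so other library files can call it; the prefix keeps it from
 * colliding with a client's own count_char(). */
size_t mylib_priv_count_char(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* ---- words.c ---- */
/* Naive on purpose: assumes words are separated by exactly one space. */
size_t mylib_word_count(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    return mylib_priv_count_char(s, ' ') + 1;
}
```

A client can still reach mylib_priv_count_char by declaring it by hand, which is the "cheating" mentioned above; the prefix only prevents accidental clashes.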
Let's say you are writing a library and you have a bunch of utility functions you have written just for yourself. Of course, you wouldn't want these functions to have external linkage, so that they can't clash with functions defined by your library's users (mostly because you are not going to tell the outside world of their existence).
On the other hand, these functions may be used in different translation units, so you want them to be shared internally.
Let's give an example. You have a library that does some stuff and in different source files you may need to copy_file and create_directory, so you would implement them as utility functions.
To make sure the user of your library doesn't accidentally get a linkage error because of having a function with the same name, I can think of the following solutions:
Terrible way: Copy paste the functions to every file that uses them adding static to their declaration.
Not a good way: Write them as macros. I like macros, but this is just not right here.
Give them such a weird name, that the chances of the user producing the same name would be small enough. This might work, but it makes the code using them very ugly.
What I do currently: Write them as static functions in an internal utils.h file and include that file in the source files.
Now the last option works almost fine, except it has one issue: If you don't use one of the functions, at the very least you get a warning about it (that says function declared static but never used). Call me crazy, but I keep my code warning free.
What I resorted to do was something like this:
utils.h:
...
#ifdef USE_COPY_FILE
static int copy_file(/* args */)
{...}
#endif
#ifdef USE_CREATE_DIR
static int create_dir(/* args */)
{...}
#endif
...
file1.c:
#define USE_COPY_FILE
#define USE_CREATE_DIR
#include "utils.h"
/* use both functions */
file2.c:
#define USE_COPY_FILE
#include "utils.h"
/* use only copy_file */
The problem with this method however is that it starts to get ugly as more utilities are introduced. Imagine if you have 10 of such functions, you need to have 7~8 lines of define before the include, if you need 7~8 of these functions!
Of course, another way would be to use DONT_USE_* type of macros that exclude functions, but then again you need a lot of defines for a file that uses few of these utility functions.
Either way, it doesn't look elegant.
My question is, how can you have functions that are internal to your own library, used by multiple translation units, and avoid external linkage?
Marking the functions static inline instead of static will make the warnings go away. It will do nothing about the code bloat of your current solution -- you're putting at least one copy of the function into each TU that uses it, and this will still be the case. Oli says in a comment that the linker might be smart enough to merge them. I'm not saying it isn't, but don't count on it :-)
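A small sketch of that suggestion follows. The helper names are hypothetical stand-ins for copy_file and create_dir, chosen so the example stays short. With GCC, -Wunused-function does not fire for unused static inline functions, so no USE_* macros are needed.

```c
#include <assert.h>
#include <string.h>

/* ---- utils.h (hypothetical internal header) ---- */
#ifndef UTILS_H
#define UTILS_H

/* static inline: each translation unit gets its own private copy,
 * and an unused copy does not trigger an unused-function warning. */
static inline int util_min(int a, int b)
{
    return a < b ? a : b;
}

static inline int util_streq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

#endif /* UTILS_H */

/* ---- file2.c: uses only util_min, util_streq is simply ignored ---- */
int smaller_of(int a, int b)
{
    return util_min(a, b);
}
```

The trade-off described above still applies: every translation unit that calls a helper carries its own copy of it.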
It might even make the bloat worse, by encouraging the compiler to actually inline calls to the functions so that you get multiple copies per TU. But it's unlikely, GCC mostly ignores that aspect of the inline keyword. It inlines calls or not according to its own rules.
That's basically the best you can do portably. There's no way in standard C to define a symbol that's external from the POV of certain TUs (yours), but not from the POV of others (your users'). Standard C doesn't really care what libraries are, or the fact that TUs might be linked in several steps, or the difference between static and dynamic linking. So if you want the functions to be actually shared between your TUs, without any external symbol that could interfere with users of the library, then you need to do something specific to GCC and/or your static library or dll format to remove the symbols once the library is built but before the user links against it.
You can link your library normally, having these functions global, and localize them later.
objcopy can take global symbols and make them local, so they can't be linked with. It can also delete the symbol (the function stays, resolved references to it remain resolved, just the name is gone).
objcopy -L symbol localizes symbol. You can repeat -L multiple times.
objcopy -G symbol keeps symbol global, but localizes all others. You can repeat it also, and it will keep global all those you specified.
And I just found that I'm repeating the answer to this question, which Oli Charlesworth referenced in his comment.
I am trying to write a generic library in pure C, just some data structures like stack, queue...
I have some questions about naming the functions in my stack.h.
1. Can I use a name such as "init" for the function that initializes a stack? Will anything go wrong?
2. I know there may be other functions that do different things and are also named "init". Would the program then be confused, especially when I include the headers of both init functions?
3. I know my worry may be unnecessary, but I still want to know the principle.
Any help is appreciated, thanks.
Can I use such name, for example "init" as the function name to init a
stack. Will there be something wrong?
Yes, if anyone else wants a function named init.
I know my worry may be unnecessary, but i still want to know the
principle
Your worry is justified: the lack of namespaces is a serious problem in C.
Export as few functions as possible. Make everything static if you can.
Prefix function names with something. For instance, instead of init, try stack_init.
You don't have namespaces in C so usually you prefix every identifier with the name or nickname of your library.
init();
becomes
fancy_lib_init();
There might be existing libraries doing what you want (e.g. GLib). At least, study them a little before writing your own.
If you claim to develop a generic reusable C library, I suggest having naming conventions. For instance, have all the identifiers (notably function names, typedef-s, struct names...) share some common prefix.
Be systematic in your naming conventions. For instance, initializers for stacks and for queues should have similar names & signatures, and end with _init. Document your naming conventions.
Define very clearly how should data be allocated and released. Who and when should call free?
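To make these conventions concrete, here is a small sketch of a stack under an invented ds_ prefix. The names, the int element type and the growth policy are all assumptions. The documented ownership rule in this sketch is: the caller owns the ds_stack struct itself, the library owns the items buffer, and only ds_stack_free releases it. A queue in the same library would follow the same pattern (ds_queue_init, ds_queue_free and so on).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ds_stack {
    int   *items;   /* owned by the library, released by ds_stack_free */
    size_t len;
    size_t cap;
} ds_stack;

/* Convention: every container has <prefix>_<type>_init and _free. */
void ds_stack_init(ds_stack *s)
{
    assert(s != NULL);
    s->items = NULL;
    s->len = 0;
    s->cap = 0;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
int ds_stack_push(ds_stack *s, int value)
{
    assert(s != NULL);
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 8;
        int *p = realloc(s->items, new_cap * sizeof *p);

        if (p == NULL)
            return -1;
        s->items = p;
        s->cap = new_cap;
    }
    s->items[s->len++] = value;
    return 0;
}

/* Returns 0 and stores the top element in *out, or -1 if empty. */
int ds_stack_pop(ds_stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->len == 0)
        return -1;
    *out = s->items[--s->len];
    return 0;
}

/* Releases the buffer and leaves the stack empty and reusable. */
void ds_stack_free(ds_stack *s)
{
    assert(s != NULL);
    free(s->items);
    ds_stack_init(s);
}
```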
1. init() might be okay (if you're including your library into something else as an actual library, rather than compiling its source in), but it's better practice to use something like stack_init(), and to prefix your library's functions with stack_ or queue_, etc.
2. A program using your library may get confused, depending on the order the libraries are included; see #1.
3. As far as the principles go, the linker (on Linux, anyway) will look for symbols, and there's an ordering to how those symbols will be found. For more information, you can check out the man page for dlsym(), and specifically for RTLD_NEXT.
Function names in C are global unless the function is declared static. If two global functions in a program have the same name, the program should fail to compile. (Well, sometimes it fails at link time, but the idea still holds.)
Generally, you get around this problem by using some sort of prefix or suffix on the function names in your library. "apporc_stack_init()" is much less likely to collide with something than "init()" is.
I hear this a lot of times that: "inline functions in C expose internal data structures" and that is one of the reasons some people do not like them.
Can someone please explain, how?
Thanks in advance.
Lets say I have a program code.c and a function func(). I can 1) make func() inline - which will expose whatever I do with my data-structures in code.c 2) I can put func() in a library and provide that as a shared lib (which is not readable - I guess ?? :p) ---- Is this a correct analysis?
Since you put inline function definitions in a header file (unless used in a single cpp file), which would need to be included by consumers then I guess you are exposing the inner workings of your code.
But, since the alternative is usually macros, I doubt that is a good reason against them.
It would certainly be more transparent compared to something compiled into a library or object module. That's because you can see the source code, and therefore write code which manipulates the data structures any way you want.
However, for non-inline functions for which you have source, I am at a loss how that could be more protected.
There are software corporations which jealously guard their software source code, and only release object modules to be linked with, or shared libraries, or (dread!) .DLLs.
Inline methods expand all method calls in place. So instead of having foo() be a JMP or CALL instruction it just copies the actual instructions of foo() where it was called. If this contains critical data then that would become exposed although inline functions are typically used for short one to two line methods or larger expressions.
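The exposure being discussed is mostly about struct layout. A sketch with an invented counter type: when the accessor is an ordinary function, the public header can leave the struct opaque; an inline accessor would force the struct body into the header. The file split is shown with comments in a single listing.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- counter.h (hypothetical public header), opaque variant ---- */
typedef struct counter counter;      /* layout not visible to clients */
counter *counter_new(void);
int      counter_get(const counter *c);
void     counter_bump(counter *c);
void     counter_delete(counter *c);

/* ---- counter.c: the only place that knows the layout ---- */
struct counter {
    int hits;
};

counter *counter_new(void)
{
    counter *c = malloc(sizeof *c);

    if (c != NULL)
        c->hits = 0;
    return c;
}

int counter_get(const counter *c)
{
    assert(c != NULL);
    return c->hits;
}

void counter_bump(counter *c)
{
    assert(c != NULL);
    c->hits++;
}

void counter_delete(counter *c)
{
    free(c);
}

/* An inline accessor would have to live in counter.h together with
 * the struct body, which exposes the layout to every client:
 *
 *   struct counter { int hits; };
 *   static inline int counter_get(const counter *c) { return c->hits; }
 */
```

With the opaque variant the library can change the fields later without clients recompiling; with the inline variant any client can read or write hits directly.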
Is there a reason why most function definitions in Linux device driver code are declared static?
I was told this is for scoping and to prevent namespace pollution, could anyone explain it in detail why static definition is used in this context?
Functions declared static are not visible outside the translation unit they are defined in (a translation unit is basically a .c file). If a function does not need to be called from outside the file, then it should be made static so as to not pollute the global namespace. This makes conflicts between identical names less likely. Exported symbols are usually identified with some sort of subsystem tag, which further reduces the scope for conflict.
Often, pointers to these functions end up in structs, so they are actually called from outside the file they are defined in, but not by their function name.
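A user-space sketch of that pattern, loosely modeled on how drivers fill in operation tables. The struct demo_ops type and its signatures are invented here and do not match the kernel's real structures. The functions are static, yet other files can call them through the one exported table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table, normally declared in a shared header. */
struct demo_ops {
    int (*open)(void);
    int (*read)(char *buf, size_t len);
};

/* Static: the names demo_open and demo_read never reach the linker. */
static int demo_open(void)
{
    return 0;
}

/* Writes at most one byte into buf and returns the number written. */
static int demo_read(char *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0)
        return 0;
    buf[0] = 'x';
    return 1;
}

/* The only exported symbol: callers reach the functions via pointers. */
const struct demo_ops demo_driver_ops = {
    .open = demo_open,
    .read = demo_read,
};
```

Another driver can define its own static demo_open or demo_read without any conflict, because only the table name is global.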
For the same reasons you use static in any code. You should only 'publish' your API calls, anything else opens you up to abuse, such as being able to call internal functions from outside the driver, something that would almost certainly be catastrophic.
It's good programming practice to only make visible to the outside world what's necessary. That's what encapsulation is all about.
I concur. This is common and wise practice in any C code - not just kernel code! Don't go thinking this is only appropriate for low level stuff, any C code that stretches past one .c file should have thought given to this.