Prevent external calls to functions inside lib file - c

Is there a reliable way to prevent external code from calling inner functions of a lib that was compiled from C code?
I would like to deliver a static library with an API header file. The library has different modules, consisting of .c and .h files. I would like to prevent the recepients from using functions declared in the inner .h files.
Is this possible?
Thanks!

Is there a reliable way to prevent external code from calling inner functions of a lib ?
No, there cannot be (read about Rice's theorem; detecting statically such non-trivial properties is undecidable). The library or the code might use function pointers. A malicious user could play with function pointers and pointer arithmetic to call some private function (perhaps after having reverse-engineered your code), even if it is static.
On Linux you might play with visibility tricks.
Or you could organize your library as a single translation unit (a bit like sqlite is doing its amalgamation) and have all internal functions be static ...
In general, the library should have naming conventions about its internal functions (e.g. suffix all of them with _). This could be practically helpful (but not against malicious users).
Most importantly, a library should be well documented (with naming conventions being also documented), and a serious user will only use documented functions the way they are documented to be useful.
(so I don't think you should care about internal functions being called; you do need to document which public functions can be called, and how, and when...; a user calling anything else should expect undefined behavior, that is very bad things)
I would like to deliver a static library with an APIheader file, and would like to prevent the recepients from using the structs I define and the inner functions.
I am not sure that (at least on Linux) delivering a static library is wise. I would recommend delivering a shared library, read Drepper's How to Write Shared Libraries.
And you can't prevent the recipient (assuming a malicious, clever, and determined one) to use inner functions and internal struct-s. You should just discourage them in the documentation, and document well your public functions and data types.
I would like to prevent the recepients from using functions declared in the inner .h files. Is this possible?
No, that is impossible.
It looks like you seek a technical solution to a social issue. You need to trust your users (and they need to trust you), so you should document what functions can be used (and you could even add in your documentation some sentence saying that using directly any undocumented function yields undefined behavior). You can't do much more. Perhaps (in particular if you are selling your library as a proprietary software) you need a lawyer to write a good contract.
You might consider writing your own GCC plugin (or GCC MELT extension) to detect such calls. That could take you weeks of work and is not worth the trouble (and will remain imperfect).
I am not able to guess your motivations and use case (is it some life-critical software driving a nuclear reactor, a medical drug injector, an autonomous vehicule, a missile?). Please explain what would happen to you if some (malicious but clever) user would call an internal undocumented function. And what could happen to that user?

Related

Is there a way to limit the use of a function to its library in C?

So I'm working on a static C library (like a library.a file) for a school project. There are multiple functions, some of which are placed in different files, so I can't use the static keyword. Is there a way that those functions could be limited to the library itself, an equivalent to static for libraries?
So I'm working on a static C library (like a library.a file) for a school project. There are multiple functions, some of which are placed in different files, so I can't use the static keyword. Is there a way that those functions could be limited to the library itself, an equivalent to static for libraries?
The C language does not have any formal sense of a unit of program organization larger than a single translation unit but smaller than a whole program. Libraries are a foreign concept to the language specification, provided by substantially all toolchains, but not part of the language itself. Thus, no, the C language does not define a mechanism other than static to declare that a function identifier can be referenced only by a proper subset of all translation units contributing to a program.
Such limitations are supported by some shared library formats, such as ELF, and it is common for C implementations targeting such shared libraries to provide extensions that enable those facilities to be engaged, but the same generally is not true for static libraries.
Note also that in all these cases, we're talking about the linkage of function identifiers, not about genuinely controlling access to the functions. In principle, any function in the program can be called from anywhere in the program via a function pointer pointing to it.
Frame challenge: why do you care?
The usual accommodation for having functions with external linkage that you don't want library clients to call directly would be to omit those functions from the library's public header files. What does it matter if some intrepid person can analyze the library to discover and possibly call those functions? Your public headers and documentation tell people how the library should be used. If people use it in other ways then that's on them.
It may not be possible to completely "hide" the existence of a set of restricted functions from other source files if they are defined with external linkage, since their identifiers will be visible when linking (as noted in the other answer).
However, if you are only looking to prevent someone from inadvertently calling restricted functions, one of these approaches may be useful:
In some of my projects, I have used #define and #ifdef statements to block restricted functions from being used throughout the program. For example, in my Hardware Abstraction Layer (HAL) library C source files, I typically place #define HAL__ prior to any #include statements. Then I place an #ifdef HAL__ ... #endif block around any restricted function definitions in my header files. Someone with intention could easily easily bypass this by adding #define HAL__ to their source code or by modifying the header file, but it provides some protection against unintentional use of restricted functions and other definitions.
Place the restricted function definitions in a separate header file used to build the library itself (for example library.a) and provide only header files containing non-restricted function declarations with the library. The identifiers for any functions defined will still be visible to the linker, but without the prototypes, it will be difficult for anyone to call them.
Again, if having the identifiers for any restricted functions visible throughout the program would be a problem (for example, duplicating other identifiers), then these options will not work. Also, if the goal is to prevent developers from intentionally calling restricted functions, then these options will not work, although option 2 would make this more difficult. If the intention is only to prevent unintentional calls to the restricted functions and there is no concern with having duplicate identifiers in the program, then these options may help.

How to give name to the functions of a library header in C

I am trying to write a generic library in pure c , just some data structures like stack, queue...
In my stack.h when giving name to those functions. I have questions about that.
Can I use such name, for example "init" as the function name to init a stack. Will there be something wrong?
I know maybe there exist other functions which just do other things and have the same name as "init". Then would the program be confused, especially when i both include the different init's headers.
3.I know my worry may be unnecessary, but i still want to know the principle.
Any help is appreciated, thanks.
Can I use such name, for example "init" as the function name to init a
stack. Will there be something wrong?
Yes, if anyone else wants a function named init.
I know my worry may be unnecessary, but i still want to know the
principle
Your worry is necessary, this (the lack of namespaces) is a serious problem in C.
Export as few functions as possible. Make everything static if you can
Prefix function names with something. For instance, instead of init, try stack_init
You don't have namespaces in C so usually you prefix every identifier with the name or nickname of your library.
init();
becomes
fancy_lib_init();
There might be existing libraries doing what you want (e.g. Glib). At least, study them a little before writing your own.
If you claim to develop a generic reusable C library, I suggest having naming conventions. For instance, have all the identifiers (notably function names, typedef-s, struct names...) share some common prefix.
Be systematic in your naming conventions. For instance, initializers for stacks and for queues should have similar names & signatures, and end with _init. Document your naming conventions.
Define very clearly how should data be allocated and released. Who and when should call free?
init() might be okay (if you're including your library into something else as an actual library, rather than compiling its source in), but it's better practice to use something like stack_init(), and to prefix your library's functions with stack_ or queue_, etc.
A program using your library may get confused, depending on the order the libraries are included, see #1.
As far as the principles go, the linker (on Linux, anyway) will look for symbols, and there's an ordering to how those symbols will be found. For more information, you can check out the man page for dlsym(), and specifically for RTLD_NEXT.
Function names in C are global. If two functions in a program have the same name, the program should fail to compile. (Well, sometimes it fails at link time, but the idea still holds.)
Generally, you get around this problem by using some sort of prefix or suffix on the function names in your library. "apporc_stack_init()" is much less likely to collide with something than "init()" is.

how does inline functions expose internal data structures?

I hear this a lot of times that: "inline functions in C expose internal data structures" and that is one of the reasons some people do not like them.
Can someone please explain, how?
Thanks in advance.
Lets say I have a program code.c and a function func(). I can 1) make func() inline - which will expose whatever I do with my data-structures in code.c 2) I can put func() in a library and provide that as a shared lib (which is not readable - I guess ?? :p) ---- Is this a correct analysis?
Since you put inline function definitions in a header file (unless used in a single cpp file), which would need to be included by consumers then I guess you are exposing the inner workings of your code.
But, since the alternative is usually macros, I doubt that is a good reason against them.
It would certainly be more transparent compared to something compiled into a library or object module. That's because you can see the source code, and therefore write code which manipulates the data structures any way you want.
However, for non-line functions for which you have source, I am at a loss how that could be more protected.
There are software corporations which jealously guard their software source code, and only release object modules to be linked with, or shared libraries, or (dread!) .DLLs.
Inline methods expand all method calls in place. So instead of having foo() be a JMP or CALL instruction it just copies the actual instructions of foo() where it was called. If this contains critical data then that would become exposed although inline functions are typically used for short one to two line methods or larger expressions.

How to catch unintentional function interpositioning?

Reading through my book Expert C Programming, I came across the chapter on function interpositioning and how it can lead to some serious hard to find bugs if done unintentionally.
The example given in the book is the following:
my_source.c
mktemp() { ... }
main() {
mktemp();
getwd();
}
libc
mktemp(){ ... }
getwd(){ ...; mktemp(); ... }
According to the book, what happens in main() is that mktemp() (a standard C library function) is interposed by the implementation in my_source.c. Although having main() call my implementation of mktemp() is intended behavior, having getwd() (another C library function) also call my implementation of mktemp() is not.
Apparently, this example was a real life bug that existed in SunOS 4.0.3's version of lpr. The book goes on to explain the fix was to add the keyword static to the definition of mktemp() in my_source.c; although changing the name altogether should have fixed this problem as well.
This chapter leaves me with some unresolved questions that I hope you guys could answer:
Does GCC have a way to warn about function interposition? We certainly don't ever intend on this happening and I'd like to know about it if it does.
Should our software group adopt the practice of putting the keyword static in front of all functions that we don't want to be exposed?
Can interposition happen with functions introduced by static libraries?
Thanks for the help.
EDIT
I should note that my question is not just aimed at interposing over standard C library functions, but also functions contained in other libraries, perhaps 3rd party, perhaps ones created in-house. Essentially, I want to catch any instance of interpositioning regardless of where the interposed function resides.
This is really a linker issue.
When you compile a bunch of C source files the compiler will create an object file for each one. Each .o file will contain a list of the public functions in this module, plus a list of functions that are called by code in the module, but are not actually defined there i.e. functions that this module is expecting some library to provide.
When you link a bunch of .o files together to make an executable the linker must resolve all of these missing references. This is the point where interposing can happen. If there are unresolved references to a function called "mktemp" and several libraries provide a public function with that name, which version should it use? There's no easy answer to this and yes odd things can happen if the wrong one is chosen
So yes, it's a good idea in C to "static" everything unless you really do need to use it from other source files. In fact in many other languages this is the default behavior and you have to mark things "public" if you want them accessible from outside.
It sounds like what you want is for the tools to detect that there are name conflicts in functions - ie., you don't want your externally accessible function names form accidentally having the same name and therefore 'override' or hide functions with the same name in a library.
There was a recent SO question related to this problem: Linking Libraries with Duplicate Class Names using GCC
Using the --whole-archive option on all the libraries you link against may help (but as I mentioned in the answer over there, I really don't know how well this works or how easy it is to convince builds to apply the option to all libraries)
Purely formally, the interpositioning you describe is a straightforward violation of C language definition rules (ODR rule, in C++ parlance). Any decent compiler must either detect these situations, or provide options for detecting them. It is simply illegal to define more than one function with the same name in C language, regardless of where these functions are defined (Standard library, other user library etc.)
I understand that many platforms provide means to customize the [standard] library behavior by defining some standard functions as weak symbols. While this is indeed a useful feature, I believe the compilers must still provide the user with means to enforce the standard diagnostics (on per-function or per-library basis preferably).
So, again, you should not worry about interpositioning if you have no weak symbols in your libraries. If you do (or if you suspect that you do), you have to consult your compiler documentation to find out if it offers you with means to inspect the weak symbol resolution.
In GCC, for example, you can disable the weak symbol functionality by using -fno-weak, but this basically kills everything related to weak symbols, which is not always desirable.
If the function does not need to be accessed outside of the C file it lives in then yes, I would recommend making the function static.
One thing you can do to help catch this is to use an editor that has configurable syntax highlighting. I personally use SciTE, and I have configured it to display all standard library function names in red. That way, it's easy to spot if I am re-using a name I shouldn't be using (nothing is enforced by the compiler, though).
It's relatively easy to write a script that runs nm -o on all your .o files and your libraries and checks to see if an external name is defined both in your program and in a library. Just one of the many sane sensible services that the Unix linker doesn't provide because it's stuck in 1974, looking at one file at a time. (Try putting libraries in the wrong order and see if you get a useful error message!)
The Interposistioning occurs when the linker is trying to link separate modules.
It cannot occur within a module. If there are duplicate symbols in a module the linker will report this as an error.
For *nix linkers, unintended Interposistioning is a problem and it is difficult for the linker to guard against it.
For the purposes of this answer consider the two linking stages:
The linker links translation units into modulles (basically
applications or libraries).
The linker links any remaining unfound symbols by searching in modules.
Consider the scenario described in 'Expert C programming' and in SiegeX's question.
The linker fist tries to build the application module.
It sess that the symbol mktemp() is an external and tries to find a funcion definiton for the symbol. The linker finds
the definition for the function in the object code of the application module and marks the symbol as found.
At this stage the symbol mktemp() is completely resolved. It is not considered in any way tentative so as to allow
for the possibility that the anothere module might define the symbol.
In many ways this makes sense, since the linker should first try and resolve external symbols within the module it is
currently linking. It is only unfound symbols that it searches for when linking in other modules.
Furthermore, since the symbol has been marked as resolved, the linker will use the applications mktemp() in any
other cases where is needs to resolve this symbol.
Thus the applications version of mktemp() will be used by the library.
A simple way to guard agains the problem is to try and make all external sysmbols in your application or library unique.
For modules that are only going to shared on a limited basis, this can fairly easily be done by making sure all
extenal symbols in your module are unique by appending a unique identifier.
For modules that are widely shared making up unique names is a problem.

What methods are there to modularize C code?

What methods, practices and conventions do you know of to modularize C code as a project grows in size?
Create header files which contain ONLY what is necessary to use a module. In the corresponding .c file(s), make anything not meant to be visible outside (e.g. helper functions) static. Use prefixes on the names of everything externally visible to help avoid namespace collisions. (If a module spans multiple files, things become harder., as you may need to expose internal things and not be able hide them with "static")
(If I were to try to improve C, one thing I would do is make "static" the default scoping of functions. If you wanted something visible outside, you'd have to mark it with "export" or "global" or something similar.)
OO techniques can be applied to C code, they just require more discipline.
Use opaque handles to operate on objects. One good example of how this is done is the stdio library -- everything is organised around the opaque FILE* handle. Many successful libraries are organised around this principle (e.g. zlib, apr)
Because all members of structs are implicitly public in C, you need a convention + programmer discipline to enforce the useful technique of information hiding. Pick a simple, automatically checkable convention such as "private members end with '_'".
Interfaces can be implemented using arrays of pointers to functions. Certainly this requires more work than in languages like C++ that provide in-language support, but it can nevertheless be done in C.
The High and Low-Level C article contains a lot of good tips. Especially, take a look at the "Classes and objects" section.
Standards and Style for Coding in ANSI C also contains good advice of which you can pick and choose.
Don't define variables in header files; instead, define the variable in the source file and add an extern statement (declaration) in the header. This will tie into #2 and #3.
Use an include guard on every header. This will save so many headaches.
Assuming you've done #1 and #2, include everything you need (but only what you need) for a certain file in that file. Don't depend on the order of how the compiler expands your include directives.
The approach that Pidgin (formerly Gaim) uses is they created a Plugin struct. Each plugin populates a struct with callbacks for initialization and teardown, along with a bunch of other descriptive information. Pretty much everything except the struct is declared as static, so only the Plugin struct is exposed for linking.
Then, to handle loose coupling of the plugin communicating with the rest of the app (since it'd be nice if it did something between setup and teardown), they have a signaling system. Plugins can register callbacks to be called when specific signals (not standard C signals, but a custom extensible kind [identified by string, rather than set codes]) are issued by any part of the app (including another plugin). They can also issue signals themselves.
This seems to work well in practice - different plugins can build upon each other, but the coupling is fairly loose - no direct invocation of functions, everything's through the signaling stystem.
A function should do one thing and do this one thing well.
Lots of little function used by bigger wrapper functions help to structure code from small, easy to understand (and test!) building blocks.
Create small modules with a couple of functions each. Only expose what you must, keep anything else static inside of the module. Link small modules together with their .h interface files.
Provide Getter and Setter functions for access to static file scope variables in your module. That way, the variables are only actually written to in one place. This helps also tracing access to these static variables using a breakpoint in the function and the call stack.
One important rule when designing modular code is: Don't try to optimize unless you have to. Lots of small functions usually yield cleaner, well structured code and the additional function call overhead might be worth it.
I always try to keep variables at their narrowest scope, also within functions. For example, indices of for loops usually can be kept at block scope and don't need to be exposed at the entire function level. C is not as flexible as C++ with the "define it where you use it" but it's workable.
Breaking the code up into libraries of related functions is one way of keeping things organized. To avoid name conflicts you can also use prefixes to allow you to reuse function names, though with good names I've never really found this to be much of a problem. For example, if you wanted to develop your own math routines but still use some from the standard math library, you could prefix yours with some string: xyz_sin(), xyz_cos().
Generally I prefer the one function (or set of closely related functions) per file and one header file per source file convention. Breaking files into directories, where each directory represents a separate library is also a good idea. You'd generally have a system of makefiles or build files that would allow you to build all or part of the entire system following the hierarchy representing the various libraries/programs.
There are directories and files, but no namespaces or encapsulation. You can compile each module to a separate obj file, and link them together (as libraries).

Resources