Shouldn't be hard, right? Right?
I am currently trawling the OpenAFS codebase to find the header definition of pioctl. I've thrown everything I've got at it: checked ctags, grepped the source code for pioctl, etc. The closest I've got to a lead is the fact that there's a file pioctl_nt.h that contains the definition, except it's not actually what I want because none of the userspace code directly includes it, and it's Windows specific.
Now, I'm not expecting you to go and download the OpenAFS codebase and find the header file for me. I am curious, though: what are your techniques for finding the header file you need when everything else fails? What are the worst case scenarios that could cause a grep for pioctl in the codebase to not actually come up with anything that looks like a function definition?
I should also note that I have access to two independent userspace programs that have done it properly, so in theory I could do an O(n) search for the function. But none of the header files pop out to me, and n is large...
Edit: The immediate issue has been resolved: pioctl() is defined implicitly, as shown by this:
AFS.xs:2796: error: implicit declaration of function ‘pioctl’
If grep -r and ctags are failing, then it's probably being defined as the result of some nasty macro(s). You can try making the simplest possible file that calls pioctl() and compiles successfully, and then preprocessing it to see what happens:
gcc -E test.c -o test.i
grep pioctl -C10 test.i
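For concreteness, the test file might look something like the sketch below. The include list is a placeholder (figuring out the right headers is exactly the problem), so copy it from a userspace program that already calls pioctl() successfully:

/* test.c - minimal sketch; the two includes are hypothetical placeholders */
#include <afs/param.h>   /* guess: whatever the working userspace code includes */
#include <afs/venus.h>   /* guess */

int main(void)
{
    /* Dummy arguments; the only goal is a file that compiles, so that the
     * gcc -E output above can be grepped for where pioctl comes from. */
    return pioctl((char *)"/afs", 0, (void *)0, 1);
}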
There are compiler options to show the preprocessor output, so try those. In a horrible pinch, when my head was completely empty of any possible definition, the -E option (in most C compilers) does nothing but spew out the preprocessed code.
Per the requested information: normally I capture the compile command for the file in question as it is printed on the screen, do a quick copy and paste, and put -E right after the compiler invocation. The result spews preprocessor output to the screen, so redirect it to a file. Look through that file; all of the macros and other silly things have already been expanded.
Worst case scenarios:
K&R style prototypes (see the sketch after this list)
Macros are hiding the definition
Implicit Declaration (per your answer)
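As an illustration of the first point, an old K&R style definition does not look like a modern prototype at all, so pattern-based searches for something like "int pioctl(char" find nothing. This is a generic sketch, not OpenAFS's actual code:

/* K&R (pre-ANSI) style: the parameter types sit on separate lines and the
 * return type may be on its own line, so nothing here matches a regex that
 * expects a one-line ANSI prototype. */
int
pioctl(path, opcode, blob, follow)
    char *path;
    int opcode;
    char *blob;
    int follow;
{
    /* ... */
    return 0;
}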
Have you considered using cscope (available from SourceForge)?
I use it on some fairly significant code sets (25,000+ files, ranging up to about 20,000 lines in a file) with good success. It takes a while to derive the file list (5-10 minutes) and longer (20-30 minutes) to build the cross-reference on an ancient Sun E450, but I find the results useful.
On an almost equally ancient Mac (dual 1GHz PPC 32-bit processors), cscope run on the OpenAFS (1.5.59) source code comes up with quite a lot of places where the function is declared, sometimes inline in code, sometimes in headers. It took a few minutes to scan the 4949 files, generating a 58 MB cscope.out file.
openafs-1.5.59/src/sys/sys_prototypes.h
openafs-1.5.59/src/aklog/aklog_main.c (along with comment "Why doesn't AFS provide these prototypes?")
openafs-1.5.59/src/sys/pioctl_nt.h
openafs-1.5.59/src/auth/ktc.c includes a define for PIOCTL
openafs-1.5.59/src/sys/pioctl_nt.c provides an implementation of it
openafs-1.5.59/src/sys/rmtsysc.c provides an implementation of it (and sometimes afs_pioctl() instead)
The rest of the 184 instances found seem to be uses of the function, or documentation references, or release notes, change logs, and the like.
The current working theory that we've decided on, after poking at the preprocessor and not finding anything either, is that OpenAFS is letting the compiler infer the prototype of the function, since it returns an integer and takes pointer, integer, pointer, integer as its parameters. I'll be dealing with this by simply declaring it myself.
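For reference, a hedged sketch of such a hand-written declaration, based only on the pointer/integer/pointer/integer theory above (the parameter names and the struct type are guesses, not copied from an OpenAFS header):

/* Hand-written declaration based on the working theory; verify the argument
 * types against a program that actually calls pioctl before relying on it. */
struct ViceIoctl;   /* the real definition lives somewhere in the OpenAFS headers */
extern int pioctl(char *path, int opcode, struct ViceIoctl *blob, int follow);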
While the original general question has been answered, if anyone arrives at this page wondering where to find a header file that defines pioctl:
In current releases of OpenAFS (1.6.7), a prototype for pioctl is defined in sys_prototypes.h. But at the time this question was originally asked, that file did not exist, and there was no prototype for pioctl visible from outside the OpenAFS code tree.
However, most users of pioctl probably want, or are at least okay with using, lpioctl ("local" pioctl), which always issues a syscall on the local machine. There is a prototype for this in afssyscalls.h (and these days, also sys_prototypes.h).
The easiest option these days, though, is just to use libkopenafs. For that, include kopenafs.h, use the function k_pioctl, and link against -lkopenafs. That tends to be a much more convenient interface than trying to link with OpenAFS libsys and other stuff.
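A minimal sketch of that route, assuming kopenafs.h provides struct ViceIoctl, k_hasafs(), and k_pioctl() with the usual pioctl-style signature (check the installed header; this is written from memory):

#include <stdio.h>
#include <kopenafs.h>   /* from libkopenafs */

/* Placeholder: substitute a real VIOC* opcode from the OpenAFS headers. */
#define SOME_VIOC_OPCODE 0

int main(void)
{
    struct ViceIoctl blob;
    char out[256];

    if (!k_hasafs()) {              /* is an AFS client even running? */
        fprintf(stderr, "AFS not available\n");
        return 1;
    }

    blob.in = NULL;
    blob.in_size = 0;
    blob.out = out;
    blob.out_size = sizeof(out);

    if (k_pioctl("/afs", SOME_VIOC_OPCODE, &blob, 1) != 0) {
        perror("k_pioctl");
        return 1;
    }
    return 0;
}

Build with something like gcc prog.c -lkopenafs.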
Doesn't it usually say in the man page synopsis?
I'm trying to learn the glibc source, and I've found navigation to be quite formidable. I'm not referring to the code itself, but simply finding it: it seems to be a maze of macros and wrappers, such that just finding the actual code I want is quite tough.
Not only system dependent things, like setjmp, but even portable functions like fprintf: in each case, I struggle to find the true definition in C. It's easy to find the start, but it's usually an empty shell wrapping defines and macros.
This feels like a modern day equivalent of goto statements, with the spaghetti problem all over again.
How can I navigate glibc and find the actual implementation for lib functions?
Update
As an example, try looking up the definition of hidden_def in glibc. It's a macro taking you to hidden_def1, which is a macro taking you to hidden_def2, which is a macro taking you to hidden_asm, which is a macro taking you to hidden_asm1, at which point...
Moreover, each of these macros is defined in several different files, with other #defines controlling which definition is actually invoked.
This is not unusual: it seems to be de rigueur throughout the source code. How does anyone follow it? How do the GNU developers follow it?
How can I navigate glibc and find the actual implementation for lib functions?
Type the function in the search bar at https://code.woboq.org/userspace/glibc or at https://github.com/bminor/glibc . Navigate the results manually until you find the definition.
If you want to index the project locally, use cscope, ctags, GNU GLOBAL, or clangd to index the project, and then use that tool's interface to search for the definition.
As an example, try looking up the definition of hidden_def in glibc
Type hidden_def glibc into Google. My first hit is woboq.org: https://code.woboq.org/userspace/glibc/include/libc-symbols.h.html#550 .
I use Firefox: I press Ctrl+F, type hidden_def, and press Enter until I find # define hidden_def at https://code.woboq.org/userspace/glibc/include/libc-symbols.h.html#550 .
Then I select __hidden_ver1, press Ctrl+C, Ctrl+F, Ctrl+V, and search for it, still in the web browser, pressing Enter until I find https://code.woboq.org/userspace/glibc/include/libc-symbols.h.html#540 . __hidden_ver2 is just below, on line 542.
For most cases all you need is a browser, Google, woboq.org, and github.com.
It's a macro taking you to hidden_def1
There are no such macros as you mention, at least in the version hosted at woboq.org.
How does anyone follow it?
While an IDE is a powerful help, each project is unique and requires different settings that take time to figure out. Mostly, browsing the source code means grep (or faster alternatives like ag, which is very useful for big projects like glibc) and going through the result list.
Not only system dependent things, like setjmp
Developers are (or should be :) sane people - in most cases a function named setjmp will be in a file named setjmp.c or setjmp.S, or in the same directory as setjmp.h, or inside a directory named stdlib or setjmp.
Type setjmp in the GitHub search bar: https://github.com/bminor/glibc/search?q=setjmp . You see there are multiple definitions, one per architecture (powerpc, s390, etc.), but the files are all named setjmp. Go back, use "Go to file" on https://github.com/bminor/glibc , and search for a file named x86/setjmp. There are three implementations; the most standard one seems to be https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/setjmp.S .
even portable functions like fprintf
As above, search for a file named fprintf. You quickly find https://github.com/bminor/glibc/blob/master/stdio-common/fprintf.c .
Background:
In a particular project there are a couple of thousand functions spread across more than a hundred files. The functions are divided between two banks of code memory - fast_mem and slow_mem. But now, since the fast_mem area is limited, it is running out of space to accommodate any new code changes.
As part of a code review, it's been found that some functions in fast_mem have no callers. But the list of functions is too large to check one by one manually.
Question:
So, coming to the question, is there a tool that can list the callers of all the functions in the project? With this, I can go ahead and remove functions in fast_mem that don't have any callers.
I use cscope for code browsing along with ctags. But this requires one to input the function name manually. Can this be automated somehow to get the complete list?
I also tried Doxygen with its caller graph feature. The result is not so comfortable to use though.
I use Scientific Toolworks Understand
If your compiler is a recent GCC (or if you can switch to GCC 4.6, possibly as a cross-compiler) you might develop a GCC plugin or a MELT extension to find out.
Of course, if you are doing tricks with function pointers (e.g. unportable pointer arithmetic on them), the original question is undecidable.
Actually, if you are using function pointers, often the only reasonable thing to say is that they can reach only functions of the same signature.
And perhaps the project is important enough that customizing the compiler to make a better (automatic or semi-automatic) trade-off between fast_mem and slow_mem is worthwhile. This is typically an excellent case for GCC plugins or MELT extensions (but that takes some work, days or weeks rather than hours, because you need to understand the internal GCC representations), and you are probably the only one who could do it (because your question is very particular to your rather unusual system).
Let's assume there aren't any odd function pointer games going on. Then you can break out the under-used cflow:
http://www.gnu.org/software/cflow/
Generate a "reverse index" with the -r flag. you'll get a list of every function, followed by where it's called. You can feed it multiple files.
You can use a static code analysis tool like cppcheck.
If you call it with the --enable=unusedFunction parameter, it will warn about unused functions.
Is there a way to programmatically check if a single C source file is potentially harmful?
I know that no check will yield 100% accuracy -- but am interested at least to do some basic checks that will raise a red flag if some expressions / keywords are found. Any ideas of what to look for?
Note: the files I will be inspecting are relatively small (a few hundred lines at most), implementing numerical analysis functions that all operate in memory. No external libraries (except math.h) shall be used in the code. Also, no I/O should be used (functions will be run with in-memory arrays).
Given the above, are there some programmatic checks I could do to at least try to detect harmful code?
Note: since I don't expect any I/O, if the code does I/O -- it is considered harmful.
Yes, there are programmatic ways to detect the conditions that concern you.
It seems to me you ideally want a static analysis tool to verify that the preprocessed version of the code:
Doesn't call any functions except those it defines and non-I/O functions in the standard library,
Doesn't do any bad stuff with pointers.
By preprocessing, you get rid of the problem of detecting macros, possibly-bad-macro content, and actual use of macros. Besides, you don't want to wade through all the macro definitions in standard C headers; they'll hurt your soul because of all the historical cruft they contain.
If the code only calls its own functions and trusted functions in the standard library, it isn't calling anything nasty. (Note: It might be calling some function through a pointer, so this check either requires a function-points-to analysis or the agreement that indirect function calls are verboten, which is actually probably reasonable for code doing numerical analysis).
The purpose of checking for bad stuff with pointers is so that it doesn't abuse pointers to manufacture nasty code and pass control to it. This first means, "no casts to pointers from ints" because you don't know where the int has been :-}
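For illustration, this is the sort of construct such a check would reject (a contrived example, not taken from any real exploit):

/* Contrived example of "bad stuff with pointers": smuggling control flow
 * through an integer. A checker enforcing rule 2) should reject this. */
typedef void (*fn_t)(void);

void sneaky(unsigned long addr)
{
    fn_t f = (fn_t)addr;   /* cast from an integer to a function pointer */
    f();                   /* ...and jump to wherever the integer points */
}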
For the who-does-it-call check, you need to parse the code and name/type resolve every symbol, and then check call sites to see where they go. If you allow pointers/function pointers, you'll need a full points-to analysis.
One of the standard static analyzer tool companies (Coverity, Klocwork) likely provides some kind of method of restricting what functions a code block may call. If that doesn't work, you'll have to fall back on more general analysis machinery like our DMS Software Reengineering Toolkit with its C Front End. DMS provides customizable machinery to build arbitrary static analyzers for a language description provided to it as a front end. DMS can be configured to do exactly test 1), including the preprocessing step; it also has full points-to and function-points-to analyzers that could be used for the points-to checking.
For 2) "doesn't use pointers maliciously", again the standard static analysis tool companies provide some pointer checking. However, here they have a much harder problem because they are statically trying to reason about a Turing machine. Their solution is either miss cases or report false positives. Our CheckPointer tool is a dynamic analysis, that is, it watches the code as it runs and if there is any attempt to misuse a pointer CheckPointer will report the offending location immediately. Oh, yes, CheckPointer outlaws casts from ints to pointers :-} So CheckPointer won't provide a static diagnostic "this code can cheat", but you will get a diagnostic if it actually attempts to cheat. CheckPointer has rather high overhead (all that checking costs something) so you probably want to run you code with it for awhile to gain some faith that nothing bad is going to happen, and then stop using it.
EDIT: Another poster says "There's not a lot you can do about buffer overwrites for statically defined buffers." CheckPointer will do those tests and more.
If you want to make sure it's not calling anything not allowed, then compile the piece of code and examine what it's linking to (say via nm). Since you're hung up on doing this by a "programmatic" method, just use python/perl/bash to compile then scan the name list of the object file.
There's not a lot you can do about buffer overwrites for statically defined buffers, but you could link against an electric-fence type memory allocator to prevent dynamically allocated buffer overruns.
You could also compile and link the C-file in question against a driver which would feed it typical data while running under valgrind which could help detect poorly or maliciously written code.
In the end, however, you're always going to run up against the "does this routine terminate" question, which is famous for being undecidable. A practical way around this would be to compile your program and run it from a driver which would alarm-out after a set period of reasonable time.
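A rough sketch of such a driver, assuming the routine under test is linked in as routine_under_test() (the name is made up here):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical entry point of the routine being vetted. */
extern void routine_under_test(void);

static void on_timeout(int sig)
{
    (void)sig;
    _exit(2);   /* async-signal-safe; treat a timeout as "suspicious" */
}

int main(void)
{
    signal(SIGALRM, on_timeout);
    alarm(10);               /* give the routine 10 seconds of wall time */
    routine_under_test();
    alarm(0);                /* cancel the timer if it finished in time */
    return 0;
}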
EDIT: Example showing use of nm:
Create a C snippet defining function foo which calls fopen:
#include <stdio.h>
void foo(void)
{
    /* fopen is an external library symbol; nm will list it as undefined ("U") */
    FILE *fp = fopen("/etc/passwd", "r");
    (void)fp;   /* silence the unused-variable warning */
}
Compile with -c, and then look at the resulting object file:
$ gcc -c foo.c
$ nm foo.o
0000000000000000 T foo
U fopen
Here you'll see that there are two symbols in the foo.o object file. One is defined: foo, the name of the subroutine we wrote. And one is undefined: fopen, which will be linked to its definition when the object file is linked together with the other C files and the necessary libraries. Using this method, you can see immediately if the compiled object references anything outside its own definitions and, by your rules, can be considered "bad".
You could do some obvious checks for "bad" function calls like network IO or assembly blocks. Beyond that, I can't think of anything you can do with just a C file.
Given the nature of C you're just about going to have to compile to even get started. Macros and such make static analysis of C code pretty difficult.
I've been working for some time with an open-source library ("Fast Artificial Neural Network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (I'm only including some headers I need and did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design with no gain in this approach?
I'm aware of this related question but it does not answer my question. I'm looking for reasons that might justify this.
A bonus question: Is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
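Schematically, the pattern looks like the sketch below (a simplified illustration of the idea, not the actual contents of the FANN files):

/* doublefann.c - simplified sketch of the "multiple binaries" trick */
#define FANN_DOUBLE        /* hypothetical configuration macro */
typedef double fann_type;  /* the numeric type this build variant uses */

/* Easy way to allow for build of multiple binaries */
#include "fann.c"          /* pull in the shared implementation verbatim */

/* floatfann.c and fixedfann.c do the same with float / fixed-point types,
 * so the same fann.c source yields three differently-typed object files. */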
They provide makefiles and MSVS projects which should already avoid linking doublefann.o (from doublefann.c) with fann.o (from fann.c), fixedfann.o (from fixedfann.c), and so on; either those build files are screwed up or something else has gone wrong.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is that each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files, and many tools assume it. The only real solution is to fix the project settings so these are not linked together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al.; possibly just removing those files from the project is enough, since then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but I would not be surprised to see this explained a bit more in depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
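A tiny illustration of that collision (hypothetical file names):

/* util.c - defines an external (non-static) function */
int helper(void) { return 42; }

/* a.c */
#include "util.c"
int a_entry(void) { return helper(); }

/* b.c */
#include "util.c"
int b_entry(void) { return helper() + 1; }

/* Linking a.o and b.o into one binary fails with
 *   multiple definition of `helper'
 * because each translation unit now carries its own external definition.
 * Making helper() static (or static inline in a header) avoids the clash. */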
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).
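For example, a header shipped to users could contain either of these forms (a schematic sketch, not code from the library in question):

/* mylib.h - schematic example */
#ifndef MYLIB_H
#define MYLIB_H

/* Old-fashioned: each translation unit gets its own private copy. */
static int mylib_square(int x) { return x * x; }

/* Since C99: static inline is the usual way to put small definitions
 * in a header without risking duplicate-symbol errors at link time. */
static inline int mylib_cube(int x) { return x * x * x; }

#endif /* MYLIB_H */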
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)
I'm using Doxygen with some embedded C source. Given a .c/.h file pair, do you put Doxygen comments on the function prototype (.h file) or the function definition (.c file), or do you duplicate them in both places?
I'm having a problem in which Doxygen is warning about missing comments when I document in one place but not the other; is this expected, or is my Doxygen screwed up?
For public APIs I document at the declaration, as this is where the user usually looks first if not using the doxygen output.
I never had problems with documenting in one place only, but I used it with C++; it could be different with C, although I doubt it.
[edit] Never write it twice. Never. In-Source documentation follows DRY, too, especially concerning such copy-and-paste perversions.[/edit]
However, you can specify whether you want warnings for undocumented elements. Although such warnings look nice in theory, my experience is that they quickly become more of a burden than a help. Documenting all functions is usually not the way to go (there is such a thing as redundant documentation, or even hindering documentation, and especially too much documentation); good documentation needs a knowledgeable person spending time on it. Given that, those warnings are unnecessary.
And if you do not have the resources for writing good documentation (money, time, whatever...), then those warnings won't help either.
Quoted from my answer to this question: C/C++ Header file documentation:
I put documentation for the interface (parameters, return value, what the function does) in the interface file (.h), and the documentation for the implementation (how the function does it) in the implementation file (.c, .cpp, .m). I write an overview of the class just before its declaration, so the reader has immediate basic information.
With Doxygen, this means that documentation describing the overview, parameters, and return values (\brief, \param, \return) is used for documenting the function prototype, and inline documentation (\details) is used for documenting the function body (you can also refer to my answer to that question: How to be able to extract comments from inside a function in doxygen?).
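A small sketch of that split, using a hypothetical function:

/* checksum.h - interface documentation at the prototype */

/**
 * \brief Compute a simple additive checksum of a buffer.
 * \param buf Pointer to the bytes to sum.
 * \param len Number of bytes in buf.
 * \return The 16-bit checksum value.
 */
unsigned short checksum(const unsigned char *buf, unsigned long len);

/* checksum.c - implementation notes at the definition */

/**
 * \details Implemented as a plain byte-wise sum; a real project would
 * document algorithmic choices and edge cases here.
 */
unsigned short checksum(const unsigned char *buf, unsigned long len)
{
    unsigned long sum = 0;
    while (len--)
        sum += *buf++;
    return (unsigned short)sum;
}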
I often use Doxygen with C targeting embedded systems. I try to write documentation for any single object in one place only, because duplication will result in confusion later. Doxygen does some amount of merging of the docs, so in principle it is possible to document the public API in the .h file, and to have some notes on how it actually works sprinkled in the .c file. I've tried not to do that myself.
If moving the docs from one place to the other changes the amount of warnings it produces, that may be a hint that there may be something subtly different between the declaration and definition. Does the code compile clean with -Wall -Wextra for example? Are there macros that mutate the code in one place and not the other? Of course, Doxygen's parser is not a full language parser, and it is possible to get it confused as well.
We comment only the function definitions, but we use it with C++.
Writing it in both places is a waste of time.
As for the warning: if your documentation looks good, it is probably fine to ignore such warnings.
I've asked myself the same question and was pleasantly surprised to see that Doxygen actually includes the same in-line documentation that is in the .c file in the corresponding .h file when browsing the generated html documentation. Hence you don't have to repeat your in-line documentation, and Doxygen is smart enough to include it in both places!
I'm running Doxygen version 1.8.10.