Generate documentation for 2 languages with same code - c

Can I somehow generate documentation for 2 different languages with same code? Problem is that I have a C API that is also exposed trough a proprietary language that resembles VB.
So the exposed function in C is something like:
int Function (*PointerToObject)
While in VB it would be something like:
int Function (ByVal long PointerToObject)
I already opened another thread before about this same problem, but by that time I did not know anything about Doxygen. The last couple of days I have been reading the documentation and apparently it can create documentation for VB, but I have to have actual VB code for it to work and I don't. The only thing I have is the original C and the swig output also in C.
What I have in mind is some tool (doxygen, sphinx,...) that would enable me to create some kind of multi-language documentation from a single source (not valid doxygen, but explains the idea):
/*! \fn int Function(*PointerToObject)
* \brief A simple function.
* \Cparam[in] PointerToObject A pointer to the object being used.
* \VBparam[in] ByVal long PointerToObject A pointer to the object being used.
* \return An integer.
*/
It would be great if I could somehow integrate it to swig since it is swig that identifies the correct VB type, but I guess I maybe be asking too much.
It is a bit complicated, if I am not clear enough please leave a comment I will try to explain it further.

I'm not positive how useful this will be as I haven't done precisely what you're looking for (and it is a bit of a kludge), but under similar circumstances I came to the conclusion that our best bet was to generate a fluff object just for doxygen to document.
In our case we have an LDMud which has a few hundred driver-published external functions which don't exist in the LPC language the rest of the MUD is written in. We could parse it in its native C code as a separate documentation run and include the docs with our primary documentation. In our case we have fairly thorough plain-text documentation including what we should consider the definition of these functions to be, so we use this documentation instead to generate an object with dummy function definitions and parse the plaintext documentation into function-head comments in doxygen style. This obviously doesn't provide us the advantages of including code or in-code comments with the documentation, but it does allow us to generate the documentation in multiple forms, push it around in guides, make references to these functions from other documentation, etc.
I'm not directly familiar with C so I'm not sure if there's any introspective support for performing a programmatic inventory of the functions you need and generating dummy declarations from that information or not. If you don't and have a small number of functions I suspect it would be the best investment of your time to simply write the dummy declarations by hand. If you have a really large number of these functions it may be worth your time to use doxygen to parse the original code into XML documentation, and generate the dummy object(s) from the XML. Doxygen's XML can be a pain to deal with, but if all you really need is the declaration and arguments it is (relatively) simple.
As a final aside, if your documentation can really be thought of as internal and external, you might also want to use two or more configuration files to generate different sets of documentation saved to different locations and publish them separately to your internal/external audiences. It will make the task of providing the right amount of detail to each less frustrating. Along those lines you may also find the INTERNAL_DOCS option and the #internal / #endinternal commands useful.

Related

what is the purpose of UVM automation macro?

I'm trying to understand about UVM automation macro.
among other things, i found some sentence "UVM system Verilog call library also includes macros that automatically implement the print, copy, clone, compare, pack and unpack methods and more" from text.
and I found that lots of example used with the following usage.
For example,
....
uvm_object_utils_begin(apb_transfer)
'uvm_field_int(addr, UVM_DEFAULT)
'uvm_field_int(data, UVM_DEFAULT)
...
uvm_object_utils_end
but I didn't get it. that usage of 'uvm_field_int() is just defining the variable not copy, clone, compare....
How do I understand what uvm automation macro to do?
even I also curious about why those things are named as a automation? I can't find any something auto kind of thing.
As you say, the UVM field automation macros generate a number of class utility methods such as copy, print and clone that include the registered fields. That's it. I guess the name "automation" is used, because they automatically write code so you don't have to.
My company (Doulos) recommends you don't use these macros useless you know what you're doing. This is because they can be hard to debug, can generate more code than you need and have a strange side effect (values can be read from the configuration database automatically without your knowledge). Of course, the advantage of using them is that you get a whole bunch of code for free - the copy, compare,, print methods etc.
The HTML documentation supplied with a UVM download is very good. Search for `uvm_object_utils and follow your nose.

C struct introspection at runtime

Is there a facility for the C language that allows run-time struct introspection?
The context is this:
I've got a daemon that responds to external events, and for each event we carry around an execution context struct (the "context"). The context is big and messy, and contains references to all sorts of state.
Once the event has been handled, I would like to be able to run the context through a filter, and if it matches some set of criteria, drop a log message to help with debugging. However, since I hope to use this for field debugging, I won't know what criteria will be useful to filter on until run time.
My ideal solution would allow the user to, essentially, write a C-style boolean expression and have the program use that. Something like:
activate_filter context.response_time > 4.2 && context.event.event_type == foo_event
Ideas that have been tossed around so far include:
Providing a limited set of fields that we know how to access.
Wrapping all the relevant structs in some sort of macro that generates introspection tools at run time.
Writing a python script that knows where (versioned) headers live, generates C code and compiles it to a dll, which the daemon then loads and uses as a filter. Obviously this approach has some extra security considerations.
Before I start in on some crazy design goose chase, does anyone know of examples of this sort of thing in the wild? I've dome some googling but haven't come up with much.
I would also suggest tackling this issue from another angle. The key words in your question are:
The context is big and messy
And that's where the issue is. Once you clean this up, you'll probably be able to come up with a clean logging facility.
Consider redefining all the fields in your context struct in some easy, pliable format, like XML. A simple `XML schema, that lists all the members of the struct, their types, and maybe some other metadata, even a comment that documents this field.
Then, throw together a quick and dirty stylesheet that reads the XML file and generates a compilable C struct, that your code actually uses. Then, a different stylesheet that cranks out robo-generated code that enumerates each field in the struct, and generates the code to convert each field into a string.
From that, bolting on a logging facility of some kind, with a user-provided filtering string becomes an easier task. You do have to come up with some way of parsing an arbitrary filtering string. Knowledge of lex and yacc would come in handy.
Things of this nature have been done before.
The XCB library is a C client library for the X11 protocol. The protocol defines various kinds of binary messages which are essentially simple structs that the client and the server toss to each other, over a socket. The way that libxcb is implemented, is that all X11 messages and all datatypes inside them are described in an XML definition, and a stylesheet robo-generates C struct definitions, and the code to parse them out, and provide a fairly clean C API to parse and generate X11 messages.
You are probably approaching this problem from a wrong side.
Logging is typically used to facilitate debugging. The program writes all sorts of events to a log file. To extract interesting entries filtering is applied to the log file.
Sometimes a program generates just too much events; logging libraries usually address this issues by offering verbosity control. Basically a logging function takes an additional parameter telling the verbosity level of the current message. If the value is above the globally configured threshold the message gets discarded. Some libraries even allow to control verbosity level on a per-module basis (Ex: google log).
Another possible approach is to leverage the power of a debugger since the debugger has access to all sorts of meta information. One can create a conditional breakpoint testing variables in scope for arbitrary conditions. Once the program stops any information could be extracted from the scope. This can be automated using scripting facilities provided by a debugger (gdb has great ones).
Finally there are tools generating glue code to use C libraries from scripting languages. One example is SWIG. It analyzes a header file and generates code allowing a scripting language to invoke functions, access structure fields, etc.
Your filter expression will become a program in, say, Lua (other scripting languages are supported as well). You invoke this program passing in the pointer to execution context struct (the "context"). Thanks to the accessors generated by SWIG Lua program can examine any field in the structure.
I generated introspection out of SWIG-CSV parser.
Suppose the C code contains structure like the following,
class Bike {
public:
int color; // color of the bike
int gearCount; // number of configurable gear
Bike() {
// bla bla
}
~Bike() {
// bla bla
}
void operate() {
// bla bla
}
};
Then it will generate the following CSV metadata,
Bike|color|int|variable|public|
Bike|gearCount|int|variable|public|
Bike|operate|void|function|public|f().
Now it is easy to parse the CSV file with python or C/C++ if needed.
import csv
with open('bike.csv', 'rb') as csvfile:
bike_metadata = csv.reader(csvfile, delimiter='|')
# do your thing

Spaghetti code visualisation software?

a smoking pile of spaghetti just landed on my desk, and my task is to understand it (so I can refactor / reimplement it).
The code is C, and a mess of global variables, structure types and function calls.
I would like to plot graphs of the code with the information:
- Call graph
- Which struct types are used in which functions
- Which global variable is used in what function
Hopefully this would make it easier to identify connected components, and extract them to separate modules.
I have tried the following software for similar purposes:
- ncc
- ctags
- codeviz / gengraph
- doxygen
- egypt
- cflow
EDIT2:
- frama-c
- snavigator
- Understand
The shortcomings of these are either
a) requires me to be able to compile the code. My code does not compile, since portions of the source code is missing.
b) issues with preprocessor macros (like cflow, who wants to execute both branches of #if statements). Running it through cpp would mess up the line numbers.
c) I for some reason do not manage to get the software to do what I want to do (like doxygen; the documentation for call graph generation is not easy to find, and since it does not seem to plot variables/data types anyway, it is probably not worth spending more time learning about doxygen's config options). EDIT: I did follow a these Doxygen instrcutions, but it did only plot header file dependencies.
I am on Linux, so it is a huge plus if the software is for linux, and free software. Not sure my boss understands the need to buy a visualizer :-(
For example: a command line tool that lists in which functions a symbol (=function,variable,type) is referenced in would be of great help (like addr2line, but for types/variable names/functions and source code).
//T
My vote goes to gnu global. It has all the features of ctags/cscope combined as well as the possibility to generate fully indexed html which allows you to browse the code in your favorite browser. Fire it up in apache and you have a web-service that anyone can access including full search capabilities.
It integrates nicely into emacs/vim/even the bash-shell, and you can use it directly from the shell-prompt.
To see it in action on the linux kernel, visit this
Combine that with a tool for cyclomatic complexity plugin for eclipse which calculates the complexity of your code. besides the cyclomatic complexity it can handle:
McCabe's Cyclomatic Complexity
Efferent Couplings
Lack of Cohesion in Methods
Lines Of Code in Method
Number Of Fields
Number Of Levels
Number Of Locals In Scope
Number Of Parameters
Number Of Statements
Weighted Methods Per Class
...and you should have everything you need.
If you like command line ;) maybe you could try cscope, it does static analysis of code and can tell you where are referenced some symbols/variables/functions... Not the Holy Graal, but it can be pretty usefull to browse unknown source code.
There are also some GUI that can handle csope results (Vi, Emacs, JEdit...).
On the other hand, Eclipse with the CDT plugin can also help you to navigate into the spaghetti code you have to maintain.
It's not free and afaik not linux but cppDepend might be worth evaluating - at least until someone comes up with a more suitable suggestion :)
http://www.cppdepend.com/ [Demo video here]
If you'd like to know in which functions a symbol is declared or referenced you can try LXR. It's not console based, but is quite usable.

How to implement standard C function extraction?

I have a "a pain in the a$$" task to extract/parse all standard C functions that were called in the main() function. Ex: printf, fseek, etc...
Currently, my only plan is to read each line inside the main() and search if a standard C functions exists by checking the list of standard C functions that I will also be defining (#define CFUNCTIONS "printf...")
As you know there are so many standard C functions, so defining all of them will be so annoying.
Any idea on how can I check if a string is a standard C functions?
If you have heard of cscope, try looking into the database it generates. There are instructions available at the cscope front end to list out all the functions that a given function has called.
If you look at the list of the calls from main(), you should be able to narrow down your work considerably.
If you have to parse by hand, I suggest starting with the included standard headers. They should give you a decent idea about which functions could you expect to see in main().
Either way, the work sounds non-trivial and interesting.
Parsing C source code seems simple at first blush, but as others have pointed out, the possibility of a programmer getting far off the leash by using #defines and #includes is rather common. Unless it is known that the specific program to be parsed is mild-mannered with respect to text substitution, the complexity of parsing arbitrary C source code is considerable.
Consider the less used, but far more effective tactic of parsing the object module. Compile the source module, but do not link it. To further simplify, reprocess the file containing main to remove all other functions, but leave declarations in their places.
Depending on the requirements, there are two ways to complete the task:
Write a program which opens the object module and iterates through the external reference symbol table. If the symbol matches one of the interesting function names, list it. Many platforms have library functions for parsing an object module.
Write a command file or script which uses the developer tools to examine object modules. For example, on Linux, the command nm lists external references with a U.
The task may look simple at first but in order to be really 100% sure you would need to parse the C-file. It is not sufficient to just look for the name, you need to know the context as well i.e. when to check the id, first when you have determined that the id is a function you can check if it is a standard c-runtime function.
(plus I guess it makes the task more interesting :-)
I don't think there's any way around having to define a list of standard C functions to accomplish your task. But it's even more annoying than that -- consider macros,
for example:
#define OUTPUT(foo) printf("%s\n",foo)
main()
{
OUTPUT("Ha ha!\n");
}
So you'll probably want to run your code through the preprocessor before checking
which functions are called from main(). Then you might have cases like this:
some_func("This might look like a call to fclose(fp), but surprise!\n");
So you'll probably need a full-blown parser to do this rigorously, since string literals
may span multiple lines.
I won't bring up trigraphs...that would just be pointless sadism. :-) Anyway, good luck, and happy coding!

Reflection support in C

I know it is not supported, but I am wondering if there are any tricks around it. Any tips?
Reflection in general is a means for a program to analyze the structure of some code.
This analysis is used to change the effective behavior of the code.
Reflection as analysis is generally very weak; usually it can only provide access to function and field names. This weakness comes from the language implementers essentially not wanting to make the full source code available at runtime, along with the appropriate analysis routines to extract what one wants from the source code.
Another approach is tackle program analysis head on, by using a strong program analysis tool, e.g., one that can parse the source text exactly the way the compiler does it.
(Often people propose to abuse the compiler itself to do this, but that usually doesn't work; the compiler machinery wants to be a compiler and it is darn hard to bend it to other purposes).
What is needed is a tool that:
Parses language source text
Builds abstract syntax trees representing every detail of the program.
(It is helpful if the ASTs retain comments and other details of the source
code layout such as column numbers, literal radix values, etc.)
Builds symbol tables showing the scope and meaning of every identifier
Can extract control flows from functions
Can extact data flow from the code
Can construct a call graph for the system
Can determine what each pointer points-to
Enables the construction of custom analyzers using the above facts
Can transform the code according to such custom analyses
(usually by revising the ASTs that represent the parsed code)
Can regenerate source text (including layout and comments) from
the revised ASTs.
Using such machinery, one implements analysis at whatever level of detail is needed, and then transforms the code to achieve the effect that runtime reflection would accomplish.
There are several major benefits:
The detail level or amount of analysis is a matter of ambition (e.g., it isn't
limited by what runtime reflection can only do)
There isn't any runtime overhead to achieve the reflected change in behavior
The machinery involved can be general and applied across many languages, rather
than be limited to what a specific language implementation provides.
This is compatible with the C/C++ idea that you don't pay for what you don't use.
If you don't need reflection, you don't need this machinery. And your language
doesn't need to have the intellectual baggage of weak reflection built in.
See our DMS Software Reengineering Toolkit for a system that can do all of the above for C, Java, and COBOL, and most of it for C++.
[EDIT August 2017: Now handles C11 and C++2017]
Tips and tricks always exists. Take a look at Metaresc library https://github.com/alexanderchuranov/Metaresc
It provides interface for types declaration that will also generate meta-data for the type. Based on meta-data you can easily serialize/deserialize objects of any complexity. Out of the box you can serialize/deserialize XML, JSON, XDR, Lisp-like notation, C-init notation.
Here is a simple example:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "metaresc.h"
TYPEDEF_STRUCT (point_t,
double x,
double y
);
int main (int argc, char * argv[])
{
point_t point = {
.x = M_PI,
.y = M_E,
};
char * str = MR_SAVE_XML (point_t, &point);
if (str)
{
printf ("%s\n", str);
free (str);
}
return (EXIT_SUCCESS);
}
This program will output
$ ./point
<?xml version="1.0"?>
<point>
<x>3.1415926535897931</x>
<y>2.7182818284590451</y>
</point>
Library works fine for latest gcc and clang on Linux, MacOs, FreeBSD and Windows.
Custom macro language is one of the options. User could do declaration as usual and generate types descriptors from DWARF debug info. This moves complexity to the build process, but makes adoption much easier.
any tricks around it? Any tips?
The compiler will probably optionally generate 'debug symbol file', which a debugger can use to help debug the code. The linker may also generate a 'map file'.
A trick/tip might be to generate and then read these files.
Based on the responses to How can I add reflection to a C++ application? (Stack Overflow) and the fact that C++ is considered a "superset" of C, I would say you're out of luck.
There's also a nice long answer about why C++ doesn't have reflection (Stack Overflow).
I needed reflection in a bunch of structs in a C++ project.
I created a xml file with the description of all those structs - fortunately the fields types were primitive types.
I used a template (not C++ template) to auto generate a class for each struct along with setter/getter methods.
In each class I used a map to associate string names and class members (pointers to members).
I didn't regret using reflection because it opened new ways to design my core functionality that I couldn't even imagine without reflection.
(BTW, it was an external report generator for a program that uses a raw database)
So, I used code generation, function pointers and maps to simulate reflection.
I know of the following options, but all come at cost and a lot of limitations:
Use libdl (#include <dfcln.h>)
Call a tool like objdump or nm
Parse the object files yourself (using a corresponding library)
Involve a parser and generate the necessary information at compile time.
"Abuse" the linker to generate symbol arrays.
I'll use a bit of unit test frameworks as examples further down, because automatic test discovery for unit test frameworks is a typical example where reflection comes in very handy, and it's something that most unit test frameworks for C fall short of.
Using libdl (#include <dfcln.h>) (POSIX)
If you're on a POSIX environment, a little bit of reflection can be done using libdl. Plugins are developed that way.
Use
#include <dfcln.h>
in your source code and link with -ldl.
Then you have access to functions dlopen(), dlerror(), dlsym() and dlclose() with which you could load and access / run shared objects at runtime. However, it does not give you easy access to the symbol table.
Another disadvantage of this approach is that you basically restrict reflection to objects loaded as dynamic library (shared object loaded at runtime via dlopen()).
Running nm or objdump
You could run nm or objdump to show the symbol table and parse the output.
For me, nm -P --defined-only -g xyz.o gives good results, and parsing the output is trivial.
You'd be interested in the first word of each line only, which is the symbol name, and maybe the second one, which is the section type.
If you do not know the object name in some static way, i.e. the object is actually a shared object, at least on Linux you then might want to skip symbol names starting with '_'.
objdump, nm or similar tools are also often available outside POSIX environments.
Parsing the object files yourself
You could parse the object files yourself. You probably don't want to implement that from scratch but use an existing library for that. This is how nm, objdump and even libdl are implemented. You could peek at the source code of nm, objdump and libdl and the libraries they use in order to find out how they do what they do.
Involving a Parser
You could write a parser and code generator which generates the necessary reflective information at compile time and stores it in the object file. Then you have a lot of freedom and could even implement primitive forms of annotations. That's what some unit test frameworks like AceUnit do.
I found that writing a parser which covers straight-forward C syntax is fairly trivial. Writing a parser which really understands C and could deal with all cases is NOT trivial.
So, this has limitations which depend on how exotic the C syntax is that you want to reflect upon.
"Abusing" the linker to generate symbol arrays
You could put references to symbols which you want to reflect upon in a special section and use a linker configuration to emit the section boundaries so you can access them in C.
I've described here N-Dependency injection in C - better way than linker-defined arrays? how this works.
But beware, this is depending on a lot of things and not very portable. I have only tried this with GCC/ld, and I know it doesn't work with all compilers / linkers. Also, it's almost guaranteed that dead code elimination will not detect how you call this stuff, so if you use dead code elimination, you will have to add all the reflected symbols as entry points.
Pitfalls
For some of the mechanisms, dead code elimination can be a problem, in particular when you "abuse" the linker to generate a symbol arrays. It can be worked around by telling the reflected symbols as entry points to the linker, and depending on the amount of symbols this might be neither nice nor convenient.
Conclusion
Combining nm and libdl can actually give quite good results. The combination can be almost as powerful as the level of Reflection used by JUnit 3.x in Java. The level of reflection given is sufficient to implement a JUnit 3.x-style unit test framework for C, including test-case discovery by naming convention.
Involving a parser is more work and limited to objects that you compile yourself, but gives you most power and freedom. The level of reflection given can be sufficient to implement a JUnit 4.x-style unit test framework for C, including test-case discovery by annotations. AceUnit is a unit test framework for C that does exactly this.
Combining parsing and the linker to generate symbol arrays can give very nice results - if your environment is so much under your control that you can ensure that working with the linker that way works for you.
And of course you can combine all approaches to stitch together the bits and pieces until they fit your needs.
You would need to implement it from yourself from the ground up. In straight C, there is no runtime information whatsoever kept on structure and composite types. Metadata simply does not exist in the standard.
Implementing reflection for C would be much simpler... because C is simple language.
There is some basic options for analazing program, like detect if function exists by calling dlopen/dlsym -- depends on your needs.
There are tools for creating code that can modify/extend itselfusing tcc.
You may use the above tool in order to create your own code analizers.
For similar reasons to the author of the question, I have been working on a C-type-reflection-API along with a C reflection graph database format and a clang plug-in that writes reflection metadata.
The intent is to use the C reflection API for writing serialization and deserialization routines, such as mappers for ASN.1, function argument printers, function proxies, fuzzers, etc. Clang and GCC both have plugin APIs that allow access to the AST but there currently is no standard graph format for C reflection metadata.
The proposed C reflection API is called Crefl:
https://github.com/michaeljclark/crefl
The Crefl API provides runtime access to reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field (member), array, constant, variable.
The Crefl reflection graph database format for portable reflection metadata.
The Crefl clang plug-in outputs C reflection metadata used by the library.
The Crefl API provides task-oriented query access to C reflection metadata
A C reflection API provides access to runtime reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field, array, constant, variable. The Crefl C reflection data model is essentially a transcription of the C data types in ISO/IEC 9899:9999.
C intrinsic data types.
integer types.
floating-point types.
complex number types.
boolean type.
nested struct, union, field, and bitfield
arrays and pointers
typedef type aliases
enum and enum constants
functions and function parameters
const, volatile and restrict qualifiers
GNU-C style attributes using (__attribute__).
The library is still a work in progress. The hope is to find others who are interested in reflection support in C.
Parsers and Debug Symbols are great ideas. However, the gotcha is that C does not really have arrays. Just pointers to stuff.
For example, there is no way by reading the source code to know whether a char * points to a character, a string, or a fixed array of bytes based on some "nearby" length field. This is a problem for human readers let alone any automated tool.
Why not use a modern language, like Java or .Net? Can be faster than C as well.

Resources