I'm having a problem documenting variables whose names contain a double dollar sign ("$$"). These names are not real variables but constants generated by the Keil linker. The $$ appears to have a special (undocumented?) meaning for the Doxygen parser. If I write the following code
extern char SectionA$$Base[]; /** starting address of the section **/
extern char SectionA$$Limit[]; /** end address of the section **/
Doxygen complains about the undocumented variables $$Base and $$Limit and includes them in the documentation without any comment, under names where the $$ is preceded by a space and rendered with odd combinations of bold, link and other attributes. Eight variables are declared this way in total, yet the documentation output contains only 2 or sometimes 4 of the declarations. So $$ appears to have a big influence that I don't understand.
How can I force Doxygen to treat these variables like any other?
As far as I am aware, Doxygen has no command or configuration option for accepting invalid identifiers. It does, however, support filtering input files prior to processing them. You could perhaps use that to alter the offending declarations, or to remove them from among those passed on to Doxygen. See the INPUT_FILTER, FILTER_PATTERNS, and related configuration options.
Alternatively, you could put the offending declarations in a separate header, and omit that header from Doxygen processing (EXCLUDE and/or EXCLUDE_PATTERNS can help).
I don't see a way to use Doxygen to actually generate documentation for these identifiers.
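The input-filter idea mentioned above could look roughly like the following. This is only a sketch: the script name, the helper filter_source and the choice of "__" as a stand-in for "$$" are all assumptions, and the generated documentation would then show the substituted names rather than the real linker symbols. It relies on Doxygen invoking the filter with the input file name as its argument and reading the filtered text from standard output.

```python
#!/usr/bin/env python3
"""Hypothetical Doxygen INPUT_FILTER script (name and replacement are assumptions)."""
import re
import sys

# "$$" sandwiched between identifier characters, e.g. SectionA$$Base.
DOUBLE_DOLLAR = re.compile(r"(?<=\w)\$\$(?=\w)")


def filter_source(text: str) -> str:
    """Replace "$$" inside identifiers with "__" so the parser sees one name."""
    return DOUBLE_DOLLAR.sub("__", text)


if __name__ == "__main__":
    # Doxygen runs: <filter> <input-file> and reads the result from stdout.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as source:
            sys.stdout.write(filter_source(source.read()))
```

Assuming the script is saved as dollar_filter.py, it could be wired up in the Doxyfile with something like INPUT_FILTER = "python3 dollar_filter.py".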
Related
I am trying to understand when a developer needs to define a C variable with preceding '_'. What is the reason for it?
For example:
uint32_t __xyz_ = 0;
Maybe this helps, from C99, 7.1.3 ("Reserved Identifiers"):
All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
Moral: For ordinary user code, it's probably best not to start identifiers with an underscore.
(On a related note, I think you should also steer clear of naming types with a trailing _t, a suffix that POSIX reserves for its own types.)
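The two quoted rules can be turned into a small checker. This is an illustrative sketch only: classify_identifier is a hypothetical helper, and it covers just the two sentences quoted from 7.1.3, not the other reserved names in the standard.

```python
def classify_identifier(name: str) -> str:
    """Classify a C identifier against the two C99 7.1.3 rules quoted above."""
    if len(name) >= 2 and name[0] == "_":
        second = name[1]
        # Underscore followed by an uppercase letter or another underscore.
        if second == "_" or "A" <= second <= "Z":
            return "always reserved"
    if name.startswith("_"):
        # Any other leading underscore: reserved only at file scope.
        return "reserved at file scope"
    return "not covered by these two rules"


for name in ("__xyz_", "_Bool", "_helper", "xyz"):
    print(f"{name:10} {classify_identifier(name)}")
```

By this reading, the __xyz_ from the question falls into the "always reserved" group.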
It is a trick used in the header files of C implementations for global symbols, in order to prevent eventual conflicts with other symbols defined by the user.
Since C lacks a namespace feature, this is a rudimentary approach to avoid name collisions with the user.
Declaring such symbols in your own header and source files is not encouraged because it can introduce naming conflicts between your code and the C implementation. Even if that doesn't produce a conflict on your current implementation, you are still prone to strange conflicts across different/future implementations, since they are free to use other symbols prefixed with underscores.
Whether it's C or not, the leading underscore gives the programmer a status indication so they don't have to go and look it up. In PHP, or any object-oriented language where we deal with tens of thousands of properties and methods written by thousands of authors, seeing an underscore prefix removes the need to dig through the class and look up whether it's declared private, protected or public. That's an immense time saver. The practice started before C, I am sure...
I am studying "include/asm-x86/types.h" and I'm a little confused about the meaning of __signed__.
When I Google this keyword, I cannot get any useful information. Why not just use signed instead of __signed__? Does it have a special meaning?
That is used for backwards compatibility, when older compilers didn't recognize signed keyword, such alternatives were used.
The difference between __signed__ and signed is to do with namespaces. The signed names are only available in the __KERNEL__ and not outside.
As stated at the top of the header file you mention:
/*
* __xx is ok: it doesn't pollute the POSIX namespace. Use these in the
* header files exported to user space
*/
For the signed names without underscores it states this:
/*
* These aren't exported outside the kernel to avoid name space clashes
*/
__signed__ is also used when compiling with gcc -traditional, where the keyword signed is not recognized.
Don't google using c __signed__, because special characters like __ are skipped in the search, do a literal search using c "__signed__" and you will get useful information.
I'm new to C and looking at Go's source tree I found this:
https://code.google.com/p/go/source/browse/src/pkg/runtime/race.c
void runtime∕race·Read(int32 goid, void *addr, void *pc);
void runtime∕race·Write(int32 goid, void *addr, void *pc);
void
runtime·raceinit(void)
{
// ...
}
What do the slashes and dots (·) mean? Is this valid C?
IMPORTANT UPDATE:
The ultimate answer is certainly the one you got from Russ Cox, one of the Go authors, on the golang-nuts mailing list. That said, I'm leaving some of my earlier notes below; they might help to understand some things.
Also, from reading this answer linked above, I believe the ∕ "pseudo-slash" may now be translated to regular / slash too (like the middot is translated to dot) in newer versions of Go C compiler than the one I've tested below - but I don't have time to verify.
The file is compiled by the Go Language Suite's internal C compiler, which originates in the Plan 9 C compiler and has some differences (mostly extensions, AFAIK) from the C standard.
One of the extensions is that it allows UTF-8 characters in identifiers.
Now, in the Go Language Suite's C compiler, the middot character (·) is treated in a special way, as it is translated to a regular dot (.) in object files, which is interpreted by Go Language Suite's internal linker as namespace separator character.
Example
For the following file example.c (note: it must be saved as UTF-8 without BOM):
void ·Bar1() {}
void foo·bar2() {}
void foo∕baz·bar3() {}
the internal C compiler produces the following symbols:
$ go tool 8c example.c
$ go tool nm example.8
T "".Bar1
T foo.bar2
T foo∕baz.bar3
Now, please note I've given the ·Bar1() a capital B. This is
because that way, I can make it visible to regular Go code - because
it is translated to the exact same symbol as would result from
compiling the following Go code:
package example
func Bar1() {} // nm will show: T "".Bar1
Now, regarding the functions you named in the question, the story goes further down the rabbit hole. I'm a bit less sure if I'm right here, but I'll try to explain based on what I know. Thus, each sentence below this point should be read as if it had "AFAIK" written just at the end.
So, the next missing piece needed to better understand this puzzle, is to know something more about the strange "" namespace, and how the Go suite's linker handles it. The "" namespace is what we might want to call an "empty" (because "" for a programmer means "an empty string") namespace, or maybe better, a "placeholder" namespace. And when the linker sees an import going like this:
import examp "path/to/package/example"
//...
func main() {
examp.Bar1()
}
then it takes the $GOPATH/pkg/.../example.a library file, and during import phase substitutes on the fly each "" with path/to/package/example. So now, in the linked program, we will see a symbol like this:
T path/to/package/example.Bar1
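As a toy model of the two translation steps described above (middot to dot in the object file, then "" replaced by the import path at link time), something like the following reproduces the symbol names shown. It is purely illustrative: object_symbol and link_symbol are hypothetical helpers, not part of any Go tool, and the same AFAIK caveat applies.

```python
MIDDLE_DOT = "\u00b7"       # the "·" used in the C sources
EMPTY_NAMESPACE = '""'      # placeholder namespace seen in the nm output


def object_symbol(c_name: str) -> str:
    """Model the compiler step: the last middot becomes a regular dot."""
    if MIDDLE_DOT not in c_name:
        return c_name
    package, _, function = c_name.rpartition(MIDDLE_DOT)
    return f"{package or EMPTY_NAMESPACE}.{function}"


def link_symbol(symbol: str, import_path: str) -> str:
    """Model the linker step: a leading "" is replaced by the import path."""
    prefix = EMPTY_NAMESPACE + "."
    if symbol.startswith(prefix):
        return import_path + symbol[len(EMPTY_NAMESPACE):]
    return symbol


for c_name in ("\u00b7Bar1", "foo\u00b7bar2", "foo\u2215baz\u00b7bar3"):
    print(object_symbol(c_name))
print(link_symbol(object_symbol("\u00b7Bar1"), "path/to/package/example"))
```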
The "·" character is U+00B7 (\xB7) according to my JavaScript console.
The "∕" character is U+2215 (\u2215).
The dot appears in Annex D of the C99 standard, which lists the special characters that are valid in identifiers in C source. The slash doesn't seem to, so I suspect it's used as something else (perhaps namespacing) via a #define or preprocessor magic.
That would explain why the dot is present in the actual function definition, but the slash is not.
Edit: Check This Answer for some additional information. It's possible that the unicode slash is just allowed by GCC's implementation.
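The code points quoted above can be double-checked outside a browser console. A minimal sketch, assuming the two characters in the Go sources really are U+00B7 and U+2215 (describe is just an illustrative helper):

```python
import unicodedata

MIDDLE_DOT = "\u00b7"       # the dot in runtime·raceinit
DIVISION_SLASH = "\u2215"   # the slash in runtime∕race


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Compare the two special characters with their ASCII look-alikes.
for ch in (MIDDLE_DOT, ".", DIVISION_SLASH, "/"):
    print(describe(ch))
```

Neither character is the ASCII dot or slash, which is why an ordinary C compiler treats them so differently.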
It appears this is not standard C, nor C99. In particular, both gcc and clang complain about the dot, even when in C99 mode.
This source code is compiled by the Plan 9 compiler suite (in particular, ./pkg/tool/darwin_amd64/6c on OS X), which is bootstrapped by the Go build system. According to this document, bottom of page 8, Plan 9 and its compiler do not use ASCII at all, but use Unicode instead. At the bottom of page 9, it is stated that any character with a sufficiently high code point is considered valid for use in an identifier name.
There's no pre-processing magic at all - the function definitions do not match the function declarations simply because they are different functions. For example, void runtime∕race·Initialize(); is an external function whose definition appears in ./src/pkg/runtime/race/race.go; likewise for void runtime∕race·MapShadow(…).
The function which appears later, void runtime·raceinit(void), is a completely different function, which is apparent from the fact that it actually calls runtime∕race·Initialize();.
The go compiler/runtime is compiled using the C compilers originally developed for plan9. When you build go from source, it'll first build the plan9 compilers, then use those to build Go.
The plan9 compilers support unicode function names [1], and the Go developers use unicode characters in their function names as pseudo namespaces.
[1] It looks like this might actually be standards compliant: g++ unicode variable name but gcc doesn't support unicode function/variable names.
This has been asked before but I have a specialised case which I should be able to handle with a regular expression.
I'm trying to read the warning log from Doxygen and the source is in C (so far, I dread to think about C++).
I need to match the functions and variable definitions found in that log and pick up the function and variable names.
More specifically the log has lines like
/home/me/blaa.c:10:Warning: Member a_function(int a, int b) (function) of file blaa.c is not documented
and
/home/me/blaa.h:10:Warning: Member a_variable[SOME_CONST(sizeof(SOME_STRUCT), 64)*ANOTHER_CONST] (variable) of file blaa.h is not documented
With all the variations you can have in C...
Can I match those with just one regexp or should I not even bother? The word in parentheses after the "parameter" list (I use this loosely to also include the variables) comes from a set of certain words (function, variable, enum, etc.), so if nothing else helps, I could match on those, but I'd rather not in case there are types that I haven't seen yet in the logs.
My current attempt looks like
'(?P<full_path>.+):\d+:\s+Warning:\s+Member\s+(?P<member_name>.+)([\(\[](\**)\s*\w+([,)])[\)\]))*\s+\((?P<member_type>.+)\) of file\s+(?P<filename>.+)\s+is not documented'
(I use Python's re package.)
But it still fails to catch everything.
EDIT: There's some mistake in there that I have done in the last edit.
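You were allowing zero or more matches of the bracketed group between <member_name> and <member_type>, so the greedy .+ in <member_name> was free to swallow the whole parameter list. Restrict the name to \w+ and skip everything up to the type with .* instead (the \s* before Warning also accepts lines with no space after the line number, as in your samples):
'(?P<full_path>.+):\d+:\s*Warning:\s+Member\s+(?P<member_name>\w+).*\s+\((?P<member_type>\w+)\) of file\s+(?P<filename>.+)\s+is not documented'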
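A quick way to try the suggested pattern against the two sample log lines from the question. This is a sketch: WARNING_RE and parse_warning are illustrative names, and the pattern uses \s* before Warning so that lines without a space after the line number, like the samples as posted, still match.

```python
import re

# The suggested pattern, with \s* (rather than \s+) before "Warning".
WARNING_RE = re.compile(
    r"(?P<full_path>.+):\d+:\s*Warning:\s+Member\s+(?P<member_name>\w+).*"
    r"\s+\((?P<member_type>\w+)\) of file\s+(?P<filename>.+)\s+is not documented"
)


def parse_warning(line: str):
    """Return (member_name, member_type, filename), or None if no match."""
    match = WARNING_RE.match(line)
    if match is None:
        return None
    return match["member_name"], match["member_type"], match["filename"]


SAMPLES = [
    "/home/me/blaa.c:10:Warning: Member a_function(int a, int b) (function) "
    "of file blaa.c is not documented",
    "/home/me/blaa.h:10:Warning: Member a_variable[SOME_CONST(sizeof(SOME_STRUCT), 64)"
    "*ANOTHER_CONST] (variable) of file blaa.h is not documented",
]

for sample in SAMPLES:
    print(parse_warning(sample))
```

Because the member type is matched with \w+ rather than a fixed list of words, kinds not yet seen in the logs are picked up as well.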
I want to know the line numbers of the declarations in a given C function,
and also which lines contain if/while/for statements and which statements span multiple lines (i.e. they
do not end on the same line).
I think we need to know why you want the line number in order to help you.
Variously:
1) You can use __LINE__ in the code to get the current line number.
2) Most editors can show the line numbers next to the code.
If you want to script breakpoints, I'm not sure if that's possible - I'd suggest setting break-points on filename and function, and then splitting up the code till that's sufficient. Alternatively investigate other ways of getting the testing done - e.g. splitting up the code so unit tests can check it.
Maybe I did not understand your question, but you can use ctags (or one of its variants) to get a list of declarations and their line numbers.
For example exuberant ctags is capable of generating tags (line numbers) for all types of C/C++ language tags, including all of the following:
class names
macro definitions
enumeration names
enumerators
function definitions
function prototypes/declarations
class, interface, struct, and union data members
structure names
typedefs
union names
variables (definitions and external declarations)
If you can, use the diff tool. It provides line numbers as part of the output. Your tool could then parse that output, looking for declarations or primary code.