How to exchange a string with an index during C preprocessing

I have trace statements in several C sources, like
TRACE(23, "abc");
TRACE(24, "def");
The numbers 23 and 24 are identifiers taken from an automatically generated list that contains one string per line:
...
"abc"
"def"
...
"abc" is in line 23 and so I write 23 in the appropriate trace statement.
The preprocessor then generates the output I want:
trace(23);
trace(24);
I think it should be possible to automate this so that I only write
TRACE("abc");
TRACE("def");
During C preprocessing I want to replace each string automatically with its line number in my generated file, so that the preprocessor output is
trace(23);
trace(24);
I can write a function that returns the line number 23 for the string "abc", but I need it to run during preprocessing. Are there any preprocessor hooks, or other ideas?

The preprocessor supplies the automagic macros __FILE__ and __LINE__ (and a few others) which you can use:
#include <stdio.h>
#define TRACE(m) fprintf(stderr, "%s,%d: %s\n", __FILE__, __LINE__, m)
int main(void)
{
    int a = 0; /* must be initialised before it is tested */
    if (a) TRACE("a");
    else TRACE("no");
    TRACE("returning");
    return 0;
}

I have this idea now: Let the preprocessor generate this output:
trace("abc");
trace("def");
Then write a tool (e.g. a bash script with awk) that replaces the strings with their line numbers from the generated list file:
...
"abc"
"def"
...
And finally let the compiler do its work. I am not really happy with this, because it needs to be adapted for each compiler. Any better idea?
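The rewriting step described above can be sketched in C as well as in awk. This is only an illustration under assumptions: the names trace_id_for and rewrite_trace_line are hypothetical, the list is assumed to be already loaded into an array of strings without the surrounding quotes, and each line is assumed to contain at most one trace("...") call whose string does not itself contain the sequence ").

```c
#include <stdio.h>
#include <string.h>

/* Returns the 1-based position of the len-byte string `text` in `list`,
 * or 0 if it is not in the list. (Hypothetical helper.) */
static int trace_id_for(const char *const list[], int n,
                        const char *text, size_t len)
{
    for (int i = 0; i < n; i++)
        if (strlen(list[i]) == len && strncmp(list[i], text, len) == 0)
            return i + 1;
    return 0;
}

/* Rewrites one line of preprocessed output: trace("abc"); becomes
 * trace(<id>);. Lines without a known trace string are copied unchanged.
 * Returns 1 if the line was rewritten, 0 otherwise. */
int rewrite_trace_line(const char *line, const char *const list[], int n,
                       char *out, size_t outsz)
{
    static const char prefix[] = "trace(\"";
    const char *call = strstr(line, prefix);

    if (call != NULL) {
        const char *text = call + strlen(prefix);
        const char *end = strstr(text, "\")");   /* closing quote + paren */
        int id = end ? trace_id_for(list, n, text, (size_t)(end - text)) : 0;

        if (id > 0) {
            /* text before the call, the numeric call, text after the call */
            snprintf(out, outsz, "%.*strace(%d)%s",
                     (int)(call - line), line, id, end + 2);
            return 1;
        }
    }
    snprintf(out, outsz, "%s", line);
    return 0;
}
```

A driver would read the preprocessed file line by line, pass each line through rewrite_trace_line, and hand the result to the compiler. The compiler-specific part mentioned in the question (how to get and feed back preprocessed text) stays outside this sketch.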

Related

Expand pragma to a comment (for doxygen)

Comments are usually converted to a single white-space before the preprocessor is run. However, there is a compelling use case.
#pragma once
#ifdef DOXYGEN
#define DALT(t,f) t
#else
#define DALT(t,f) f
#endif
#define MAP(n,a,d) \
DALT ( COMMENT(| n | a | d |) \
, void* mm_##n = a \
)
/// Memory map table
/// | name | address | description |
/// |------|---------|-------------|
MAP (reg0 , 0 , foo )
MAP (reg1 , 8 , bar )
In this example, when the DOXYGEN flag is set, I want to generate doxygen markup from the macro. When it isn't, I want to generate the variables. In this instance, the desired behaviour is to generate comments in the macros. Any thoughts about how?
I've tried /##/ and another example with more indirection
#define COMMENT SLASH(/)
#define SLASH(s) /##s
Neither works.
Doxygen can run a command on each source file before the file is fed to its parser; the Doxyfile has several FILTER settings for this. Here the relevant one is INPUT_FILTER, and the line should read:
INPUT_FILTER = "sed -e 's%^ *MAP *(\([^,]*\),\([^,]*\),\([^)]*\))%/// | \1 | \2 | \3 |%'"
Furthermore the entire #if construct can disappear and one, probably, just needs:
#define MAP(n,a,d) void* mm_##n = a
The ISO C standard describes the output of the preprocessor as a stream of preprocessing tokens, not text. Comments are not preprocessing tokens; they are stripped from the input before tokenization happens. Therefore, within the standard facilities of the language, it is fundamentally impossible for preprocessing output to contain comments or anything that resembles them.
In particular, consider
#define EMPTY
#define NOT_A_COMMENT_1(text) /EMPTY/EMPTY/ text
#define NOT_A_COMMENT_2(text) / / / text
NOT_A_COMMENT_1(word word word)
NOT_A_COMMENT_2(word word word)
After translation phase 4, the fourth and fifth lines of the above will both become the six-token sequence
[/][/][/][word][word][word]
where square brackets indicate token boundaries. There isn't any such thing as a // token, and therefore there is nothing you can do to make the preprocessor produce one.
Now, the ISO C standard doesn't specify the behavior of doxygen. However, if doxygen is reusing a preprocessor that came with someone's C compiler, the people who wrote that preprocessor probably thought textual preprocessor output should be, above all, an accurate reflection of the token sequence that the "compiler proper" would receive. That means it will forcibly insert spaces where necessary to make separate tokens remain separate. For instance, with test.c containing the above example,
$ gcc -E test.c
...
/ / / word word word
/ / / word word word
(I have elided some irrelevant chatter above the output we're interested in.)
If there is a way around this, you are most likely to find it in the doxygen manual. There might, for instance, be configuration options that teach it that certain macros should be understood to define symbols, and what symbols those are, and what documentation they should have.

Will the compiler allocate any memory for code disabled by macro in C language?

For example:
int main()
{
fun();//calling a fun
}
void fun(void)
{
#if 0
int a = 4;
int b = 5;
#endif
}
What is the size of the fun() function? And how much memory in total will be allocated for the main() function?
Compilation of a C source file is done in multiple phases. The phase where the preprocessor runs is done before the phase where the code is compiled.
The "compiler" will not even see code that the preprocessor has removed; from its point of view, the function is simply
void fun(void)
{
}
Whether the function will "create memory" depends on the compiler and its optimization settings. In a debug build the function will probably still exist and be called. In an optimized release build the compiler might not call the function, or might not even keep it (that is, generate its boilerplate code) at all.
Compilation is split into 4 stages:
Preprocessing.
Compilation.
Assembly.
Linking.
The preprocessor directives are processed before the actual compilation starts, and conditional inclusion is performed in this stage along with the other directives.
The #if is a conditional inclusion directive.
From C11 draft 6.10.1-3:
Preprocessing directives of the forms
#if constant-expression new-line group(opt)
#elif constant-expression new-line group(opt)
check whether the controlling constant expression evaluates to nonzero.
In your code the controlling expression of #if 0 is zero, so the condition is false and the code inside the conditional block is excluded.
The result of the preprocessing stage can be written to stdout with the -E option:
gcc -E filename.c
The output of the command above ends with:
# 943 "/usr/include/stdio.h" 3 4
# 2 "filename.c" 2
void fun(void)
{
}
int main()
{
fun();
return 0;
}
As we can see the statements with the #if condition are removed during the preprocessing stage.
This directive can be used to exclude a block of code from compilation.
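A small way to convince yourself that the compiler never sees the disabled block is to put something in it that would not compile. This is only a sketch: the identifier undeclared_identifier is deliberately not declared anywhere, and fun is changed to return a value purely so the effect can be observed.

```c
/* The block between #if 0 and #endif is removed by the preprocessor,
 * so the compiler never sees the undeclared identifier inside it.
 * The text only has to be lexically valid (balanced quotes and comments). */
int fun(void)
{
    int result = 1;
#if 0
    result = undeclared_identifier + 41;   /* would be an error if compiled */
#endif
    return result;
}
```

If the 0 is changed to 1, the same file no longer compiles, which shows that the difference is made before the compiler proper ever runs.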
Now to see if there is any memory allocated by the compiler for an empty function,
filename.c:
void fun(void)
{
}
int main()
{
fun();
return 0;
}
The size command gives,
$ size a.out
text data bss dec hex filename
1171 552 8 1731 6c3 a.out
and for the code,
filename.c:
void fun(void)
{
#if 0
int a = 4;
int b = 5;
#endif
}
int main()
{
fun();
return 0;
}
The output of size command for the above code is,
$ size a.out
text data bss dec hex filename
1171 552 8 1731 6c3 a.out
Both builds have the same size, so we can conclude that the compiler does not allocate any memory for a block of code disabled with #if 0.
According to the GCC preprocessor manual:
The simplest sort of conditional is
#ifdef MACRO
controlled text
#endif /* MACRO */
This block is called a conditional group. controlled text will be
included in the output of the preprocessor if and only if MACRO is
defined. We say that the conditional succeeds if MACRO is defined,
fails if it is not.
The controlled text inside of a conditional can include preprocessing
directives. They are executed only if the conditional succeeds. You
can nest conditional groups inside other conditional groups, but they
must be completely nested. In other words, ‘#endif’ always matches the
nearest ‘#ifdef’ (or ‘#ifndef’, or ‘#if’). Also, you cannot start a
conditional group in one file and end it in another.
Even if a conditional fails, the controlled text inside it is still
run through initial transformations and tokenization. Therefore, it
must all be lexically valid C. Normally the only way this matters is
that all comments and string literals inside a failing conditional
group must still be properly ended.
The comment following the ‘#endif’ is not required, but it is a good
practice if there is a lot of controlled text, because it helps people
match the ‘#endif’ to the corresponding ‘#ifdef’. Older programs
sometimes put MACRO directly after the ‘#endif’ without enclosing it
in a comment. This is invalid code according to the C standard. CPP
accepts it with a warning. It never affects which ‘#ifndef’ the
‘#endif’ matches.
Sometimes you wish to use some code if a macro is not defined. You can
do this by writing ‘#ifndef’ instead of ‘#ifdef’. One common use of
‘#ifndef’ is to include code only the first time a header file is
included.

simple script or commands to *substitute* stray "\\n" with "\n"

alright, i understand that the title of this topic sounds a bit gibberish... so i'll try to explain it as clearly as i can...
this is related to this previous post (an approach that's been verified to work):
multipass a source code to cpp
-- which basically asks the cpp to preprocess the code once before starting the gcc compile build process
take the previous post's sample code:
#include <stdio.h>
#define DEF_X #define X 22
int main(void)
{
DEF_X
printf("%u", X);
return 1;
}
now, to be able to freely insert the DEF_X anywhere, we need to add a newline
this doesn't work:
#define DEF_X \
#define X 22
this still doesn't work, but is more likely to:
#define DEF_X \n \
#define X 22
if we get the latter above to work, thanks to C's free form syntax and constant string multiline concatenation, it works anywhere as far as C/C++ is concerned:
"literal_str0" DEF_X "literal_str1"
now when cpp preprocesses this:
# 1 "d:/Projects/Research/tests/test.c"
# 1 "<command-line>"
# 1 "d:/Projects/Research/test/test.c"
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 1 3
# 19 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 3
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 1 3
# 32 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3=
# 33 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3
# 20 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 2 3
ETC_ETC_ETC_IGNORED_FOR_BREVITY_BUT_LOTS_OF_DECLARATIONS
int main(void)
{
\n #define X 22
printf("%u", X);
return 1;
}
we have a stray \n in our preprocessed file. so now the problem is to get rid of it....
now, the unix system commands aren't really my strongest suit. i've compiled dozens of packages in linux and written simple bash scripts that simply enter multiline commands (so i don't have to type them every time or keep pressing the up arrow and choose the correct command successions). so i don't know the finer points of stream piping and their arguments.
having said that, i tried these commands:
cpp $MY_DIR/test.c | perl -p -e 's/\\n/\n/g' > $MY_DIR/test0.c
gcc $MY_DIR/test0.c -o test.exe
it works, it removes that stray \n.
ohh, as to using perl rather than sed, i'm just more familiar with perl's variant to regex... it's more consistent in my eyes.
anyways, this has the nasty side effect of eating up any \n in the file (even in string literals)... so i need a script or a series of commands to:
remove a \n if:
if it is not inside a quote -- so this won't be modified: "hell0_there\n"
not passed to a function call (inside the argument list)
this is safe as one can never pass a single \n, which is neither a keyword nor an identifier.
if i need to "stringify" an expression with \n, i can simply call a function macro QUOTE_VAR(token). so that encapsulates all instances that \n would have to be treated as a string.
this should cover all cases that \n should be substituted... at least for my own coding conventions.
really, i would do this if i could manage it on my own... but my regex skills are extremely lacking, i only use it for simple substitutions.
A better way is to replace \n only when it occurs at the beginning of a line (after optional whitespace).
The following command should do the work (GNU sed):
sed -e 's/^\s*\\n/\n/'
or only when it occurs before a #:
sed -e 's/\\n\s*#/\n#/g'
Or you can reverse the order of preprocessing and substitute DEF_X with your own tool before running the C preprocessor.
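If the "not inside a quote" rule from the question is needed, a regex gets awkward and a small state machine is easier to get right. The following is a minimal sketch in C, not a tested tool: the function name replace_stray_newlines is hypothetical, and it assumes the input is cpp output (so comments are already gone) held in one NUL-terminated buffer.

```c
/* Copies src to dst, turning each two-character sequence backslash + 'n'
 * into a real newline, except inside "..." or '...' literals.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of replacements made. */
int replace_stray_newlines(const char *src, char *dst)
{
    char quote = 0;   /* 0 outside a literal, otherwise the opening quote */
    int replaced = 0;

    while (*src != '\0') {
        if (quote != 0) {
            /* Inside a literal: copy escape pairs verbatim, so that
             * \" does not end the literal and \n is left alone. */
            if (*src == '\\' && src[1] != '\0') {
                *dst++ = *src++;
                *dst++ = *src++;
                continue;
            }
            if (*src == quote)
                quote = 0;
            *dst++ = *src++;
        } else if (*src == '"' || *src == '\'') {
            quote = *src;
            *dst++ = *src++;
        } else if (*src == '\\' && src[1] == 'n') {
            *dst++ = '\n';   /* stray \n outside any literal */
            src += 2;
            replaced++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return replaced;
}
```

This covers the first rule from the question (string literals are untouched). The second rule, about function call arguments, is not handled here.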

Automatically inserting filename & line number in logging statements of a C program

I am writing a program for an embedded ARM processor in C. I would like to see the source filename and line number in the logging statements.
As the compiled code has no knowledge of line numbers and source files, I am looking for ways to have this inserted automatically before / during the compile process.
Are there any standard tools or compiler features that I can use for this?
I am using GCC.
For example:
This is what I would write in the source file:
log("<#filename#> <#linenumber#> : Hello World");
This is what would actually get compiled:
log("Foobar.c 225 : Hello World");
Typically you'd do something like this:
#include <stdio.h>
// logging function (not named log, which would clash with log() from <math.h>)
void log_msg(const char *file, const int line, const char *msg)
{
    fprintf(stderr, "%s:%d: %s\n", file, line, msg);
}
// logging macro - passes __FILE__ and __LINE__ to logging function
#define LOG(msg) do { log_msg(__FILE__, __LINE__, msg); } while (0)
Then when you want to log something:
LOG("We made it to this point!");
which will then generate a log message such as:
foo.c:42: We made it to this point!
There is a standard set of predefined macros as part of the preprocessor: https://gcc.gnu.org/onlinedocs/gcc-4.9.0/cpp/Standard-Predefined-Macros.html
The macros you want to use are __FILE__ and __LINE__, which expand to the current source file name and line number.
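On an embedded target there may be no stderr, and __FILE__ can expand to a long path depending on how the compiler is invoked. A possible variation, sketched here under those assumptions, formats the message into a caller-supplied buffer and keeps only the file's base name. The names format_log and LOG_TO are hypothetical, and '/' is assumed to be the path separator.

```c
#include <stdio.h>
#include <string.h>

/* Writes "<basename>:<line>: <msg>" into buf. At most n bytes are written
 * and the result is always NUL-terminated when n > 0. */
void format_log(char *buf, size_t n, const char *file, int line,
                const char *msg)
{
    const char *slash = strrchr(file, '/');       /* last path separator */
    const char *base = slash ? slash + 1 : file;  /* strip the directories */

    snprintf(buf, n, "%s:%d: %s", base, line, msg);
}

/* Hypothetical convenience macro: captures the call site's file and line. */
#define LOG_TO(buf, n, msg) \
    format_log((buf), (n), __FILE__, __LINE__, (msg))
```

The buffer can then be sent wherever the target's log output goes, for example a UART write routine.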

Counting the number of # includes and # define

I'd like to use a C program to find the total number of directives like #include, #define, #ifdef, #typedef, etc. Could you suggest any logic for that? I'm not interested in using any scripting or tools. I want it to be done purely with a C program.
Store all the directives in an array of pointers (or arrays).
Read the C file line by line and, ignoring any whitespace at the beginning of the line, check whether the first word is one of the directives in the list.
char *directives[]={"#assert", "#define", ......};
int count[NUM_DIRS]= { 0 };
Every time you find a match, increment the corresponding index of the count array. You can also maintain a separate counter for the total, to avoid summing the values in the count array afterwards.
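The matching step could look roughly like the sketch below. It is an illustration, not the answerer's code: the names directive_names and directive_index are hypothetical, the table stores the names without the leading # so that "#  include" also matches, and the word is compared as a whole so that #ifdef is not counted as #if.

```c
#include <ctype.h>
#include <string.h>

/* Directive names without the leading '#'. Extend the table as needed. */
static const char *const directive_names[] = {
    "define", "include", "ifdef", "ifndef", "if", "elif",
    "else", "endif", "undef", "pragma", "error", "line"
};
enum { NUM_DIRS = sizeof directive_names / sizeof directive_names[0] };

/* Returns the index in directive_names of the directive on this line,
 * or -1 if the line is not a known directive. */
int directive_index(const char *line)
{
    size_t len = 0;

    while (isspace((unsigned char)*line))   /* leading whitespace */
        line++;
    if (*line != '#')
        return -1;
    line++;
    while (*line == ' ' || *line == '\t')   /* blanks after '#' are legal */
        line++;
    while (isalpha((unsigned char)line[len]))
        len++;                              /* length of the directive word */

    for (int i = 0; i < NUM_DIRS; i++)
        if (strlen(directive_names[i]) == len &&
            strncmp(line, directive_names[i], len) == 0)
            return i;
    return -1;
}
```

For each line read, a caller would do something like `int i = directive_index(line); if (i >= 0) count[i]++;` with a count array of NUM_DIRS entries.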
Assuming you don't want to parse them, or do any other kind of syntactic/semantic analysis, you can simply count the number of lines which start with 0 or more whitespace characters and then a # character (loosely tested, should work fine):
#include <stdio.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file.c\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "r");
    if (f == NULL) {
        perror(argv[1]);
        return 1;
    }
    char line[1024];
    unsigned ncppdirs = 0;
    /* Loop on fgets() itself: testing feof() first would process the last line twice. */
    while (fgets(line, sizeof(line), f) != NULL) {
        char *p = line;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '#') ncppdirs++;
    }
    fclose(f);
    printf("%u preprocessor directives found\n", ncppdirs);
    return 0;
}
You might take advantage of the fact that gcc -H shows you every included file; you could popen that command and (simply) parse its output.
You could also parse the preprocessed output given by gcc -C -E; it contains line information, as lines starting with #.
Just counting the lexical occurrences of #include is not enough, because it does happen (quite often, actually, see what <features.h> does) that some included files do tricks like
#if SOME_SYMBOL > 2
#include "some-internal-header.h"
#define SOME_OTHER_SYMBOL (SOME_SYMBOL+1)
#endif
and some later include would have #if SOME_OTHER_SYMBOL > 4
And the compilation command might BTW define SOME_SYMBOL with e.g. gcc -DSOME_SYMBOL=3 (and such tricks happen a lot, often in Makefile-s, and just optimizing with -O2 makes __OPTIMIZE__ a preprocessor defined symbol).
If you want some more deep information about source programs, consider making GCC plugins or extensions, e.g. with MELT (a domain specific language to extend GCC). For instance, counting Gimple instructions in the intermediate representation is more sensible than counting lines of code.
Also, some macros might do some typedef; some programs may have
#define MYSTRUCTYPE(Name) typedef struct Name##_st Name##_t;
and later use e.g. MYSTRUCTYPE(point); what does that mean about counting typedef-s?
