Is it possible to have the preprocessor turn something like ... this is my comment into // this is my comment?
If not, is it possible to put something in my makefile to do this?
No, the preprocessor only recognizes the same symbol set as C, which means macro names have to start with either an underscore or a letter, followed by underscores, letters and digits.
With the C preprocessor: no, it recognizes the same kinds of tokens as C does; you can't introduce new ones.
With your own preprocessor: technically yes, but unless your preprocessor can accurately parse C and make sure that it only does the substitution in the exact right contexts, you will very likely run into problems where it accidentally corrupts your source. Besides, you will be creating an extra learning curve for new developers getting into the code. Overall, I wouldn't recommend it.
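To illustrate why the "own preprocessor" route is fragile, here is a minimal sketch of a naive line filter that could run before the compiler. The function name rewrite_dots_line is made up for this example, and the rule it applies (only touch a "..." that starts a line) is an assumption about what the asker wants.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite one source line into `out`.
 * A line whose first non-blank characters are "..." has that marker
 * replaced by "//"; every other line is copied unchanged.
 * Returns 1 if a substitution was made, 0 otherwise. */
int rewrite_dots_line(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);

    /* Measure the leading indentation. */
    size_t indent = 0;
    while (line[indent] == ' ' || line[indent] == '\t')
        indent++;

    if (strncmp(line + indent, "...", 3) == 0) {
        /* Keep the indentation, swap the marker, copy the rest. */
        snprintf(out, outsz, "%.*s//%s", (int)indent, line, line + indent + 3);
        return 1;
    }
    snprintf(out, outsz, "%s", line);
    return 0;
}
```

Even this restricted rule misfires: a variadic prototype split across lines, where the second line is just "...);" after some indentation, is turned into "//);" and no longer compiles. Avoiding that requires knowing the C context, which is the point made above.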
Related
I know that the _T(x) macro converts a string literal to a Unicode/multibyte
string based on a define, but I find it very annoying that I have to type the
underscore and the parentheses. It really confuses me. I'm not quite fluent with
macros, so I don't know: is there a way to detect all string literals and convert
them to a proper Unicode/multibyte string?
No, there isn't a way to avoid the macro completely if you want your code to be portable on Windows. You can of course define your own macro like #define t(x) whatever_T_does if you want to save yourself some keystrokes, but this will probably anger future maintainers of your code.
_T() and _TEXT() are C runtime macros, not Win32 macros. TEXT() (no underscore) is the Win32 macro. Even though they essentially do the same thing, you should use C runtime macros only with C functions, and Win32 macros with Win32 functions. Don't mix them.
Do you really need the ability to compile for both multibyte and Unicode? You don't need any macro if you want multibyte. In a Unicode app it is easier to use L"literal string", which does not need the underscore or the parentheses.
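For anyone unsure what such a macro actually does, here is a portable sketch of the mechanism. MY_UNICODE, my_char, MY_T and my_strlen are hypothetical names invented for this example; the real Windows headers switch on UNICODE / _UNICODE and their definitions differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the Windows generic-text machinery. */
#ifdef MY_UNICODE
typedef wchar_t my_char;
#define MY_T(x) L##x        /* paste an L prefix onto the literal */
#define my_strlen wcslen
#else
typedef char my_char;
#define MY_T(x) x           /* leave the literal as a narrow string */
#define my_strlen strlen
#endif

/* The same source line compiles in either build. */
static const my_char greeting[] = MY_T("hello");

size_t greeting_length(void)
{
    /* Sanity check: the array holds more than just the terminator. */
    assert(sizeof greeting > sizeof greeting[0]);
    return my_strlen(greeting);
}
```

If the code only ever needs the wide build, writing L"hello" directly gives the same result as the MY_UNICODE branch without any macro.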
Today I was presented with a weird fact (or not).
It was said:
"At it is disallowed to write long, descriptive identifier names, and forbidden to write Comments for Linux Drivers written in ANSI C."
When I asked "WTF? Why?" I was told it caused performance issues and errors of that sort...
not many details there.
I am surprised, but have to ask...
Can this be real?
Knowing that comments are stripped by the preprocessor,
and that identifiers are converted to addresses either way.
So... can it cause problems?
Well, ANSI C is a standard, and a standard is something that everyone who decides to support it (compiler designers and programmers alike) must follow.
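The ANSI C (C89) standard only guarantees that the first 6 characters of an exported (external) identifier are significant, and the first 31 characters of a non-exported (internal) one. Exported identifiers are stored as-is as symbols in the symbol table, not just as addresses. Longer names are allowed, but portable code cannot rely on the characters beyond those limits to tell two identifiers apart. C99 raised the limits to 31 and 63 characters respectively.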
On commenting: apart from some obvious pitfalls, like accidentally swallowing code with a multi-line comment, I recommend reading the Linux kernel coding style document, which explains what kinds of comments are discouraged.
Absolutely not. Whatever identifiers you use in your code, the compiler translates them to symbols.
Also, all comments are removed during preprocessing.
The only effect of comments is to help you understand the code more quickly.
The only performance impact comments can have is at compile time, and I would say it is negligible unless you write whole books as comments.
The identifier names are translated to symbols, so there is also, at most, a performance impact at compile time, which again is negligible. Identifier names might hit a maximum length limit, but to be honest, I have never encountered a problem because of identifier names that were too long.
No, the first step in compilation is to preprocess your source code, which removes comments and does other things like expanding macros.
Identifiers are often translated into pointers (to symbol table entries).
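A small sketch of the same point: the two functions below differ only in naming and comments. The function names are invented for this example. With a typical optimizing compiler I would expect the generated code for both to be equivalent, with only the symbol names in the object file differing, but that is an expectation you can check with your own compiler's assembly output, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Terse version: short names, no explanation. */
int s(const int *a, size_t n)
{
    int t = 0;
    for (size_t i = 0; i < n; i++)
        t += a[i];
    return t;
}

/* Descriptive version: add up all readings collected from the sensor. */
int sum_of_all_sensor_readings(const int *sensor_readings, size_t number_of_readings)
{
    int running_total = 0;
    for (size_t reading_index = 0; reading_index < number_of_readings; reading_index++)
        running_total += sensor_readings[reading_index];
    return running_total;
}

/* Both versions must agree on every input. */
void check_same_result(const int *values, size_t count)
{
    assert(s(values, count) == sum_of_all_sensor_readings(values, count));
}
```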
I am currently making a parser whose aim is to input data into a program.
The syntax used is greatly inspired by C.
I would like to reproduce a kind of preprocessor inline substitution in it.
For example:
#define HELLO ((variable1 + variable2 + variable3))
int variable1 = 37;
int variable2 = 82;
int variable3 = 928;
Thing is... I'm actually using C. I'm also using standard functions from stdio.h to parse through my files.
So... what techniques could I use to make this work correctly and efficiently?
Do standard compilers substitute the text by re-copying the stream buffer and making the substitution as the copy happens, or what? Are there more efficient techniques?
I guess we say preprocessor because it first substitutes everything until there are no preprocessor directives left (recursive approach maybe?), and then it starts doing the real compile job?
Excuse my lack of knowledge!
Thanks!
No, modern C compilers don't implement the preprocessor as a separate text processor; instead the different compiler phases (preprocessing being one of them) are intertwined. This matters for the efficiency of the compiler itself and for tracking errors back to the original source code.
Also implementing a preprocessor by yourself is a tedious task. Think twice before you start such a project.
Yes, you are right about the preprocessor. Its job is to bring together all the files the program needs (for example stdio.h) into a single file. Only then does the compiler proper compile the program. The file you want to compile is given as an argument to the compiler, and the techniques used by the compiler may vary with the OS and the compiler itself.
The C preprocessor works on tokens, not text. In particular, the result of a macro expansion is never processed as a preprocessor directive. Other preprocessors, such as m4, work differently.
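For the parser in the question, a token-wise substitution could be sketched roughly like this. It is a minimal sketch under several assumptions: a single hard-coded macro taken from the question's example, object-like macros only, and no rescanning of the replacement text (which the real C preprocessor does). The names expand_line and append are invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical one-entry macro table, taken from the question's example. */
static const char *macro_name = "HELLO";
static const char *macro_body = "((variable1 + variable2 + variable3))";

/* Append n bytes of src to out (capacity outsz, current length *len).
 * Returns 0 on success, -1 if the output buffer is too small. */
static int append(char *out, size_t outsz, size_t *len, const char *src, size_t n)
{
    if (*len + n + 1 > outsz)
        return -1;
    memcpy(out + *len, src, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Copy `in` to `out`, replacing every identifier token equal to macro_name
 * with macro_body. String literals are copied verbatim, so "HELLO" inside
 * quotes is left alone, and HELLO_WORLD counts as a different identifier. */
int expand_line(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);
    size_t len = 0;
    out[0] = '\0';

    while (*in) {
        const char *start = in;
        if (isalpha((unsigned char)*in) || *in == '_') {
            /* Identifier token: letters, digits, underscores. */
            while (isalnum((unsigned char)*in) || *in == '_')
                in++;
            size_t n = (size_t)(in - start);
            int hit = n == strlen(macro_name) && strncmp(start, macro_name, n) == 0;
            if (hit ? append(out, outsz, &len, macro_body, strlen(macro_body))
                    : append(out, outsz, &len, start, n))
                return -1;
        } else if (isdigit((unsigned char)*in)) {
            /* Number token: consume it whole so 1HELLO is not split. */
            while (isalnum((unsigned char)*in) || *in == '_' || *in == '.')
                in++;
            if (append(out, outsz, &len, start, (size_t)(in - start)))
                return -1;
        } else if (*in == '"') {
            /* String literal: skip to the closing quote, honoring backslashes. */
            in++;
            while (*in && *in != '"') {
                if (*in == '\\' && in[1])
                    in++;
                in++;
            }
            if (*in == '"')
                in++;
            if (append(out, outsz, &len, start, (size_t)(in - start)))
                return -1;
        } else {
            /* Operators, punctuation and whitespace are copied as-is. */
            in++;
            if (append(out, outsz, &len, start, 1))
                return -1;
        }
    }
    return 0;
}
```

Feeding it the line "int x = HELLO;" yields "int x = ((variable1 + variable2 + variable3));", while a HELLO inside a string literal is left untouched. A plain text search-and-replace would get the second case wrong.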
I would like to know if there is any kind of regular expression expansion within the compiler (GCC) preprocessor. Basically, more flexible code generation macros.
If there is no way, how do you suggest I accomplish the same result?
The C preprocessor can't do that.
You might want to use a template processor (for instance Mustache but there are many others) that generates what you need before passing it to the compiler.
Also, if you are planning a bigger project and you know this feature will be beneficial, you might want to write your own preprocessor that you can run automatically from a build system. A good example of such a solution is moc, which enhances C++ for the purposes of the Qt framework. Purists might of course disagree.
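As a rough idea of what a generation step can look like without any external tool, here is a sketch of a tiny generator written in C itself. The function name generate_getters and the "int get_NAME(void);" output format are made up for illustration; in a real build the produced text would be written to a header by a rule that runs before the compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: for each field name, emit a getter declaration
 * such as "int get_width(void);" into `out`.
 * Returns 0 on success, -1 if the output buffer is too small. */
int generate_getters(const char *const *fields, size_t count, char *out, size_t outsz)
{
    assert(fields != NULL && out != NULL && outsz > 0);
    size_t len = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* snprintf reports how many characters the full line needs. */
        int n = snprintf(out + len, outsz - len, "int get_%s(void);\n", fields[i]);
        if (n < 0 || (size_t)n >= outsz - len)
            return -1;
        len += (size_t)n;
    }
    return 0;
}
```

This is not regular expression expansion, only the general shape of "generate the repetitive code first, then compile it".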
There is this: https://github.com/graph/qc (qc = Quick C). It allows you to do this in source files whose names end with qc.h:
$replace asdf_(\d+) => asdf_ :) $1 blabla
// and now in your code anything that matches the above regular expression
asdf_123
// will become asdf_ :) 123 blabla
It will output a .cpp and a .h file that are preprocessed. It's made to avoid the need to maintain header files. It also does some other things that make it not backwards compatible with C++, but it outputs C++ code, so you can do all the C++ things you want at the end of the day.
Edit: I made it and have a bias towards qc.
You might want to look at re2c.org. It is a separate preprocessor that generates
C code to match regular expressions. I found it, and your question, when looking for
something similar.
I have built a small code for static analysis of C code. The purpose of building it is to warn users about the use of methods such as strcpy() which could essentially cause buffer overflows.
Now, to formalise the same, I need to write a formal Grammar which shows the excluded libraries as NOT a part of the allowed set of accepted library methods used.
For example,
AllowedSentence->ANSI C Permitted Code, NOT UnSafeLibraryMethods
UnSafeLibraryMethods->strcpy|other potentially unsafe methods
Any ideas on how this grammar can be formalised?
I think this should not be done at the grammar level. It should be a rule that is applied to the parse tree after parsing is done.
You hardly need a parser for the way you have posed the problem. If your only goal is to object to the presence of certain identifiers ("strcpy"), you can simply build a lexer that processes C and picks out identifiers. Special lexemes can recognize your list of "you shouldn't use this". This way you use positive recognition instead of negative recognition to pick out the identifiers that you believe to be trouble.
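A minimal sketch of such a lexer-only check might look like this. The deny list and the names count_unsafe_identifiers and is_unsafe are assumptions made for this example, and the scanner ignores preprocessor subtleties such as line continuations and trigraphs.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical deny list; extend as needed. */
static const char *const unsafe_names[] = { "strcpy", "strcat", "sprintf", "gets" };

/* Return 1 if the n-character identifier at s is on the deny list. */
static int is_unsafe(const char *s, size_t n)
{
    for (size_t i = 0; i < sizeof unsafe_names / sizeof unsafe_names[0]; i++)
        if (strlen(unsafe_names[i]) == n && strncmp(unsafe_names[i], s, n) == 0)
            return 1;
    return 0;
}

/* Count identifiers in `src` that appear on the deny list.
 * Comments, string literals and character literals are skipped, so
 * "strcpy" in a string or a comment is not reported. */
int count_unsafe_identifiers(const char *src)
{
    assert(src != NULL);
    int hits = 0;
    const char *p = src;
    while (*p) {
        if (p[0] == '/' && p[1] == '*') {                 /* block comment */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;
        } else if (p[0] == '/' && p[1] == '/') {          /* line comment */
            while (*p && *p != '\n')
                p++;
        } else if (*p == '"' || *p == '\'') {             /* string or char literal */
            char quote = *p++;
            while (*p && *p != quote) {
                if (*p == '\\' && p[1])
                    p++;
                p++;
            }
            if (*p)
                p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {  /* identifier */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            hits += is_unsafe(start, (size_t)(p - start));
        } else {
            p++;
        }
    }
    return hits;
}
```

Because whole identifiers are compared, names such as my_strcpy or strcpy_s are not flagged. What this cannot tell you is whether strcpy refers to the library function or to something the user defined.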
If you want a more sophisticated analysis tool, you'll likely want to parse C, name-resolve the identifiers to their actual definitions, and then scan the tree looking for identifiers that are objectionable. This will at least let you decide whether the identifier is actually defined by the user or comes from some known library; surely, if my code defines strcpy, you shouldn't complain unless you know my strcpy is defective somehow.