What is an easily hackable C preprocessor? - c

I want to add a small feature to a C preprocessor, but for that, I need one that is easy to understand and can easily be modified. Specifically, I am looking for the following criteria:
small codesize
well-documented
easy to modify
free software (I want to be able to distribute the modified code without paying fees or so.)
I have already found tcc, which includes a preprocessor and is fairly small but, as far as I can see, lacks good documentation on how preprocessing is implemented. Should I just try to understand how tcc does it or does a better fit exist?

What about http://www.blitzbasic.com/Community/posts.php?topic=20396 or http://mcpp.sourceforge.net/
Seems good to me!

Related

Cross-Platform C single header file and multiple implementations

I am working on an open source C driver for a cheap sensor that is used mostly for Arduino projects. The project is set up in such a way that it is possible to support multiple platforms outside the Arduino ecosystem, like the Raspberry Pi.
The project is set up with a platform.h file, with the intention of having different implementations of this header file. Like the example below:
platform.h
platform_arduino.c
platform_rpi.c
platform_windows.c
There is this (Cross-Platform C++ code and single header - multiple implementations) Stack Overflow post that goes fairly in depth in how to handle this for C++ but I feel like none of those examples really apply to this C implementation.
I have come up with some solutions like just adding the requirements for each platform at the top of the file.
#if SOME_REQUIREMENT
#include "platform.h"
int8_t t_open(void)
{
// Implementation here
}
#endif //SOME_REQUIREMENT
But this seems like a clunky solution.
It impacts readability of the code.1
It will probably make debugging conflicting requirements a nightmare.
1 Many editors (Like VS Code) try to gray out code which does not match requirements. While I want this most of the time, it is really annoying when working on cross-platform drivers. I could just disable it for the entirety of the project, but in other parts of the project it is useful. I understand that it could probably be solved using VS Code thing. However, I am asking for alternative methods of selecting the right file/code for the platform because I am interested in seeing what other strategies there are.
Part of the "problem" is that support for Arduino is the primary focus, which means it can't easily be solved with makefile magic. My question is, what are alternative ways of implementing a solution to this problem, that are still readable?
If it cannot be done without makefile magic, then that is an answer too.
For reference, here is a simplified example of the header file and implementation
platform.h
#ifndef __PLATFORM__
#define __PLATFORM__
int8_t t_open(void);
#endif //__PLATFORM__
platform_arduino.c
#include "platform.h"
int8_t t_open(void)
{
// Implementation here
}
this (Cross-Platform C++ code and single header - multiple implementations) Stack Overflow post that goes fairly in depth in how to handle this for C++ but I feel like none of those examples really apply to this C implementation.
I don't see why you say that. The first suggestions in the two highest-scoring answers are variations on the idea of using conditional macros, which not only is valid in C, but is a traditional approach. You yourself present an alternative along these lines.
Part of the "problem" is that support for Arduino is the primary focus, which means it can't easily be solved with makefile magic.
I take you to mean that the approach to platform adaptation has to be encoded somehow into the C source, as opposed to being handled via the build system. Frankly, this is an unusual constraint, except inasmuch as it can be addressed by use of the various system-identification macros provided by C compilers of interest.
Even if you don't want to rely specifically on makefiles, you should consider attributing some responsibility to the build system, which you can do even without knowing specifically what build system that is. For example, you can designate macro names, such as for_windows, etc that request builds for non-default platforms. You then leave it to the person building an instance of the driver to figure out how to configure their tools to provide the appropriate macro definition for their needs (which generally is not hard), based on your build documentation.
My question is, what are alternative ways of implementing a solution to this problem, that are still readable?
If the solution needs to be embodied entirely in the C source, then you have three main alternatives:
write code that just works correctly on all platforms, or
perform runtime detection and adaptation, or
use conditional compilation based on macros automatically defined by supported compilers.
If you're prepared to rely on macro definitions supplied by the user at build time, then the last becomes simply
use conditional compilation
Do not dismiss the first out of hand, but it can be a difficult path, and it might not be fully possible for your particular problem (and probably isn't if you're writing a driver or other code for a freestanding implementation).
Runtime adaptation could be viewed as a specific case of code that just works, but what I have in mind for this is a higher level of organization that performs runtime analysis of the host environment and chooses function variants and internal parameters suited to that, as opposed to those choices being made at compile time. This is a real thing that is occasionally done, but it may or may not be viable for your particular case.
On the other hand, conditional compilation is the traditional basis for platform adaptation in C, and the general form does not have the caveat of the other two that it might or might not work in your particular situation. The level of readability and maintainability you achieve this way is a function of the details of how you implement it.
I have come up with some solutions like just adding the requirements for each platform at the top of the file. [...] But this seems like a clunky solution.
If you must include a source file in your build but you don't want anything in it to actually contribute to the target then that's exactly what you must do. You complain that "It will probably make debugging conflicting requirements a nightmare", but to the extent that that's a genuine issue, I think it's not so much a question of syntax as of the whole different code for different platforms plan.
You also complain that the conditional compilation option might be a practical difficulty for you with your choice of development tools. It certainly seems to me that there ought to be good workarounds for that available from your tools and development workflow. But if you must have a workaround grounded only in the C language, then there is one (albeit a bad one): introduce a level of preprocessing indirection. That is, put the conditional compilation directives in a different source file, like so:
platform.c
#if defined(for_windows)
#include "platform_windows.c"
#else
#if defined(for_rpi)
#include "platform_rpi.c"
#else
#include "platform_arduino.c"
#endif
#endif
You then designate platform.c as a file to be built, but not (directly) any of the specific-platform files.
This solves your tool-presentation issue because when you are working on one of the platform-specific .c files, the editor is unlikely to be able to tell whether it would actually be included in a build or not.
Do note well that it is widely considered bad practice to #include files containing function implementations, or those not ending with an extension conventionally designating a header. I don't say otherwise about the above, but I would say that if the whole platform.c contains nothing else, then that's about the least bad variation that I can think of within the category.

Detecting/Listing variable declarations in C

I would like to list all the variables that have been declared in my C program for analysis. Is there an easy way I can do this? I would think that building a lexer just for this purpose would be cumbersome. Is there another way?
Well, I think I have to be more clear :-). I intend to analyse a lot of C files using a C library that I intend to write, which needs to have this functionality. Hence, it'd be great if I can do this using C (since it can integrate with my library). However I can pre-process in any other language as well. But it'd increase dependencies.
You're probably going to have to write a pretty powerful parser anyway, if you want to handle typedefs and so on. You might want to look at using clang/llvm - you can probably modify it to output the data you want pretty easily.
cscope (http://cscope.sourceforge.net/) can identify and index all symbols in your program and has a command line mode to query the symbol database from command line or GUI tools.
Doing the job properly requires a significant chunk of the C preprocessor and a lexical analyzer, which is quite a lot of a C compiler.
Doing the job ad hoc is easier - but you get to choose how ad hoc you're going to be.

Are these C #ifdefs for portability outdated?

I'm working with an old C code that still has a few dusty corners. I'm finding a lot of #ifdef statements around that refer to operating systems, architectures, etc. and alter the code for portability. I don't know how many of these statements are still relevant today.
I know that #ifdef isn't the best idea in certain circumstances, and I'll be sure to fix that, but what I'm interested in here is what's being tested.
I've listed them below. If you could tell me if any of them are definitely useful in this day and age, or if the machines or OSs with which they're associated have long since expired, that would be great. Also, if you know of any central reference for these, I'd love to hear about it.
Thanks in advance,
Ross
BORLANDC
BSD
CGLE
DRYRUN
HUGE
IBMPC
MAIN
M_XENIX
OPTIMIZED
P2C_H_PROTO
sgi
TBFINDADDREXTENDED
UNIX
vms
__GCC__
__GNUC__
__HUGE__
__ID__
__MSDOS__
__TURBOC__
Here you are.
You are coming from the wrong direction. Instead of asking what code can be safely deleted, you should ask - what code have to stay.
Find out what platforms have to be supported and delete everything that is not defined in any of them. You'll get yourself cleanest code possible that is still guaranteed to work.
What context is this code being used?
If it's a library other people outside your organization are using, you shouldn't touch this stuff unless you're releasing a new version and explicitly removing support for some OSs. In the latter case, you should remove all the relevant IFDEF code as part of making a new release, and should be explicit about what you are removing.
If it's a library people inside your organization are using, you should ask those people what you can remove, not us.
If it's code being used very narrowly (i.e. you control its use directly), you can, if you wish, safely remove any sort of compiler portability, since you are only using one compiler.
You're asking the wrong people: It's your users (or potential users) who decide what's still useful, not us. Start by finding out what platforms you need to support, and then you can find out what's not needed.
If, for example, you don't need to support 16-bit systems, you can dispense with __HUGE__, __MSDOS__, and __TURBOC__.
Any #ifdef based on arbitrary preprocessor definitions provided by the implementation is outdated - especially those which are in the namespace reserved for the application, not the implementation, as most of those are! The correct modern way to achieve this kind of portability is to either detect the presence of different interfaces/features/behavior with a configure script and #define HAVE_FOO etc. based on that, directly test standard preprocessor defines (like UINT_MAX to determine integer size), and/or provide prebuilt header files for each platform you want to support with the appropriate HAVE_FOO definitions.
The old-style "portability" #ifdefs closely coupled knowledge of every single platform all over your source, making for a nightmare when platforms changed and adopted new features, behaviors, or defaults. (Just imagine the mess of old code that assumes Windows is 16bit or Linux has SysV-style signal()!) The modern style isolates knowledge of the platform and allows the conditional compilation in your source files to depend only on the presence/absence/behavior of the feature it wants to use.
Code that is annotated like that can in fact be quite difficult to maintain. You could consider to look into something like autotools or alike to configure your sources for a particular architecture.

Typical C with C Preprocessor refactoring

I'm working on a refactoring tool for C with preprocessor support...
I don't know the kind of refactoring involved in large C projects and I would like to know what people actually do when refactoring C code (and preprocessor directives)
I'd like to know also if some features that would be really interesting are not present in any tool and so the refactoring has to be done completely manually... I've seen for instance that Xref could not refactor macros that are used as iterators (don't know exactly what that means though)...
thanks
Anybody interested in this (specific to C), might want to take a look at the coccinelle tool:
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
Huge topic!
The stuff I need to clean up is contorted nests of #ifdefs. A refactoring tool would understand when conditional stuff appears in argument lists (function declaration or definitions), and improve that.
If it was really good, it would recognize that
#if defined(SysA) || defined(SysB) || ... || defined(SysJ)
was really equivalent to:
#if !defined(SysK) && !defined(SysL)
If you managed that, I'd be amazed.
It would allow me to specify 'this macro is now defined - which code is visible' (meaning, visible to the compiler); it would also allow me to choose to see the code that is invisible.
It would handle a system spread across over 100 top-level directories, with varying levels of sub-directories under those. It would handle tens of thousands of files, with lengths of 20K lines in places.
It would identify where macro definitions come from makefiles instead of header files (aargh!).
Well, since it is part of the preprocessor... #include refactoring is a huge huge topic and I'm not aware of any tools that do it really well.
Trivial problems a tool could tackle:
Enforcing consistent case and backslash usage in #includes
Enforce a consistent header guarding convention, automatically add redundant external guards, etc.
Harder problems a tool could tackle:
Finding and removing spurious includes.
Suggest the use of predeclarations wherever practical.
For macros... perhaps some sort of scoping would be interesting, where if you #define a macro inside a block, the tool would automatically #undef it at the end of a block. Other quick things I can think of:
A quick analysis on macro safety could be helpful as a lot of people still don't know to use do { } while (0) and other techniques.
Alternately, find and flag spots where expressions with side-effects are passed as macro arguments. This could possibly be really helpful for things like... asserts with unintentional side-effects.
Macros can often get quite complex, so I wouldn't try supporting much more than simple renaming.
I will tell you honestly that there are no good tools for refactoring C++ like there are for Java. Most of it will be painful search and replace, but this depends on the actual task. Look at Netbeans and Eclipse C++ plugins.
I've seen for instance that Xref could
not refactor macros that are used as
iterators (don't know exactly what
that means though)
To be honest, you might be in over your head - consider if you are the right person for this task.
If you can handle reliable renaming of various types, variables and macros over a big project with an arbitrarily complex directory hierarchy, I want to use your product.
Just discovered this old question, but I wanted to mention that I've rescued the free version of Xrefactory for C, now named c-xrefactory, which manages to do some refactorings in macros such as rename macro, rename macro parameter. It is an Emacs plugin.

Code Obfuscation?

So, I have a penchant for Easter Eggs... this dates back to me being part of the found community of the Easter Egg Archive.
However, I also do a lot of open source programming.
What I want to know is, what do you think is the best way to SYSTEMATICALLY and METHODICALLY obfuscate code.
Examples in PHP/Python/C/C++ preferred, but in other languages is fine, if the methodology is explained properly.
Compile the code with full optimization. Completely strip the binary.
Use a decompiler on the code.
I can guarantee the result will be so utterly unreadable that you won't even be able to read it ;)
In that case, you should use/write an "obfuscator". A program that does the job for you.
The Salamander Obfuscator can be used to obfuscate .Net programs, but it is more to prevent decompilation, thus not exactly what you need.
A good place to learn about obfuscation in C is International Obfuscated C Code Contest
In the spirit of renaming symbols: overuse scope and visibility rules by naming different variables with the same name.
The question is how to create seemingly non-obfuscated code in plain sight (open source) without it appearing to perform another function.
Some obvious methods:
remove comments and as much whitespace as you can without breaking things
join lines
rename variables and functions to be meaningless (preferably 1 character)
For systematic and methodical obfuscation of code, you cannot beat Perl. If you want something that compiles to a binary, there is always APL.
If you are targeting the .NET framework, put your easter egg source code in a resource file as a binhex string. Then you can have one of your initialisaing routines fetch it, decode it and compile it into memory. You can invoke it using reflection.
If you need help with the technical aspects of compiling into memory and calling into the resultant assembly I can give you I library I wrote and a sample program that uses it.
You can use this technology to load plug-ins, which is a legit thing to do and reasonable in an initialiser.

Resources