I'm working with an old C codebase that still has a few dusty corners. I'm finding a lot of #ifdef statements that refer to operating systems, architectures, etc., and alter the code for portability. I don't know how many of these statements are still relevant today.
I know that #ifdef isn't the best idea in certain circumstances, and I'll be sure to fix that, but what I'm interested in here is what's being tested.
I've listed them below. If you could tell me if any of them are definitely useful in this day and age, or if the machines or OSs with which they're associated have long since expired, that would be great. Also, if you know of any central reference for these, I'd love to hear about it.
Thanks in advance,
Ross
BORLANDC
BSD
CGLE
DRYRUN
HUGE
IBMPC
MAIN
M_XENIX
OPTIMIZED
P2C_H_PROTO
sgi
TBFINDADDREXTENDED
UNIX
vms
__GCC__
__GNUC__
__HUGE__
__ID__
__MSDOS__
__TURBOC__
You are coming at this from the wrong direction. Instead of asking what code can safely be deleted, you should ask what code has to stay.
Find out which platforms have to be supported and delete everything that is not defined on any of them. You'll get the cleanest code possible that is still guaranteed to work.
In what context is this code being used?
If it's a library other people outside your organization are using, you shouldn't touch this stuff unless you're releasing a new version and explicitly removing support for some OSs. In the latter case, you should remove all the relevant #ifdef code as part of making a new release, and should be explicit about what you are removing.
If it's a library people inside your organization are using, you should ask those people what you can remove, not us.
If it's code being used very narrowly (i.e. you control its use directly), you can, if you wish, safely remove any sort of compiler portability, since you are only using one compiler.
You're asking the wrong people: It's your users (or potential users) who decide what's still useful, not us. Start by finding out what platforms you need to support, and then you can find out what's not needed.
If, for example, you don't need to support 16-bit systems, you can dispense with __HUGE__, __MSDOS__, and __TURBOC__.
Any #ifdef based on arbitrary preprocessor definitions provided by the implementation is outdated - especially those that are in the namespace reserved for the application, not the implementation, as most of those are! The correct modern way to achieve this kind of portability is to detect the presence of different interfaces/features/behaviors with a configure script and #define HAVE_FOO etc. based on that, to test standard preprocessor defines directly (like UINT_MAX to determine integer size), and/or to provide prebuilt header files for each platform you want to support with the appropriate HAVE_FOO definitions.
The old-style "portability" #ifdefs coupled knowledge of every single platform closely all over your source, making for a nightmare when platforms changed and adopted new features, behaviors, or defaults. (Just imagine the mess of old code that assumes Windows is 16-bit or that Linux has SysV-style signal()!) The modern style isolates knowledge of the platform and allows the conditional compilation in your source files to depend only on the presence/absence/behavior of the feature it wants to use.
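For illustration, here is a minimal sketch of the feature-based style. HAVE_CLOCK_GETTIME is a hypothetical macro, assumed to be defined by a configure script or a hand-written per-platform header, not something any library defines for you:
/* timestamp.c - minimal sketch of feature-based conditional compilation.
 * HAVE_CLOCK_GETTIME is a hypothetical macro, assumed to come from a
 * configure script or a per-platform header, not from this file. */
#include <time.h>

double timestamp_seconds(void)
{
#if defined(HAVE_CLOCK_GETTIME)
    /* Feature test passed: clock_gettime() is known to be available. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#else
    /* Fallback that relies only on standard C. */
    return (double)time(NULL);
#endif
}
The point is that this file never mentions any operating system by name; only the presence of the feature matters.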
Code that is annotated like that can in fact be quite difficult to maintain. You could consider looking into something like autotools or similar to configure your sources for a particular architecture.
Related
I am working on an open source C driver for a cheap sensor that is used mostly for Arduino projects. The project is set up in such a way that it is possible to support multiple platforms outside the Arduino ecosystem, like the Raspberry Pi.
The project is set up with a platform.h file, with the intention of having different implementations of this header file. Like the example below:
platform.h
platform_arduino.c
platform_rpi.c
platform_windows.c
There is this (Cross-Platform C++ code and single header - multiple implementations) Stack Overflow post that goes fairly in depth on how to handle this for C++, but I feel like none of those examples really apply to this C implementation.
I have come up with some solutions like just adding the requirements for each platform at the top of the file.
#if SOME_REQUIREMENT
#include "platform.h"
int8_t t_open(void)
{
// Implementation here
}
#endif //SOME_REQUIREMENT
But this seems like a clunky solution.
It impacts readability of the code.1
It will probably make debugging conflicting requirements a nightmare.
1 Many editors (like VS Code) try to gray out code which does not match the requirements. While I want this most of the time, it is really annoying when working on cross-platform drivers. I could just disable it for the entire project, but it is useful in other parts of the project. I understand that this could probably be solved with a VS Code setting. However, I am asking for alternative methods of selecting the right file/code for the platform because I am interested in seeing what other strategies there are.
Part of the "problem" is that support for Arduino is the primary focus, which means it can't easily be solved with makefile magic. My question is, what are alternative ways of implementing a solution to this problem, that are still readable?
If it cannot be done without makefile magic, then that is an answer too.
For reference, here is a simplified example of the header file and implementation
platform.h
#ifndef __PLATFORM__
#define __PLATFORM__
#include <stdint.h> /* for int8_t */
int8_t t_open(void);
#endif //__PLATFORM__
platform_arduino.c
#include "platform.h"
int8_t t_open(void)
{
// Implementation here
}
this (Cross-Platform C++ code and single header - multiple implementations) Stack Overflow post that goes fairly in depth on how to handle this for C++ but I feel like none of those examples really apply to this C implementation.
I don't see why you say that. The first suggestions in the two highest-scoring answers are variations on the idea of using conditional macros, which not only is valid in C, but is a traditional approach. You yourself present an alternative along these lines.
Part of the "problem" is that support for Arduino is the primary focus, which means it can't easily be solved with makefile magic.
I take you to mean that the approach to platform adaptation has to be encoded somehow into the C source, as opposed to being handled via the build system. Frankly, this is an unusual constraint, except inasmuch as it can be addressed by use of the various system-identification macros provided by C compilers of interest.
Even if you don't want to rely specifically on makefiles, you should consider attributing some responsibility to the build system, which you can do even without knowing specifically what build system that is. For example, you can designate macro names, such as for_windows, etc., that request builds for non-default platforms. You then leave it to the person building an instance of the driver to figure out how to configure their tools to provide the appropriate macro definition for their needs (which generally is not hard), based on your build documentation.
My question is, what are alternative ways of implementing a solution to this problem, that are still readable?
If the solution needs to be embodied entirely in the C source, then you have three main alternatives:
write code that just works correctly on all platforms, or
perform runtime detection and adaptation, or
use conditional compilation based on macros automatically defined by supported compilers.
If you're prepared to rely on macro definitions supplied by the user at build time, then the last becomes simply
use conditional compilation
Do not dismiss the first out of hand, but it can be a difficult path, and it might not be fully possible for your particular problem (and probably isn't if you're writing a driver or other code for a freestanding implementation).
Runtime adaptation could be viewed as a specific case of code that just works, but what I have in mind for this is a higher level of organization that performs runtime analysis of the host environment and chooses function variants and internal parameters suited to that, as opposed to those choices being made at compile time. This is a real thing that is occasionally done, but it may or may not be viable for your particular case.
On the other hand, conditional compilation is the traditional basis for platform adaptation in C, and the general form does not have the caveat of the other two that it might or might not work in your particular situation. The level of readability and maintainability you achieve this way is a function of the details of how you implement it.
I have come up with some solutions like just adding the requirements for each platform at the top of the file. [...] But this seems like a clunky solution.
If you must include a source file in your build but you don't want anything in it to actually contribute to the target then that's exactly what you must do. You complain that "It will probably make debugging conflicting requirements a nightmare", but to the extent that that's a genuine issue, I think it's not so much a question of syntax as of the whole different code for different platforms plan.
You also complain that the conditional compilation option might be a practical difficulty for you with your choice of development tools. It certainly seems to me that there ought to be good workarounds for that available from your tools and development workflow. But if you must have a workaround grounded only in the C language, then there is one (albeit a bad one): introduce a level of preprocessing indirection. That is, put the conditional compilation directives in a different source file, like so:
platform.c
#if defined(for_windows)
#include "platform_windows.c"
#elif defined(for_rpi)
#include "platform_rpi.c"
#else
#include "platform_arduino.c"
#endif
You then designate platform.c as a file to be built, but not (directly) any of the specific-platform files.
This solves your tool-presentation issue because when you are working on one of the platform-specific .c files, the editor is unlikely to be able to tell whether it would actually be included in a build or not.
Do note well that it is widely considered bad practice to #include files containing function implementations, or those not ending with an extension conventionally designating a header. I don't say otherwise about the above, but I would say that if the whole platform.c contains nothing else, then that's about the least bad variation that I can think of within the category.
The C language has a set of outright reserved keywords. However, there is a much larger set of identifiers that are reserved or semi-reserved, whose use is at least strongly discouraged because they are used by the standard library or various system headers, or may be so used in the future, etc.; there is a thorough though not exhaustive list of those here: https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
The set of such names is much too large to be feasible to enumerate.
Looking at it from the perspective of using C as a compilation target, I'm looking for the reverse: a set of names I can generate, that are guaranteed to be not reserved, to be free for application use.
Clearly this requirement could be effectively met as far as it goes by prepending a UUID to every name, but there is an additional requirement that the generated code be as amenable as possible to eyeball debugging, so the namespace should be as simple as possible, e.g. if all names are to have a common prefix, that prefix should be as short as possible.
What's the simplest way to characterize a set of names that are guaranteed, or failing that highly likely, to be free for application use? For example, would it be safe to use arbitrary names prefixed with x_ or suchlike?
Most C libraries provide feature-selection macros, which allow you to specify which version of the interface you are using. If you set _POSIX_C_SOURCE and _XOPEN_SOURCE before including any system headers on Linux or UNIX, your system libraries will not declare any identifiers that future versions of UNIX might define. (In theory, setting either one by itself should suffice, but it's good defensive coding to set both, as this will prevent one or the other from being set inconsistently by someone else.) On Windows, you would define NTDDI_VERSION and _WIN32_WINNT.
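As a minimal sketch (the version values below are just one choice: 200809L and 700 request the POSIX.1-2008 / X/Open issue 7 interfaces), the macros must appear before the first system header is included:
/* Must be defined before any system header is included. */
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 interfaces only */
#define _XOPEN_SOURCE 700       /* request X/Open issue 7 interfaces only */

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Identifiers added by later POSIX/XSI revisions are not declared,
     * so they remain free for the application's own use. */
    printf("POSIX feature selection active\n");
    return 0;
}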
The C standard library only provides feature-test macros, not macros that let you choose an interface, but compilers support a flag such as -std=c17, and you should set this in your build scripts. This should disable any new keywords or identifiers that get added to the language in the future.
If you depend on a specific version of a library, and are worried that changes to its header files could break your code, you can put a copy of the headers (and to be absolutely certain, the library itself) in your project tree. If the library is open-source, making a note of which version you used should let anyone else download the right version. Otherwise, you’re at the mercy of its maintainers.
Do not define _BSD_SOURCE or _GNU_SOURCE if this is a concern for you! Linux headers without glibc bindings, such as <linux/module.h>, generally don't have this kind of versioning.
Some languages have much more robust solutions for this, such as cabal and stack for Haskell or cargo for Rust.
I implemented an SV module which contains soft constraints. However, as far as I know soft constraints are only supported since the 1800-2012 standard. Therefore I would like to add an alternative implementation in case a simulator is used that only supports older standard versions.
Is there a way to retrieve this information with a system task or pre-processor directive in such a way:
if($get_version_specifier == "1800-2012")
// do fancy stuff with soft constraints
else
// alternative fancy stuff
I already found an option for a similar problem by using begin_keywords, end_keywords, but I think that would not solve my issue since it only defines the set of keywords for a specific standard. And if the simulator does not support this version I guess only an error would occur.
Thanks in advance!
sebs
The problem you ask about is more complicated than it seems. Different features of SystemVerilog are implemented by different versions of tools; sometimes before the standard is released, sometimes after. I do know that some tools supported soft constraints before the release of the 1800-2012 standard, and no commercial tool that I know of supports operator overloading yet, even though it was in the first IEEE 1800-2005 standard.
A better solution would be to define a set of macros like USE_SOFT_CONSTRAINTS for features that do not have universal support. Then you can include a common featureset.svh file that defines the feature set you want to use. Another good practice is to DOCUMENT the reason you added a specific feature macro (i.e., the tool version you were using that didn't support the feature and why you decided it was worth the effort to implement both branches of the code).
As far as I know, there isn't any "standard" way of getting the version of the standard you are using. C++ had a similar problem before the 2011 release (see here). One answer there states that different compilers added different proprietary defines (something like the INCA macro set for the Incisive simulator). You'll have to ask your simulator vendor if a define for the version of the SV standard exists (something like SV_2012_OR_GREATER).
Cadence, for example, has something like this for Specman, so if they're consistent they might have this for SystemVerilog as well. Assuming such a thing exists, you could have:
`ifdef SV_2012_OR_GREATER
// do fancy stuff with soft constraints
`else
// alternative fancy stuff
`endif
Bonus: Soft constraints are a declarative construct, so I don't see how you could use an if block to decide whether to use them or not (unless maybe if it's an if inside a constraint block). Also, I'm not sure whether you're able to truly emulate soft constraints in any way, however fancy your approach is, so I don't know if it really makes sense to try.
I would like to determine, whether two functions in two executables were compiled from the same (C) source code, and would like to do so even if they were compiled by different compiler versions or with different compilation options. Currently, I'm considering implementing some kind of assembler-level function fingerprinting. The fingerprint of a function should have the properties that:
two functions compiled from the same source under different circumstances are likely to have the same fingerprint (or similar one),
two functions compiled from different C source are likely to have different fingerprints,
(bonus) if the two source functions were similar, the fingerprints are also similar (for some reasonable definition of similar).
What I'm looking for right now is a set of properties of compiled functions that individually satisfy (1.) and taken together hopefully also (2.).
Assumptions
Of course that this is generally impossible, but there might exist something that will work in most of the cases. Here are some assumptions that could make it easier:
Linux ELF binaries (without debugging information available, though),
not obfuscated in any way,
compiled by GCC,
on x86 Linux (an approach that can be implemented on other architectures would be nice).
Ideas
Unfortunately, I have little to no experience with assembly. Here are some ideas for the above-mentioned properties:
types of instructions contained in the function (e.g. floating-point instructions, memory barriers) - a rough sketch of this one follows the list
memory accesses from the function (does it read/write from/to the heap? the stack?)
library functions called (their names should be available in the ELF; also their order shouldn't usually change)
shape of the control flow graph (I guess this will be highly dependent on the compiler)
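To make the first idea concrete, here is a rough, hypothetical sketch: it reads an objdump -d listing on stdin and counts instruction-category occurrences as a crude fingerprint (the mnemonic prefixes used for classification are my guesses, not a complete taxonomy):
#include <stdio.h>
#include <string.h>

/* Crude instruction-category counter over `objdump -d` output.
 * Hypothetical usage: objdump -d ./a.out | ./fingerprint
 * Assumes the default listing format: address<TAB>bytes<TAB>mnemonic ... */
int main(void)
{
    char line[512];
    unsigned long calls = 0, branches = 0, fpish = 0, other = 0;

    while (fgets(line, sizeof line, stdin)) {
        /* The mnemonic is the token after the second tab, if present. */
        char *tab = strchr(line, '\t');
        if (!tab) continue;
        tab = strchr(tab + 1, '\t');
        if (!tab) continue;
        const char *mnem = tab + 1;

        if (strncmp(mnem, "call", 4) == 0)
            calls++;
        else if (mnem[0] == 'j')              /* jmp, je, jne, ... */
            branches++;
        else if (mnem[0] == 'f')              /* very rough x87/FP bucket */
            fpish++;
        else
            other++;
    }
    printf("calls=%lu branches=%lu fp-ish=%lu other=%lu\n",
           calls, branches, fpish, other);
    return 0;
}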
Existing work
I was able to find only tangentially related work:
Automated approach which can identify crypto algorithms in compiled code: http://www.emma.rub.de/research/publications/automated-identification-cryptographic-primitives/
Fast Library Identification and Recognition Technology in IDA disassembler; identifies concrete instruction sequences, but still contains some possibly useful ideas: http://www.hex-rays.com/idapro/flirt.htm
Do you have any suggestions regarding the function properties? Or a different idea which also accomplishes my goal? Or was something similar already implemented and I completely missed it?
FLIRT uses byte-level pattern matching, so it breaks down with any changes in the instruction encodings (e.g. different register allocation/reordered instructions).
For graph matching, see BinDiff. While it's closed source, Halvar has described some of the approaches on his blog. They have even open-sourced some of the algorithms they use to generate fingerprints, in the form of the BinCrowd plugin.
In my opinion, the easiest way to do something like this would be to decompose the function's assembly back into some higher-level form where constructs (like for, while, function calls, etc.) exist, and then match the structure of these higher-level constructs.
This would prevent instruction reordering, loop hoisting, loop unrolling, and any other optimizations from messing with the comparison. You can even (de)optimize these higher-level structures to their maximum on both ends to ensure they are at the same point, so comparisons between unoptimized debug code and -O3 won't fail due to missing temporaries, lack of register spills, etc.
You can use something like boomerang as a basis for the decompilation (except you wouldn't spit out C code).
I suggest you approach this problem from the standpoint of the language the code was written in and what constraints that code puts on compiler optimization.
I'm not really familiar with the C standard, but C++ has the concept of "observable" behavior. The standard carefully defines this, and compilers are given great latitude in optimizing as long as the result gives the same observable behavior. My recommendation for trying to determine if two functions are the same would be to try to determine what their observable behavior is (what I/O they do, how they interact with other areas of memory, and in what order).
If the problem set can be reduced to a small set of known C or C++ source code functions being compiled by n different compilers, each with m[n] different sets of compiler options, then a straightforward, if tedious, solution would be to compile the code with every combination of compiler and options and catalog the resulting instruction bytes, or more efficiently, their hash signature in a database.
The set of likely compiler options used is potentially large, but in actual practice engineers typically use a pretty standard and small set of options, usually just minimally optimized for debugging and fully optimized for release. Researching many project configurations might reveal that any given engineering culture uses only two or three more, reflecting prejudice or superstition about how compilers work, accurate or not.
I suspect this approach is closest to what you actually want: a way of investigating suspected misappropriated source code. All the suggested techniques of reconstructing the compiler's parse tree might bear fruit, but have great potential for overlooked symmetric solutions or ambiguous unsolvable cases.
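To make the cataloging idea above concrete, here is a minimal sketch of hashing a function's instruction bytes (FNV-1a is just one convenient choice; extracting the byte range from the ELF is assumed to happen elsewhere):
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* FNV-1a over a function's instruction bytes; the result is what you
 * would store as the catalog key in the database. */
static uint64_t fnv1a(const unsigned char *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

int main(void)
{
    /* Placeholder bytes; in practice they come from the function's
     * range in the ELF .text section. */
    const unsigned char code[] = { 0x55, 0x48, 0x89, 0xe5, 0xc3 };
    printf("fingerprint: %016llx\n",
           (unsigned long long)fnv1a(code, sizeof code));
    return 0;
}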
I'm working on a refactoring tool for C with preprocessor support...
I don't know what kinds of refactoring are involved in large C projects, and I would like to know what people actually do when refactoring C code (and preprocessor directives).
I'd also like to know whether some features that would be really interesting are missing from every tool, so that the refactoring has to be done completely manually... I've seen for instance that Xref could not refactor macros that are used as iterators (I don't know exactly what that means, though)...
thanks
Anybody interested in this (specific to C), might want to take a look at the coccinelle tool:
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
Huge topic!
The stuff I need to clean up is contorted nests of #ifdefs. A refactoring tool would understand when conditional stuff appears in argument lists (function declarations or definitions), and improve that.
If it was really good, it would recognize that
#if defined(SysA) || defined(SysB) || ... || defined(SysJ)
was really equivalent to:
#if !defined(SysK) && !defined(SysL)
If you managed that, I'd be amazed.
It would allow me to specify 'this macro is now defined - which code is visible' (meaning, visible to the compiler); it would also allow me to choose to see the code that is invisible.
It would handle a system spread across over 100 top-level directories, with varying levels of sub-directories under those. It would handle tens of thousands of files, with lengths of 20K lines in places.
It would identify where macro definitions come from makefiles instead of header files (aargh!).
Well, since it is part of the preprocessor... #include refactoring is a huge huge topic and I'm not aware of any tools that do it really well.
Trivial problems a tool could tackle:
Enforcing consistent case and backslash usage in #includes
Enforcing a consistent header-guard convention, automatically adding redundant external include guards, etc. (a small example follows this list).
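For example (the file and symbol names here are made up for illustration), an internal guard in the header plus a redundant external guard at the include site, which lets even a dumb preprocessor skip re-reading the file:
/* foo.h - internal include guard */
#ifndef FOO_H
#define FOO_H

void foo(void);

#endif /* FOO_H */

/* caller.c - redundant external guard, avoids re-opening foo.h */
#ifndef FOO_H
#include "foo.h"
#endif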
Harder problems a tool could tackle:
Finding and removing spurious includes.
Suggesting the use of predeclarations wherever practical.
For macros... perhaps some sort of scoping would be interesting, where if you #define a macro inside a block, the tool would automatically #undef it at the end of the block. Other quick things I can think of:
A quick analysis of macro safety could be helpful, as a lot of people still don't know to use do { } while (0) and other techniques (see the sketch after this list).
Alternatively, finding and flagging spots where expressions with side effects are passed as macro arguments. This could be really helpful for things like asserts with unintentional side effects.
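For reference, the classic do { } while (0) idiom such an analysis would check for; the macro names and bodies below are made up:
#include <stdio.h>

/* Unsafe: expands to two statements, so only the first one is guarded
 * by an if without braces. */
#define LOG_BAD(msg)  puts("log: "); puts(msg)

/* Safe: behaves like a single statement and still requires a trailing ';'. */
#define LOG_OK(msg)   do { puts("log: "); puts(msg); } while (0)

int main(void)
{
    int verbose = 0;
    if (verbose)
        LOG_OK("details");   /* with LOG_BAD, puts("details") would always run */
    return 0;
}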
Macros can often get quite complex, so I wouldn't try supporting much more than simple renaming.
I will tell you honestly that there are no good tools for refactoring C++ like there are for Java. Most of it will be painful search and replace, but this depends on the actual task. Look at Netbeans and Eclipse C++ plugins.
I've seen for instance that Xref could not refactor macros that are used as iterators (don't know exactly what that means though)
To be honest, you might be in over your head - consider if you are the right person for this task.
If you can handle reliable renaming of various types, variables and macros over a big project with an arbitrarily complex directory hierarchy, I want to use your product.
Just discovered this old question, but I wanted to mention that I've rescued the free version of Xrefactory for C, now named c-xrefactory, which manages to do some refactorings involving macros, such as renaming a macro or renaming a macro parameter. It is an Emacs plugin.