Extracting #define statements from another .c file to retain reverse-compatibility - c

I find myself with a "face-palm" type of issue, and I need advice on how to solve it.
I am working on rolling out a new embedded C project built around a logic engine that was originally written in Dynamic C (DC). The logic engine is written entirely in one file; there is no header file. The crux of my problem is that the senior developer wants this logic engine to compile in DC AND in the new C project, so that only one copy of the logic, defines, etc. needs to be maintained. One of the properties of DC is that its include files are very complicated, for many reasons that I don't want to get into here. Long story short, extracting the variables, structure definitions, and #define statements to a .h file compromises the DC project; it will not compile. Below is a summary of the two files of interest:
Dynamic C (DC) logic engine, 'DClogicEngine.c'
#include "newProject.h"
~ 1000 lines of structs and variables needed in the new project.
~ 1500 #define statements needed in the new C project
~ 20k lines of logic
New Project (C code) 'newProject.c'
~some code.
~dozens of necessary references to the #define statements in 'DClogicEngine.c'
To reiterate my problem: I am perfectly able to use #ifdefs and other compiler options to tie into the logic, and because this is a program I have control over, I was able to include a .h file, allowing me to include the necessary logic in the DC file. The problem is accessing the thousands of #define statements in 'DClogicEngine.c' from 'newProject.c'.
The issue I have is the notorious, face-palm, age-old problem of "I want to include a .c file in another." I've tried countless ways of placing #include xx.c in different places and then getting clever with include guards, but no luck (and no surprise). I found an old thread on the same topic, and the answer was at least humorous.
The bottom line is that extracting these statements to a .h file will cause a great deal of pain in retaining reverse compatibility with the DC build. Perhaps a burden that will have to be borne?
No doubt I have myself a stupid problem. My question, in three parts, is:
1.) Has anybody encountered a problem like this, and how did you solve it?
2.) Is there anything creative I can do to make this work? Does anybody know any compiler wizardry that may help?
3.) General advice: what is the least painful way to solve this?

Long story short, extracting the variables, structure definitions, and define statements to a .h file compromises the DC project, it will not compile.
This cannot be. You are saying that you have a single DC source file. There should be no systemic reason why you wouldn't be able to extract the part of that source file which contains structure definitions and other declarations into a header file which can be used by both the DC implementation and the C re-implementation. (I don't know DC; it's possible that there are incompatibilities with C, but the refactoring should not affect the DC side: the compiler still sees a single translation unit, as before. That it now consists of a source file with an included header is irrelevant to the compiler.)
The problems you were encountering were likely not of a general nature. They were likely specific to the project and circumstances (order of declarations, etc.). Try to isolate them and post specific problems here.

Thanks, everyone, for your responses.
In the end, the only way I could make this thing budge was by extracting the necessary variables, structs, and #define statements from DClogicEngine.c to a new DClogicEngine.h. We were hesitant to take this simple step because we had failed to make this work in the past and wasted a great deal of time and effort in the process.
We started by extracting a few things to a mutual 'dummy.h' file that was shared between the projects. After we saw that we could appease both the original DC compiler and the C compiler, we moved everything to the mutual .h file. Eventually, we hit a point where it became apparent we needed to include function prototypes in the header as well, just as we had done the first time (we thought this could be avoided, but in the end it had to be done).
What we did differently this time to appease the DC compiler was placing #ifndef DC_BUILD statements around the function prototypes in the .h file. Simple and effective!
If anyone has this problem in the future, these are the steps that we used to solve it:
1.) The DC compiler openly accepts #define statements, variables, and struct definitions if you remember to use the appropriate BeginHeader and EndHeader tags, detailed here. In our case, this was completely painless:
https://catherineh.github.io/programming/2016/03/31/libraries-in-dynamic-c
2.) Avoid adding any function prototypes to the .h file; the DC compiler absolutely hates them (unless you have a solid understanding of DC)! Make use of #ifdefs to hide ALL function prototypes in the .h file from the DC compiler, as sketched below.
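For anyone wanting a concrete picture, here is a minimal sketch of the kind of shared header this produces. All the names here (DC_BUILD, the struct, the prototypes) are made up for illustration, and the BeginHeader/EndHeader comment markers follow the Dynamic C library convention described in the link above:

/*** BeginHeader */
/* sharedLogic.h -- included by both DClogicEngine.c and newProject.c */

#define MAX_CHANNELS 16    /* plain #defines compile under both toolchains */

struct channel_state {
    int  id;
    long last_reading;
};

#ifndef DC_BUILD
/* DC_BUILD is a macro we define only when building under Dynamic C, */
/* so these prototypes are invisible to the DC compiler.             */
void logic_init(void);
int  logic_step(struct channel_state *ch);
#endif
/*** EndHeader */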

Related

Why use separate source files?

I'm learning C. Coming from a scripting-language background, I find it highly intriguing and rather confusing.
A brief story of how I got to this question:
At first I was confused about why I can't include a source (.c) file in another source file; then I found out that the function definitions end up duplicated. Then I found out about header files (.h) and was confused about why I have to declare a function in one file and then define it in another (if something changes, I have to go edit two files), so I started defining functions in header files. Then I found out that #ifndef include guards don't work across separate source files, so here's the question I can't yet find the answer to:
Why do I even have to use separate source files? Why can't I just have one source file and put all of my other code/function definitions in header files? That way I'd have things defined once and included once in the final build.
Now don't get me wrong, I'm not thinking I'll start a revolution, I'm just looking for answers as to why this is not how it works.
If you think beyond small learning programs, there are several benefits to splitting code into multiple source files.
Code Organization
Large programming projects can have millions of lines of code. You don't want to have files that big! Editors will probably have trouble handling it. Human beings will have trouble understanding it. Multiple developers would have conflicts all touching the same file. If you separate the code by purpose, it will be much easier to handle.
Build Times
Many code changes are small, while compilation time can be expensive. Compilers typically work on a file at a time, not parts of files. So if you make a tiny change and then have to rebuild the entire project, that can be very time consuming. If your code is separated into multiple source files, making a change in one of them means you only have to recompile that file.
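As a minimal sketch of that workflow (file names are hypothetical), a change to list.c only forces one recompile plus a relink:

gcc -c list.c -o list.o      # recompile only the file that changed
gcc -c main.c -o main.o      # unchanged; a build tool like make would skip this
gcc list.o main.o -o app     # relink the program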
Reusability
Frequently, code can be reused for more than one program. If you have all your code in one source file, you'll have to go and copy that code into another file to reuse it. Of course, now if it has a bug you have two places to fix it. Not good.
Let's say, for example, you have code that uses a linked list. If you put the linked list code into its own source file, you can then simply link that into another program. If there's a bug, you can fix it in one place, recompile, and then re-link the programs that use it.
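A minimal sketch of that linked-list split (names are made up):

/* list.h -- the reusable interface */
#ifndef LIST_H
#define LIST_H
struct node {
    int value;
    struct node *next;
};
struct node *list_push(struct node *head, int value);
#endif

/* list.c -- the implementation; compile once, link it into any program */
#include <stdlib.h>
#include "list.h"

struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;     /* allocation failed; leave the list unchanged */
    n->value = value;
    n->next = head;
    return n;
}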
You can use a single source file for some (small) projects.
For many projects though, it makes sense to divide the source in different source files according to their function.
Let's say you're making a game.
Have all the user interface code in its own source file.
Have all the computer move algorithms in their own source file.
...
Have the main() function, which ties it all together, in its own source file.
Then, to compile for PC you do gcc game.c algo.c ui-pc.c; to compile for Android you do gcc game.c algo.c ui-android.c ...; to try out a brand new algorithm you thought up and don't know if it's good, gcc game.c algo-test.c ui-pc.c.
Header files help keep everything in sync. And they're a good place for documentation.
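For instance, a single ui.h (hypothetical) declares the interface that both ui-pc.c and ui-android.c must implement, so the compiler checks every platform's implementation against the same prototypes:

/* ui.h -- one interface, several interchangeable implementations */
#ifndef UI_H
#define UI_H
void ui_draw_board(const char *board);
int  ui_read_move(void);
#endif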

Is it right to simply include all header files?

Remembering the names of system header files is a pain...
Is there a way to include all existing header files at once?
Why doesn't anyone do that?
Including unneeded header files is a very bad practice. The issue of slowing down compilation might or might not matter; the bigger issue is that it hides dependencies. The set of header files you include in a source file is the documentation of what functionality the module depends upon, and unlike external documentation or comments, it is automatically checked for completeness by the compiler (failing to include needed header files will result in an error). Ensuring the absence of unwanted dependencies not only improves portability; it also helps you track down unneeded and potentially dangerous interactions, for instance cases where a module which should be purely computational or purely data-structure management is accessing the filesystem.
These principles apply whether the headers are standard system headers or headers for modules within your own program or third-party libraries.
Your source code files are preprocessed before the compiler looks at them, and the #include statement is one of the directives that the preprocessor uses. When being preprocessed, #include statements are replaced with the entire contents of the file being included. The result of including all of the system files would be very large source files that the compiler then needs to work through, which will cost a lot of time during compilation.
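You can watch this happen with gcc's -E flag, which stops after preprocessing. Exact line counts vary by platform; this is only an illustration:

/* tiny.c */
#include <stdio.h>
int main(void) { return 0; }

gcc -E tiny.c | wc -l    # typically thousands of lines, almost all from <stdio.h>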
No one includes all the header files. There are too many, and a few of them are mutually exclusive with other files (like ncurses.h and curses.h).
It really is not that bad, even when writing a program from scratch. A few are quite easy to remember: stdio.h for any FILE stuff, ctype.h for any character classification, stdlib.h for any use of malloc(), etc.
If you don't remember one:
leave the #include out
compile
examine the first few error messages for indications of a missing header file, such as a type not being declared or a function being called with assumed parameter types
figure out which function call is the cause
look at the man page (or whatever documentation your compiler has) for that function
notice the #include shown by the documentation and add it
repeat until all errors are fixed (a sketch of one pass through this loop follows the list)
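Here is one pass through that loop with a toy program:

/* before: no #include, so compiling gives a diagnostic like         */
/*   "implicit declaration of function 'printf'"                     */
int main(void)
{
    printf("hello\n");
    return 0;
}

/* after: the diagnostic names printf, and its man page shows the    */
/* required header, so add this line at the top and recompile:       */
#include <stdio.h>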
It is quite a bit easier for adding to an existing code base. You could go hundreds or thousands of working hours and never have to add a #include.
No, it is a terrible idea: it will massively increase your compile times and possibly make your exe a lot larger by including massive amounts of unused code.
I know what you're talking about, but I need to double-check the function prototypes for the functions I'm using anyway (for the ones I don't use daily), so I'll just copy and paste the #includes straight out of the manpage for the associated functions. I'm already looking at the manpage (it's a simple K in vim(1)), so it doesn't feel like an extra burden.
You can create a "master" header where you put all your includes, and then include it in everything else! Beware of conflicting definitions and circular references... So... master1.h, master2.h, ...
Not advocating it. Just saying.
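A sketch of the idea, and of why the conflict warning matters (the header choices are arbitrary):

/* master1.h -- hypothetical "everything I usually need" header */
#ifndef MASTER1_H
#define MASTER1_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* putting both <curses.h> and <ncurses.h> here would break every */
/* file that includes master1.h, all at once                      */
#endif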

Any good reason to #include source (*.c *.cpp) files?

I've been working for some time with an open-source library ("Fast Artificial Neural Network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (I'm only including some headers I need, and I did not touch the code of the lib itself).
My question: is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design with no gain from this approach?
I'm aware of this related question, but it does not answer mine. I'm looking for reasons that might justify this.
A bonus question: is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I can see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time they give the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
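A hypothetical sketch of the pattern (this is not FANN's actual file layout, just the general technique):

/* impl.c -- shared implementation, parameterized by the REAL macro */
REAL nn_half(REAL x)
{
    return x / (REAL)2;
}

/* doublebuild.c -- one thin wrapper per binary */
#define REAL double
#include "impl.c"    /* textual inclusion: same logic, double-typed TU */

/* floatbuild.c -- built into a separate binary; it must never be     */
/* linked with doublebuild.o, or the two nn_half definitions collide  */
#define REAL float
#include "impl.c"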
They provide makefiles and MSVS projects which should already avoid linking doublefann.o (from doublefann.c) with fann.o (from fann.c), fixedfann.o (from fixedfann.c), and so on; either those project files are screwed up or something similar has gone wrong.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is that each implementation file is being compiled independently, and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files, and many tools assume it. The only possible solution is to fix the project settings so these are not linked together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al.; possibly just removing those files from the project is enough, so they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but I would not be surprised to see this explained a bit more in-depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
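To make the collision point concrete, here is what "static in most of its declarations" looks like for an includable implementation file (the name is hypothetical):

/* clamp.c -- safe to #include from several .c files in one binary, */
/* because everything in it has internal linkage                    */
static int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}
/* Without 'static', two includers compiled into the same binary     */
/* would each define a global 'clamp', and the linker would complain. */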
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well-structured .h files for others, and then the compiled library for linking. If it wants to include function definitions in header files, it should mark them either as static (old-fashioned) or as inline (possible since C99).
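For example, a function defined in a header the C99 way, using the common static inline idiom (names hypothetical):

/* mylib.h */
#ifndef MYLIB_H
#define MYLIB_H
static inline int mylib_add(int a, int b)
{
    return a + b;    /* each includer gets its own internal-linkage copy */
}
#endif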
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

C project structure - header-per-module vs. one big header

I've worked with a number of C projects during my programming career and the header file structures usually fall into one of these two patterns:
One header file containing all function prototypes
One .h file for each .c file, containing prototypes for the functions defined in that module only.
The advantages of option 2 are obvious to me - it makes it cheaper to share the module between multiple projects and makes dependencies between modules easier to see.
But what are the advantages of option 1? It must have some advantages otherwise it would not be so popular.
This question would apply to C++ as well as C, but I have never seen #1 in a C++ project.
Placement of #defines, structs etc. also varies but for this question I would like to focus on function prototypes.
I think the prime motivation for #1 is... laziness. People think it's either too hard to manage the dependencies that splitting things into separate files would make more obvious, or that it's somehow "overkill" to have separate files for everything.
It can also, of course, often be a case of "historical reasons", where the program or project grew from something small, and no-one took the time to refactor the header files.
Option 1 allows for having all the definitions in one place so that you have to include/search just one file instead of having to include/search many files. This advantage is more obvious if your system is shipped as a library to a third party - they don't care much about your library structure, they just want to be able to use it.
Another reason for using a different .h for every .c is compile time. If there is just one .h (or if there are several but you include them all in every .c file), every time you make a change in a .h file you will have to recompile every .c file. In a large project this can represent a significant amount of time lost, which can also break your workflow.
1 is just unnecessary. I can't see a good reason to do it, and plenty to avoid it.
Three rules for following #2 and having no problems:
Start EVERY header file with:
#ifndef HEADER_FILENAME_H
#define HEADER_FILENAME_H
and end the file with:
#endif
That will allow you to include the same header file multiple times in the same module (which may happen inadvertently) without causing any fuss. (Note that the two macro names must match exactly, and identifiers starting with an underscore and a capital letter are reserved, so avoid names like _HEADER_Namefile.)
You can't have definitions in your header files... and that's something everybody thinks he/she knows about function prototypes, but almost always forgets for global variables.
If you want a global variable, which by definition should be visible outside its defining C module, use the extern keyword:
extern unsigned long G_BEER_COUNTER;
which tells the compiler that the G_BEER_COUNTER symbol is actually an unsigned long (so it works like a declaration), and that some other module will have its proper definition/initialization. (This also allows the linker to keep the resolved/unresolved symbol table.) The actual definition (the same statement without extern) goes in the module's .c file.
Only out of proven absolute necessity should you include other headers within a header file. #include statements should only be visible in .c files (the modules). That allows you to better interpret the dependencies and to find/resolve issues.
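A sketch of rule 2 in practice, continuing the G_BEER_COUNTER example:

/* beer.h -- declaration only; safe to include from many modules */
extern unsigned long G_BEER_COUNTER;

/* beer.c -- exactly one definition in the whole program */
unsigned long G_BEER_COUNTER = 0;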
I would recommend a hybrid approach: making a separate header for each component of the program which could conceivably be used independently, then making a project header that includes all of them. That way, each source file only needs to include one header (no need to go updating all your source files if you refactor components), but you keep a logical organization to your declarations and make it easy to reuse your code.
There is also, I believe, a 3rd option: each .c has its own .h, but there is also one .h which includes all the other .h files. This brings the best of both worlds at the expense of keeping that one .h up to date, though that could be done automatically.
With this option, internally you use the individual .h files, but a 3rd party can just include the all-encompassing .h file.
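A sketch of that third option, with hypothetical component names:

/* project.h -- the all-encompassing header handed to 3rd parties */
#ifndef PROJECT_H
#define PROJECT_H
#include "ui.h"
#include "engine.h"
#include "storage.h"
#endif

Internally, engine.c would include only the individual headers it needs; outside users just include project.h.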
When you have a very large project with hundreds/thousands of small header files, dependency checking and compilation can significantly slow down as lots of small files must be opened and read. This issue can be often solved by using precompiled headers.
In C++ you would definitely want one header file per class and use pre-compiled headers as mentioned above.
One header file for an entire project is unworkable unless the project is extremely small, like a school assignment.
That depends on how much functionality is in one header/source file. If you need to include 10 files just to, say, sort something, it's bad.
For example, if I want to use STL vectors I just include <vector>, and I don't care what internals are necessary for vector to be used. GCC's <vector> includes 8 other headers: allocator, algobase, construct, uninitialized, vector, and bvector. It would be painful to include all those 8 just to use vector, would you agree?
BUT library internal headers should be as sparse as possible. Compilers are happier if they don't include unnecessary stuff.

untangling .h dependencies

What do you do when you have a set of .h files that has fallen victim to the classic 'gordian knot' situation, where to #include one .h means you end up including almost the entire lot? Prevention is clearly the best medicine, but what do you do when this has happened before the vendor (!) has shipped the library?
Here's an extension to the question, and this is probably the more pertinent question: should you even attempt to disentangle the dependencies in the first place?
I've done this on a C++ code base that was already split into many libraries (which was a good start).
I had to work out (or guess) which library was the most depended upon and itself depended upon nothing else in the code base. I then processed each library in turn.
I looked at each module (*.cpp file) in turn, made sure that its own header was #included first, and commented out the rest. Then I commented out all the #includes in that header file and re-compiled just that module to let the compiler tell me what was needed. I would un-comment the first header that seemed to be needed, review that one, and recurse as necessary. It was interesting to see how many headers ended up not being needed.
Where only the name is needed (because you have a pointer or reference), use class name; or struct name; instead. This is called a forward declaration, and it avoids #including the header file.
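In C that looks like this (names hypothetical):

/* parser.h -- only a pointer to the struct is used, so a forward */
/* declaration replaces #include "token.h"                        */
struct token;
int parse_next(struct token *t);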
The compiler is very helpful in telling you what the dependencies are when you comment out #includes (you need to recompile with ALL the compilers you have to maintain portability).
Sometimes I had to move modules between libraries so that no pairs or groups of libraries were mutually dependent.
When you have the opportunity, you should refactor the code to break up headers that are too large; however, that assumes you can achieve some sort of package cohesion. If you disentangle things only to discover that every user of the code has to include all the pieces anyway, the end result is the same.
Another option is to use #defines to configure sections on and off. Regardless, for an existing code base the solution is to move toward package cohesion.
Read: http://ivanov.files.wordpress.com/2007/02/sedpackages.pdf and research issues related to package cohesion.
I've untangled that knot a few times, and it generally helps a lot when maintaining a system to reduce the .h dependencies as much as possible. There are decent tools for generating dependency trees (I was using Klocwork at the time).
The downside I found was with conditional compilation. Someone might remove a header file because they think we don't need it, but it turns out that we only don't need it because VxWorks has some screwed-up headers; on Solaris (or any reasonable POSIX system) you do need it.
There is a balance to be struck between an enormous number of finely organized headers and a single header that includes everything. Consider the Standard C library; there are some biggish headers like <stdio.h>, which declares a lot of functions, but they are all related to I/O. There are other headers that are more of a miscellany - notably <stdlib.h>.
The Goddard Space Flight Center guidelines for C are worth hunting down.
The basic rule is that each header should declare the facilities provided by a suitable (usually small) set of source files. The facilities and header should be self-contained. That is, if someone needs the code in header "something.h", then that should be the only header that must be added to the compilation. If there are facilities needed by "something.h" that are not declared in the header, then it must include the relevant headers. That can mean that headers end up including <stddef.h> because one of the functions uses size_t, for example.
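A sketch of such a self-contained header (names hypothetical):

/* something.h -- self-contained: pulls in what its own interface needs */
#ifndef SOMETHING_H
#define SOMETHING_H
#include <stddef.h>                      /* for size_t in the prototype */

size_t something_count(const char *s);
#endif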
As #quamrana points out, you can use forward declarations for structures (not classes, since the question is tagged C and not C++) when appropriate - which primarily means when the interface takes pointers and does not need to know the size of the structures or any of the members.
