Supporting more than one codebase in ANSI-C

I am working on a project with an associated ANSI C code base (let me call this the 'main' codebase).
I am now confronted with a typical problem (stated below), which I believe I could solve much more easily if I had an object-oriented language at hand.
The problem is this:
I will have to start supporting a parallel codebase, and maybe more in the future. Each new (i.e. parallel) codebase will initially be identical to the old (i.e. 'main') codebase.
As we are talking about the C language, I have until now been thinking of adding '#ifdef' statements to the code and writing the branch-specific code inside those '#ifdef' blocks.
Hoping that I have made the problem clear (enough!), I would like to hear thoughts on clever patterns that would help me handle this problem elegantly in ANSI C.
Cheers

What is going to change between the different code bases?
If it is just different platforms, you carefully isolate the platform dependencies, keeping as much of the core code the same across all platforms as possible, and putting platform-specific stuff into separate files.
If you are going to be changing the program functionality radically, you need to work out how to keep a common core of unchanged code while allowing for the differences between the programs.
Notice that in both cases, it is first and foremost a question of understanding what is going to change, and setting up the code so that as little as possible changes. Often, the best way to handle the variations (there are inevitably going to be variations; otherwise, there's no point in having two or more versions) is to put the different variations in separate files, and compile and link the correct files. Sometimes, though, it is deemed better to put the variations (or one part of the variations) in a single file and use #ifdef style conditional compilation.
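A minimal sketch of that idea, squeezed into one file so that it compiles on its own. The names variant_name, variant_max_connections, connections_allowed and the BUILD_VARIANT_LITE macro are made up for illustration; in a real tree the two branches of the #if would more likely sit in separate files, with the build linking exactly one of them.

```c
#include <assert.h>

/* Common interface: the shared core only ever sees these declarations. */
const char *variant_name(void);
int variant_max_connections(void);

/* Variant-specific part. Hypothetical: each branch could instead live in its
   own file (say variant_main.c and variant_lite.c) chosen by the build. */
#if defined(BUILD_VARIANT_LITE)
const char *variant_name(void) { return "lite"; }
int variant_max_connections(void) { return 4; }
#else
const char *variant_name(void) { return "main"; }
int variant_max_connections(void) { return 64; }
#endif

/* Core code, identical for every variant: it clamps a request to whatever
   limit the selected variant reports. */
int connections_allowed(int requested)
{
    int limit = variant_max_connections();
    assert(requested >= 0);
    return requested < limit ? requested : limit;
}
```

Defining BUILD_VARIANT_LITE at compile time (for example through the compiler's -D option) selects the other variant without touching connections_allowed.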
The other key point is to keep everything under the same version control system - for as long as possible, or a decade or two longer than that.
The biggest disaster I've seen occurred when the different versions stopped using a common code base. Now that we need to reintegrate the two code bases, the decade of separate development is a major obstacle. The previous fifteen years of integrated development had its ups and downs - but nothing compared to the problem we now face. Ugh!

You can abstract the differences away (or at least try). Place all your codebase-specific code in files separate from your program-logic. You can then use conditional compilation (ifdef-include) and only ever have to replace the codebase-specific files for each new codebase while your whole application-logic can stay unchanged.

As an alternative, or a complement, to #ifdef-based solutions, you can maintain different branches in an SCM.

Related

Top-down approach in C: Interface implemented via multiple C files

I am responsible for designing the software architecture of an embedded system in C90 (which is dictated by the target hardware compiler). It must build easily against a couple of targets (traditional testing, Software-In-The-Loop, final hardware). Therefore I took a top-down approach, that is, designing to an interface:
Having defined the data flows of the system (inputs, outputs, ...), I created generic interfaces in the form of .h files that need to be implemented by the targets.
Therefore, and for the sake of the question, let them be two:
imeasures.h --> Measures needed by the algorithm
icomm.h --> Data flow to and from the algorithm to other devices
For the production target, suppose that all the measures but one (e.g. Engine Speed) are taken using ADCmeasures module, and the last mentioned one (Engine Speed) is provided by RS232comm module.
Question 1
Is it OK if imeasures.h is implemented using both ADCmeasures and RS232comm modules in the following form?
imeasures.h <--is implemented BY-- imeasuresImpl.c
imeasuresImpl.c --> calls functions from ADCmeasures.h and RS232comm.h
Therefore, switching targets would imply changing imeasuresImpl and the rest of callees.
Question 2
Because of the overhead the previous method may incur (which could indeed be mitigated using inline functions), I also thought about a (less elegant?) form:
imeasures.h <-- is partially implemented by ADCmeasures.c
imeasures.h <-- is partially implemented by RS232comm.c
Which pitfalls do you see? I can see that, for example, if imeasures.h consists of a single getter method which returns a struct, I would have to partially fill the struct in both of the partial implementations. Or, in turn, provide different getter methods, and then I would be deciding beforehand a layout of the implementation which would break the top-down principle.
Thank you for your attention.
First, some assumptions on the situation, the requirements
I assume that through imeasures.h you would ideally like an interface with a single get function that returns a structure populated with the freshest measurements. While that is possible, you may also accept a few other functions, such as a run that performs the processing needed for the measurements and an init that initializes things (by "possible" I mean that there are ways, which I have sometimes explored, of getting by without these two latter functions).
As you say, I assume you would like to separate out as thin a hardware interface as possible, so that you can more easily apply simulation for testing and have less to reimplement later when porting to different hardware.
As the interface suggests, you would like to hide the split (that one of your measurements comes from RS232).
Solving with something like Q1, the architecture
Your take with Q1 seems to be an okay approach for laying down the architecture to meet these requirements. For Q2 I think "forget that", I can't conceive any reasonable solution which would appear like that.
My approach, just like your Q1, would require at least three implementation files.
On the top would be an imeasures.c (I would stick to this name, since that's the usual way of doing these, and there is no very good reason to do anything different here). This file would implement the whole imeasures.h interface, in it containing the logic for assembling the measurements, and dispatching the hardware-specific components. It would not contain anything hardware-specific by itself.
An RS232comm.c (and .h) would realize the RS232 hardware interface. I would make this as generic as is reasonable while still meeting the requirements (for example, if it only needs to receive, I would implement only a receiver adequate for the project). The goal is to have something that meets the project's requirements but that, if needed, can be reused for other projects on the same (or similar) hardware.
An ADCcomm.c (and .h). Note that I did not name it ADCmeasures.c, for a good reason: I don't want anything specific to the actual measurements in here. Just as above, it should be what the requirements call for, but generic enough that it might be reused.
Following this, you are likely to end up with an imeasures.c that does not need to be altered in any way for the simulation (it has no hardware-specific code), so it can also be tested in that environment. You also get useful little hardware-specific components that you can reuse in new projects (in my case this happened quite frequently, as electrical engineers would often iterate on the same piece of hardware for later projects).
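A rough sketch of how the three pieces could fit together, shown in one file so it compiles on its own. The Measures fields, imeasures_get, adc_read_channel and rs232_read_engine_speed are all hypothetical names, and the two hardware functions are stubs returning fixed values, which is roughly what a simulation target might provide.

```c
#include <assert.h>
#include <stddef.h>

/* imeasures.h: the interface the algorithm sees (field names are hypothetical). */
typedef struct {
    int battery_mv;   /* sampled through the ADC */
    int coolant_raw;  /* sampled through the ADC */
    int engine_rpm;   /* received over RS232 */
} Measures;

void imeasures_get(Measures *out);

/* ADCcomm.c / RS232comm.c: thin hardware layers. Here they are stubs with
   fixed return values; each target would supply its own versions. */
static int adc_read_channel(int channel) { return 1000 + channel; }
static int rs232_read_engine_speed(void) { return 3000; }

/* imeasures.c: assembles the struct and hides which source each value
   comes from. It contains no hardware-specific code of its own. */
void imeasures_get(Measures *out)
{
    assert(out != NULL);
    out->battery_mv  = adc_read_channel(0);
    out->coolant_raw = adc_read_channel(1);
    out->engine_rpm  = rs232_read_engine_speed();
}
```

Switching targets then means swapping the two thin layers, while imeasures_get and everything above it stay as they are.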
Usually you shouldn't have to be concerned about overhead. Design first, and optimize only where it is actually necessary. If you design well, you are even likely to end up with a better-performing end product, simply because you don't have to battle with messy performance code (or "I thought it would perform better" code) that takes time away from recognizing the real bottlenecks and from either discovering better algorithms or optimizing the parts that actually need it.
Well, hope it helps in getting across this!

Manually translating code from one language to another

I often write code in MATLAB/Python to test whether my algorithm is feasible (and actually works). I then need to convert the entire code into C and sometimes into FORTRAN90.
What would be a good way to manually convert a medium sized code from one language to another?
I have tried :
Converting the entire code from one into another and then testing it.
(Sometimes there are errors and bugs which just won't go away, and finding the source of the error becomes a problem)
Go line by line and check for consistency of outputs every few lines.
(Too time consuming)
Use converters like f2c.
(In my experience, they are extremely horrible. I link to a lot of libraries which have different function calls for C and Fortran)
Also:
I am fairly conversant with the programming languages I deal with so I don't need manuals or reference guides for my work (i.e. I know the syntax).
I am not asking this question specifically about MATLAB and C but rather as a translation paradigm.
Regarding the size, the codes are less than 100 lines long.
I don't want to call code in one language from another. Please don't suggest that.
Different languages call for different paradigms. You definitely don't write and design code the same way in eg. Matlab, Python, C# or C++. Even object hierarchies will change a lot depending on the language.
That said, if your code consists of a few interconnected procedures, then you may get away with a direct line-by-line translation (every language allows you to write two or three interconnected functions while remaining idiomatic). But this is the case only for the simplest programs.
Prototyping in a high-level language and then implementing the same idea in a robust and clean way in a "production" language is a very good practice, but it involves two very different things:
1) Prototype in whatever language you want. Test, experiment, and convince yourself that the idea works. Pay attention to the big picture; don't focus on performance but on the high-level ideas. Also pay attention to the difficulties that you encounter when implementing, as you'll face them again in step 2.
2) Implement the idea from scratch in the production environment in language X. It will be quicker than if you had skipped the prototyping stage, since most of the difficulties have been met in stage 1. Use idiomatic X, and focus on correctness. Pay attention to corner cases, general robustness, and, once it works correctly, performance. You'll notice that roughly half of your code is made of new things which did not appear in 1. (e.g. error checking, corner case handling, input/output, unit testing, etc.).
You can see that line by line translation is obviously not a good idea, since you don't translate into the same program.
Also, when not prototyping, I find myself throwing away the first version and making another one that I like better, i.e. I find myself prototyping! Implementing the same thing twice is not a waste of time, it is normal development flow.
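One way to keep the second implementation honest is to note down a few input/output pairs from the prototype and check the new code against them within a tolerance. A small sketch in C, where mean stands in for whatever routine is being reimplemented and the expected value is assumed to have been recorded from the prototype run (all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference, written out by hand to avoid depending on math.h. */
static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Tolerance check relative to the magnitude of the expected value,
   falling back to an absolute check for expected values below 1. */
int nearly_equal(double actual, double expected, double tol)
{
    double scale = abs_diff(expected, 0.0);
    if (scale < 1.0) {
        scale = 1.0;
    }
    return abs_diff(actual, expected) <= tol * scale;
}

/* The routine being reimplemented in C (a stand-in for the real algorithm). */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    assert(x != NULL && n > 0);
    for (i = 0; i < n; ++i) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Compares the C result against a value assumed to come from the prototype. */
int port_matches_prototype(void)
{
    static const double input[] = { 1.0, 2.0, 3.0, 4.0 };
    const double expected_from_prototype = 2.5;
    return nearly_equal(mean(input, 4), expected_from_prototype, 1e-12);
}
```

A handful of such checks per function tends to localize a discrepancy much faster than comparing the final output of the whole program.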
You may want to consider using a higher level domain specific language with multiple backends (e.g., Matlab, C, Fortran), producing clean and idiomatic code for each target language, probably with some optimisations. If your problem domain is narrow and every piece of code is more or less typical, it should be fairly trivial to design and implement such a DSL.
Break the source down into pseudocode with input/process/output and then write your new code base to fit that spec.

Find functional changes between two revisions of a file (compile diff?)

I'm looking for a tool that checks whether two (C) source code files generate the same binary so that I can find actual functional changes between two files and ignore mere coding style changes.
It would be great if this worked even within a file for different changesets, so a file may have changed in coding style on some places, but also had one functional patch added.
It's very, very hard to write a program that figures out the "functional" result of another program, and that sounds like what this would require. I would guess that computer programs themselves are just about the most compact and machine-readable way we have of describing functionality at all, so it's hard to write a program that analyses a program and generates a "better" description.
Somehow abstracting out and "understanding" that coding style differences don't affect functionality also sounds very, very hard. I find it hard even when manually reading other people's code, because the differences in style can be pretty large, even though the end result might be the same as in "my style".
I would be surprised if a solution didn't also require a solution to the halting problem, which is proven impossible in the general case.
The only way is to compile both with the same compiler options and do a binary diff.
It's not only style changes you'd have to look out for; someone may have extracted code to a function that gets inlined in an optimised build. This may, or may not, depending on compiler options and version, give the same binary.
Mapping binary back to source to "high level functionality" - unlikely.
Comparing two source files with respect to "high level functionality" (ignoring coding style) - possible:
http://cscope.sourceforge.net/
Alternative suggestion:
Write a tool that "normalizes" your source files - by applying the same formatting to both sets of code.
This can easily be automated.
For example:
1) checkout both from version control,
2) apply "standard format",
3) compare
If all you're interested in is whether they both "generate the same binary", then the easiest solution is simply to generate both binaries, and compare.
Note, however, that there are things that would result in binaries that are bitwise different, even though they're functionally identical:
Change in external function names
Optimisations
Reordering non-dependent code snippets
etc.
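The "generate both binaries and compare" step is easy to script. A minimal sketch in C of the comparison itself, assuming the two object files have already been produced with identical compiler options; files_identical is a name invented here, and it only answers "same bytes or not", nothing about functional equivalence.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a;
    FILE *b;
    int ca = 0;
    int cb = 0;
    int result = -1;

    assert(path_a != NULL && path_b != NULL);
    a = fopen(path_a, "rb");
    b = fopen(path_b, "rb");
    if (a != NULL && b != NULL) {
        /* Read both streams in lockstep until the bytes differ or both end. */
        do {
            ca = fgetc(a);
            cb = fgetc(b);
        } while (ca == cb && ca != EOF);
        result = (ca == cb);
    }
    if (a != NULL) {
        fclose(a);
    }
    if (b != NULL) {
        fclose(b);
    }
    return result;
}
```

A result of 0 only says that the bytes differ; as the list above notes, that can happen even when the two versions behave identically.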
There is a branch of computer science that deals with concurrency and parallel processes.
One of the applications is deciding whether two systems are behaviorally equivalent (in some bisimulation relation (weak or strong)).
It is, however, computationally very difficult to decide whether two large systems are behaviorally equivalent. The technique is mainly used for verifying small critical applications where we can't afford failure.

Have you written very long functions? If so, why?

I am writing an academic project about extremely long functions in the Linux kernel.
For that purpose, I am looking for examples of real-life functions that are extremely long (a few hundred lines of code) and that you don't consider bad programming (i.e., they wouldn't benefit from decomposition or from the use of a dispatch table).
Have you ever written or seen such code? Can you post or link to it, and explain why it is so long?
I have been getting amazing help from the community here - any idea that will be taken into the project will be properly credited.
Thanks,
Udi
The longest functions that I have ever written all have one thing in common, a very large switch statement. There are times, when you have to switch on a long list of items and it would only make things harder to understand if you tried to refactor some of the options into a separate function. Having large switch statements makes the Cyclomatic complexity go through the roof, but it is often better than the alternative implementations.
It was the last one before I got fired.
A previous job: An extremely long case statement, IIRC 1000+ lines. This was long before objects. Each option was only a few lines long. Breaking it up would have made it less clear. There were actually a pair of such routines doing different things to the same underlying set of data types.
Sorry, I don't have the code anymore and it isn't mine to post, anyway.
The longest function that I didn't see as being horrible would be the key method of a custom CPU VM. As with #epotter, this involved a big switch statement. In fact I'd say a lot of the methods that I find resist being cleanly broken down or improved in readability involve switch statements.
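For a feel of why such functions grow, here is a toy version of a VM's central loop in C. The four opcodes and the name vm_run are invented for this sketch; a real instruction set would have dozens or hundreds of cases, each only a few lines long, all sharing the same local state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes of a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs the program and returns the value on top of the stack (0 if empty).
   The asserts are sketch-level guards, not production error handling. */
int vm_run(const int *code, size_t len)
{
    int stack[16];
    int sp = 0;      /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    assert(code != NULL);
    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:                     /* operand follows the opcode */
            assert(pc < len && sp < 16);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            --sp;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            --sp;
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Every case touches stack, sp and pc directly, so moving each one into its own function would mean passing that state around for little gain in clarity.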
Unfortunately, you won't often find this type of subroutine checked in or posted somewhere if it's auto-generated during a build step using some sort of code generator.
So look for projects that have C generated from another language.
Besides performance, I think the size of the call stack in kernel space is 8K (please verify the size). Also, as far as I know, code in the kernel is fairly specific. If some code is unlikely to be reused in the future, why bother making it a function, considering the function call overhead?
I could imagine that when speed is important (such as when holding some sort of lock in the kernel) you would not want to break up a function because of the overhead of making a function call. When compiled, parameters have to be pushed onto the stack and data has to be popped off before returning. Therefore you may have a large function for efficiency reasons.

What projects cannot be done in C?

I would like to know what projects cannot be done in C.
I know programming can be quicker and more intuitive in other languages. But I would like to know what features are missing in C that would prevent a project from being completed well.
For example, very few web-frameworks exist in C.
C, like many other languages, is Turing Complete.
So simple answer is: none.
However, C++ Template Meta Programming meets the same criterion, so "it is possible" is not a good criterion to choose tools.
The very first C compiler?
A working solution to the halting problem
Alright, here's one: you cannot write an x86 boot sector in C. This is one of those things that has to be written in ASM.
There are none.
Different languages give you different ways to say things. For some classes of problems a given language may be more expressive and/or concise. Are there projects for which you should pick something other than C? Yes, of course. But to say you can't do it well in C is misleading. It would be better to ask which language is the best choice for the problem at hand, and whether the gains are worth using something unfamiliar.
Anything can be done in virtually any language.
That said there is a level of practicality. As your system's complexity increases, you need better tools to manage it.
The problems are still solvable, but you start to need more people and much more effort in design. I'm not saying other languages don't benefit from design, I'm saying that the same level and attention to detail may not be required.
Since we programmers are Human (I am at least) we have troubles in one area or another. My biggest is memory. If I can visualize my code as objects, manipulating large modules in my head becomes easier, and my brain can handle larger projects.
Of course, it's even possible to write good OO code in C, the patterns were developed in C by manually managing dispatch tables (tables of pointers with some pointers updated to point to different methods), and this is true of all programming constructs from higher languages--they can be done in any language, but...
If you were to implement objects in C, every single class you wrote would have a large amount of boilerplate overhead. If you made some form of exception handling, you would expose more boilerplate.
Higher level languages abstract this boilerplate out of your code and into the system, simplifying what you have to think about and debug (a dispatch table in C could take a lot of debugging, but in C++ it isn't going to fail because the code generated by a working compiler is going to be bug-free and hidden, you never see it).
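To make the boilerplate concrete, here is a small sketch of a hand-rolled dispatch table in C. Shape, Rect, Square and the function names are invented for illustration; the point is how much wiring is needed for two "classes" with two "methods".

```c
#include <assert.h>
#include <stddef.h>

typedef struct Shape Shape;

/* The dispatch table: one function pointer per "virtual method". */
typedef struct {
    int (*area)(const Shape *self);
    int (*perimeter)(const Shape *self);
} ShapeOps;

/* "Base class": every object starts with a pointer to its table. */
struct Shape { const ShapeOps *ops; };

/* "Derived classes": base must be the first member so the casts below are valid. */
typedef struct { Shape base; int w, h; } Rect;
typedef struct { Shape base; int side; } Square;

static int rect_area(const Shape *s)
{ const Rect *r = (const Rect *)s; return r->w * r->h; }
static int rect_perimeter(const Shape *s)
{ const Rect *r = (const Rect *)s; return 2 * (r->w + r->h); }
static int square_area(const Shape *s)
{ const Square *q = (const Square *)s; return q->side * q->side; }
static int square_perimeter(const Shape *s)
{ const Square *q = (const Square *)s; return 4 * q->side; }

static const ShapeOps rect_ops = { rect_area, rect_perimeter };
static const ShapeOps square_ops = { square_area, square_perimeter };

/* "Constructors": forgetting to set ops here is a classic source of bugs. */
void rect_init(Rect *r, int w, int h)
{ r->base.ops = &rect_ops; r->w = w; r->h = h; }
void square_init(Square *q, int side)
{ q->base.ops = &square_ops; q->side = side; }

/* "Virtual calls": look up the function in the table and pass the object. */
int shape_area(const Shape *s)
{ assert(s != NULL && s->ops != NULL); return s->ops->area(s); }
int shape_perimeter(const Shape *s)
{ assert(s != NULL && s->ops != NULL); return s->ops->perimeter(s); }
```

Everything except the four one-line method bodies is plumbing that a language with built-in classes would generate and hide.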
I guess I'd say that's the biggest (only?) difference between low level and higher level languages, how much boilerplate do you hide. In the latest batch of dynamic languages, they are really into hiding loop constructs within the language, so more things look like:
directory.forEachFile(print file.name); // Not any real language
In C, even if you isolated part of the looping inside a function, setting up the function pointers and stuff would still take lines of un-obvious code that is not solving part of your primary problem.
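A sketch of what that looks like in C, using an in-memory list of names rather than a real directory so that it stays portable. NameVisitor, for_each_name, add_length and total_name_length are hypothetical names; the void pointer is the usual way to smuggle state into the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type: receives one name plus caller-supplied context. */
typedef void (*NameVisitor)(const char *name, void *ctx);

/* The loop, isolated once: calls visit for every name in the array. */
void for_each_name(const char *const *names, size_t count,
                   NameVisitor visit, void *ctx)
{
    size_t i;
    assert(names != NULL && visit != NULL);
    for (i = 0; i < count; ++i) {
        visit(names[i], ctx);
    }
}

/* The "loop body" has to become a separate named function... */
static void add_length(const char *name, void *ctx)
{
    size_t *total = (size_t *)ctx;
    *total += strlen(name);
}

/* ...and the caller wires the body and its state together by hand. */
size_t total_name_length(const char *const *names, size_t count)
{
    size_t total = 0;
    for_each_name(names, count, add_length, &total);
    return total;
}
```

The one-liner from the pseudocode turns into a typedef, a helper function and an untyped context pointer, none of which is about the problem being solved.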
There is not a single algorithm that cannot be written with C.
Depends on how much you want to invest (time/money/energy) to make it happen. Otherwise, I'd say there aren't any. It is just easier sometimes to use something else.
OS kernels have been written in C and everything runs on top of them, so you can write everything in C.
Boot sector that needs ASM :-) , I don't think you meant that.
