Reorder functions in a .c file based on the .h file

Is there any tool to automatically reorder the .c file based on .h?
For example, foo.h:
void function1();
void function2();
void function3();
And foo.c:
void function2(){}
void function1(){}
void function3(){}
Can I reorder it as:
void function1(){}
void function2(){}
void function3(){}
By the way, I'm using Vim in Ubuntu.

I don't believe there is such a tool. In C, order of declaration and definition generally does not matter. There is of course the exception of a dependency loop (a() calls b() which calls a()), but when functions are declared in a header file, even this is not a problem, since all the declarations are effectively the "forward" declarations needed to handle dependency loops.
Thus the order of definitions in a translation unit is essentially a matter of taste and style. As such, it is not a feature that editor designers are apt to address, since the effort to create features general and powerful enough to be worthwhile to a preponderance of users may be deemed to be prohibitively large. Think how long it took for (non-programmable) editors to commonly have flexible and powerful automatic indentation and reformatting features.
There is also a risk of breaking the logic, grammar, or readability of code when it is reorganized automatically. For example, if comments lie between function definitions, how will the tool know whether a comment goes with a particular function, with a group of functions, or with the function above? And as @Yuri mentions above, what about functions inside #if/#else/#endif blocks? And what about subtle cases like macros that expand to functions, or #include directives sandwiched between function definitions? I suppose a reordering feature could restrict its domain to a simple case, but if the case gets too simple, its appeal is correspondingly limited, and it never gets released publicly or widely.
All that being said, I think with a rich toolset at your disposal, such a feature would not be too hard to implement, though if I were doing it, I expect that at the first unanticipated difficulty, I would find myself wondering if it wouldn't be easier to just edit the source files by hand. Multiline cut and paste is pretty easy, after all.

Related

Benefits and drawbacks of making all functions in main.c static?

I have heard that, when you have just 1 (main.c) file (or use a "unity build"), there are benefits to be had if you make all your functions static.
I am kind of confused why this (allegedly) isn't optimized by default, since it's not probable that you will include main.c into another file where you will use one of its functions.
I would like to know the benefits and dangers of doing this before implementing it.
Example:
main.c:
static int my_func(void) { /*stuff*/ }

int main(void) {
    my_func();
    return 0;
}
You have various chunks of wisdom in the comments, assembled here into a Community Wiki answer.
Jonathan Leffler noted:
The primary benefit of static functions is that the compiler can (and will) aggressively inline them when it knows there is no other code that can call the function. I've had error messages from four levels of inlined function calls (three qualifying “inlined from” lines) on occasion. It's staggering what a compiler will do!
and:
FWIW: my rule of thumb is that every function should be static until it is known that it will be called from code in another file. When it is known that it will be used elsewhere, it should be declared in a header file that is included both where the function is defined and where it is used. (Similar rules apply to file scope variables — aka 'global variables'; they should be static until there's a proven need for them elsewhere, and then they should be declared in a header too.)
The main() function is always called from the startup code, so it is never static. Any function defined in the same file as an unconditionally compiled main() function cannot be reused by other programs. (Library code might contain a conditionally compiled test program for the library function(s) defined in the source file — most of my library code has #ifdef TEST / …test program… / #endif at the end.)
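As a concrete illustration of that #ifdef TEST layout, here is a minimal sketch; the file name, function, and values are hypothetical, not taken from Jonathan's code:

/* clamp.c -- a hypothetical library source file */
int clamp(int value, int lo, int hi);   /* normally declared in clamp.h */

int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#ifdef TEST
#include <stdio.h>

/* Conditionally compiled test program for the library function above. */
int main(void)
{
    printf("%d\n", clamp(42, 0, 10));   /* prints 10 */
    return 0;
}
#endif

Compiling with -DTEST builds a standalone test program; compiling without it yields an ordinary library object with no main().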
Eric Postpischil generalized on that rule of thumb:
General rule: Anytime you can write code that says the use of something is limited, do it. Value will not be modified? Make it const. Name only needs to be used in a certain section? Declare it in the innermost enclosing scope. Name does not need to be linked externally? Make it static. Every limitation both shrinks the window for a bug to be created and may remove complications that interfere with optimization.
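As a minimal sketch of all three limitations at once (the names and values here are invented for illustration):

#include <stdio.h>

/* Internal linkage: not needed outside this file, so it is static. */
static double average(const double *values, int count)
{
    double sum = 0.0;                 /* declared in the narrowest scope that works */
    for (int i = 0; i < count; i++)   /* i is visible only inside the loop */
        sum += values[i];
    return sum / count;
}

int main(void)
{
    const double readings[] = { 1.0, 2.0, 4.5 };  /* never modified, so const */
    printf("%f\n", average(readings, 3));
    return 0;
}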

How can I maintain correlation between structure definitions and their construction / destruction code?

When developing and maintaining code, I add a new member to a structure and sometimes forget to add the code to initialize or free it, which may later result in a memory leak, an ineffective assertion, or run-time memory corruption.
I try to maintain symmetry in the code, where things of the same type are structured and named in a similar manner. That works for matching Construct() and Deconstruct() code, but because structures are defined in separate files, I can't seem to align their definitions with the functions.
Question: is there a way, through coding, to make myself more aware that I (or someone else) have changed a structure and that the related functions need updating?
Efforts:
The simple:
- Have improved code organization to help minimize the problem
- Have worked to get into the habit of updating everything at once
- Have used comments to document struct members, but this just results in duplication
- Do use the IDE's auto-suggest to compare the suggested entries against the implemented code, but this doesn't detect changes.
I had thought that maybe structure definitions could appear multiple times as long as they were identical, but that doesn't compile. I believe duplicate structure names can appear as long as they do not share visibility.
The most effective thing I've come up with is to use a compile time assertion:
static_assert(sizeof(struct Foobar) == 128, "Foobar structure size changed, reevaluate construct and destroy functions");
It's pretty good, definitely good enough. I don't mind updating the constant when modifying the struct. Unfortunately, compile-time assertions are very platform- (compiler-) and C-standard-dependent, and I'm trying to maintain backwards compatibility and cross-platform compatibility in my code.
This is a good link regarding C Compile Time Assertions:
http://www.pixelbeat.org/programming/gcc/static_assert.html
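For reference, the negative-array-size trick described at that link can be wrapped in a C89-compatible macro along these lines (the macro and type names here are illustrative, not from the linked post):

/* A false condition yields an array of negative size: a compile error. */
#define STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

struct Foobar { char data[128]; };  /* hypothetical struct for the example */

STATIC_ASSERT(sizeof(struct Foobar) == 128, Foobar_size_checked);

Because it relies only on C89 rules, it works on old compilers, at the cost of a less readable error message than a real static_assert.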
Edit:
I just had a thought; although a structure definition can't easily be relocated to a source file (unless it does not need to be shared with other source files), I believe a function can actually be relocated to a header file by inlining it.
That seems like a hacked way to make the language serve my unintended purpose, which is not what I want. I want to be professional. If the professional practice is not to approach this code-maintainability issue this way, then that is the answer.
I've been programming in C for almost 40 years, and I don't know of a good solution to this problem.
In some circles it's popular to use a set of carefully-contrived macro definitions so that you can write the structure once, not as a direct C struct declaration but as a sequence of these macros and then, by defining the macro differently and re-expanding, turn your "definition" into either a declaration or a definition or an initialization. Personally, I feel that these techniques are too obfuscatory and are more trouble than they're worth, but they can be used to decent effect.
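For what it's worth, here is a bare-bones sketch of that macro technique (often called "X macros"); the struct, fields, and function names are invented for illustration:

#include <string.h>

/* The single source of truth: each field is listed exactly once. */
#define FOOBAR_FIELDS(X) \
    X(int,    count)     \
    X(double, ratio)

/* Expand once to declare the struct... */
#define AS_FIELD(type, member) type member;
struct Foobar { FOOBAR_FIELDS(AS_FIELD) };

/* ...and expand again to generate the body of the constructor. */
#define AS_ZERO(type, member) memset(&f->member, 0, sizeof f->member);
void foobar_init(struct Foobar *f)
{
    FOOBAR_FIELDS(AS_ZERO)
}

Adding a field to FOOBAR_FIELDS updates both the struct and foobar_init() automatically; destruction code can be generated the same way with another per-field macro.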
Otherwise, the only solution -- though it's not what you're looking for -- is "Be careful."
In an ideal project (although I realize full well there's no such thing) you can define your data structures first, and then spend the rest of your time writing and debugging the code that uses them. If you never have occasion to add fields to structs, then obviously you won't have this problem. (I'm sorry if this sounds like a facetious or unhelpful comment, but I think it's part of the reason that I, just as @CoffeeTableEspresso mentioned in a comment, tend not to have too many problems like this in practice.)
It's perhaps worth noting that C++ has more or less the same problem. My biggest wishlist feature in C++ was always that it would be possible to initialize class members in the class declaration. (Actually, I think I've heard that a recent revision to the C++ standard does allow this -- in which case another not-necessarily-helpful answer to your question is "Use C++ instead".)
C doesn't let you have benign struct redefinitions but it does let you have benign macro redefinitions.
So as long as you:
- save the struct body in a macro (according to a fixed naming convention), and
- redefine the macro at the point of your constructor,
you will get a warning if the struct body changes and you haven't updated the corresponding constructor.
Example:
header.h:
#define MC_foo_bod \
    int x;         \
    double y;      \
    void *p

struct foo { MC_foo_bod; };
foo__init.c:
#include "header.h"
#ifdef MC_foo_bod
/* Try for a silent redefinition: if it isn't silent, the macro
   changed and so should this code. */
#define MC_foo_bod \
    int x;         \
    double y;      \
    void *p
#else
/* Oops -- not a redefinition. Perhaps a typo in the macro name,
   or a failure to include the header? */
#error "MC_foo_bod is not defined"
#endif
void foo__init(struct foo *X)
{
    /* ... */
}

Does using a prototype + definition instead of just a definition speed up the program?

I'm very new to C programming, but I have limited experience with Python, Java, and Perl. I was just wondering what the advantage is of having the prototype of a function above main() and the definition of that function below main(), rather than just having the definition of said function above main(). From what I have read, the definition can act as a prototype as well.
Thanks in advance,
The use of a prototype above main() (within a single module) is mostly a matter of personal preference. Some people like to see main() at the bottom of the file; other people like to see it at the top. I had a professor in university who complained that I wrote my programs "upside-down" because I put main() at the bottom (and avoided having to write and maintain prototypes for everything).
There is one situation where a prototype may be required:
void b(); // prototype required here

void a()
{
    b();
}

void b()
{
    a();
}
In this mutually recursive situation, you need at least one of the prototypes to appear prior to the definition of the other function.
When someone else wants to use your library, you can give them the header (containing the prototypes), without giving them your code.
Example? Windows SDK.
If two functions call each other, you will need to have a prototype for at least one of them.
There is no performance impact* on the compiled code.
*Note: Well, there could be, I guess. If the compiler decides to put the method somewhere else in the binary, then you could theoretically see locality issues affecting the performance of your application. It's not something to normally worry about, though.
There's no advantage in splitting a function into a standalone prototype and a definition. In fact, there's a clear and explicit disadvantage to that approach, since it requires extra maintenance efforts to keep the prototype and the definition in perfect sync. For this reason, a reasonable approach would be to provide a separate prototype only when you have to. For example, with external functions declared in header files you have no other choice but to keep the prototype and definition separate.
As for functions internal to a single translation unit (the ones that we'd normally declare static) there's absolutely no reason to create a standalone prototype, since the definition of the function itself includes a prototype. Just define your functions in the natural order: low level functions first, high level functions last, and that's it. If your program consists of a single translation unit, the main function will end up at the very bottom.
The only case you'd have to declare a standalone prototype in this approach would be if some form of recursion was involved.
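A minimal sketch of that natural ordering in a single translation unit (the function names are invented for illustration):

#include <stdio.h>

/* Low-level helpers first: their definitions serve as prototypes. */
static int square(int x)
{
    return x * x;
}

static int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* The highest-level function last; no standalone prototypes needed. */
int main(void)
{
    printf("%d\n", sum_of_squares(3, 4));  /* prints 25 */
    return 0;
}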
If you write your code in the correct order, i.e. main() last, then each function appears in only one place in your code. This prevents having to change the code in two or more places if changes ever need to be made, which prevents errors and reduces the work to maintain it. The principle is called "single source of truth".

C99 mixed declarations and code in open source projects?

Why are C99 mixed declarations and code still not used in open source C projects like the Linux kernel or GNOME?
I really like mixed declarations and code, since it makes the code more readable and prevents hard-to-see bugs by restricting the scope of variables to the narrowest possible. This is recommended by Google for C++.
For example, Linux requires at least GCC 3.2, and GCC 3.1 already has support for C99 mixed declarations and code.
You don't need mixed declarations and code to limit scope. You can do:
{
    int c;
    c = 1;
    {
        int d = c + 1;
    }
}
in C89. As for why these projects haven't used mixed declarations (assuming this is true), it's most likely a case of "If it ain't broke don't fix it."
This is an old question but I'm going to suggest that inertia is the reason that most of these projects still use ANSI C declarations rules.
However there are a number of other possibilities, ranging from valid to ridiculous:
Portability. Many open source projects work under the assumption that pedantic ANSI C is the most portable way to write software.
Age. Many of these projects predate the C99 spec and the authors may prefer a consistent coding style.
Ignorance. The programmers submitting code predate C99 and are unaware of the benefits of mixed declarations and code. (Alternate interpretation: developers are fully aware of the potential tradeoffs and decide that mixed declarations and statements are not worth the effort. I highly disagree, but it's rare that two programmers will agree on anything.)
FUD. Programmers view mixed declarations and code as a "C++ism" and dislike it for that reason.
There is little reason to rewrite the Linux kernel to make cosmetic changes that offer no performance gains.
If the code base is working, so why change it for cosmetic reasons?
There is no benefit. Declaring all variables at the beginning of the function (Pascal-like) is much clearer; in C89 you can also declare variables at the beginning of each scope (inside loop bodies, for example), which is both practical and concise.
I don't remember any interdictions against this in the style guide for kernel code. However, it does say that functions should be as small as possible, and only do one thing. This would explain why a mixture of declarations and code is rare.
In a small function, declaring variables at the start of scope acts as a sort of Introit, telling you something about what's coming soon after. In this case the movement of the variable declaration is so limited that it would likely either have no effect, or serve to hide some information about the functionality by pushing the barker into the crowd, so to speak. There is a reason that the arrival of a king was declared before he entered a room.
OTOH, a function which must mix variables and code to be readable is probably too big. This is one of the signs (along with too-nested blocks, inline comments and other things) that some sections of a function need to be abstracted into separate functions (and declared static, so the optimizer can inline them).
Another reason to keep declarations at the beginning of functions: should you need to reorder the execution of statements in the code, you may move a use of a variable out of its scope without realizing it, since the scope of a variable declared in the middle of code is not evident from the indentation (unless you use a block to show the scope). This is easily fixed, so it's just an annoyance, but new code often undergoes this kind of transformation, and annoyances can be cumulative.
And another reason: you might be tempted to declare a variable to take the error return code from a function, like so:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret); }
Perfectly reasonable thing to do. But:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret); }
/* ... */
int ret = another_func_may_fail();  /* error: redefinition of ret */
if (ret) { handle_other_fail(ret); }
Oops! ret is defined twice. "So? Remove the second declaration," you say. But this makes the code asymmetric, and you end up with more refactoring limitations.
Of course, I mix declarations and code myself; no reason to be dogmatic about it (or else your karma may run over your dogma :-). But you should know what the concomitant problems are.
Maybe it's not needed, maybe the separation is good? I do it in C++, which has this feature as well.
There is no reason to change existing code like this, and C99 is still not widely supported by compilers. It is mostly about portability.

When is "inline" ineffective? (in C)

Some people love using the inline keyword in C, and put big functions in headers. When do you consider this to be ineffective? I sometimes even consider it annoying, because it is unusual.
My principle is that inline should be used for small functions accessed very frequently, or in order to have real type checking. Anyhow, my taste guides me, but I am not sure how best to explain the reasons why inline is not so useful for big functions.
In this question people suggest that the compiler can do a better job at guessing the right thing to do. That was also my assumption. When I try to use this argument, people reply that it does not work with functions coming from different object files. Well, I don't know (for example, using GCC).
Thanks for your answers!
inline does two things:
1. Gives you an exemption from the "one definition rule" (see below). This always applies.
2. Gives the compiler a hint to avoid a function call. The compiler is free to ignore this.
#1 can be very useful (e.g. putting a definition in a header if it is short) even if #2 is ignored.
In practice, compilers often do a better job of working out what to inline themselves (especially if profile-guided optimisation is available).
[EDIT: Full References and relevant text]
The two points above both follow from the ISO/ANSI standard (ISO/IEC 9899:1999(E), commonly known as "C99").
In §6.9 "External definitions", paragraph 5:
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
While the equivalent rule in C++ is explicitly named the One Definition Rule (ODR), this serves the same purpose. Externals (i.e. things that are not "static" and thus not local to a single translation unit, typically a single source file) can be defined only once, unless the identifier is a function and inline.
In §6.7.4, "Function Specifiers", the inline keyword is defined:
Making a function an inline function suggests that calls to the function be as fast as possible.[118] The extent to which such suggestions are effective is implementation-defined.
And footnote (non-normative), but provides clarification:
By using, for example, an alternative to the usual function call mechanism, such as ‘‘inline substitution’’. Inline substitution is not textual substitution, nor does it create a new function. Therefore, for example, the expansion of a macro used within the body of the function uses the definition it had at the point the function body appears, and not where the function is called; and identifiers refer to the declarations in scope where the body occurs. Likewise, the function has a single address, regardless of the number of inline definitions that occur in addition to the external definition.
Summary: what most users of C and C++ expect from inline is not what they get. Its apparent primary purpose, avoiding function call overhead, is completely optional. But to allow separate compilation, a relaxation of the single definition rule is required.
(All emphasis in the quotes from the standard.)
EDIT 2: A few notes:
There are various restrictions on external inline functions. You cannot define a modifiable static variable in the function, and you cannot reference objects or functions with internal linkage (TU-scope statics).
Just seen this on VC++'s "whole program optimisation", which is an example of a compiler doing its own inline thing, rather than the author.
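To make point #1 concrete, the usual C99 pattern looks something like this sketch (the file and function names are hypothetical):

/* fast_abs.h -- the inline definition, visible in every translation unit */
#ifndef FAST_ABS_H
#define FAST_ABS_H

inline int fast_abs(int x)
{
    return x < 0 ? -x : x;
}

#endif

/* fast_abs.c -- under C99 semantics, this extern declaration makes this
   translation unit provide the one external definition of fast_abs. */
#include "fast_abs.h"
extern inline int fast_abs(int x);

Every .c file that includes the header can have calls inlined, while the linker still finds exactly one external definition (needed, for example, if the function's address is taken). Note that GCC's historical pre-C99 extern inline had the opposite meaning.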
The important thing about an inline declaration is that it doesn't necessarily do anything. A compiler is free, in many cases, to inline a function that is not declared inline, and to generate ordinary calls to functions that are declared inline.
Another reason why you shouldn't use inline for large functions is the case of libraries. Every time you change an inline function, you might lose ABI compatibility, because an application compiled against an older header still has the old version of the function inlined. If inline functions are used as type-safe macros, chances are good that the function never needs to change over the life cycle of the library. But for big functions this is hard to guarantee.
Of course, this argument only applies if the function is part of your public API.
An example to illustrate the benefits of inline. sinCos.h:
#include <stdint.h>

/* TWO_PI and PI_OVER_TWO are LUT-size constants defined elsewhere. */
int16_t sinLUT[TWO_PI];

static inline int16_t sin_LUT(int16_t x)
{
    return sinLUT[(uint16_t)x];
}

static inline int16_t cos_LUT(int16_t x)
{
    return sin_LUT(x + PI_OVER_TWO);
}
When doing some heavy number crunching where you want to avoid wasting cycles computing sin/cos, you replace sin/cos with a LUT.
When you compile without inline, the compiler will not optimize the loop, and the output .asm will show something along the lines of:
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;* Disqualified loop: Loop contains a call
;*----------------------------------------------------------------------------*
When you compile with inline, the compiler has knowledge of what happens in the loop and will optimize it, because it knows exactly what is happening.
The output .asm will have an optimized "pipelined" loop (i.e. it will try to fully utilize all the processor's ALUs and keep the processor's pipeline full without NOPs).
In this specific case, I was able to increase my performance by about 2X or 4X which got me within what I needed for my real time deadline.
p.s. I was working on a fixed point processor... and any floating point operations like sin/cos killed my performance.
Inline is ineffective when you take a pointer to the function.
Inline is effective in one case: when you've got a performance problem, ran your profiler with real data, and found the function call overhead for some small functions to be significant.
Outside of that, I can't imagine why you'd use it.
That's right. Using inline for big functions increases compile time and brings little extra performance to the application. Inline functions tell the compiler that the function's body is to be substituted at the call site instead of making a call, so they should be small pieces of code that are executed many times. In other words: for big functions, the cost of making the call is negligible compared to the cost of executing the function body itself.
I mainly use inline functions as typesafe macros. There's been talk about adding support for link-time optimizations to GCC for quite some time, especially since LLVM came along. I don't know how much of it actually has been implemented yet, though.
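As a small illustration of the type-safe-macro use (this example is mine, not from the original answer):

#include <stdio.h>

#define SQUARE_MACRO(x) ((x) * (x))   /* evaluates its argument twice */

static inline int square(int x)       /* evaluates once and checks the type */
{
    return x * x;
}

int main(void)
{
    int i = 3;
    /* SQUARE_MACRO(i++) would expand to ((i++) * (i++)): undefined behavior. */
    printf("%d\n", square(i++));      /* well-defined: prints 9, i becomes 4 */
    return 0;
}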
Personally, I don't think you should ever use inline unless you have first run a profiler on your code and have proven that there is a significant bottleneck in that routine that can be partially alleviated by inlining.
This is yet another case of the premature optimization Knuth warned about.
inline can be used for small and frequently used functions, such as getters and setters. For big functions it is not advisable to use inline, as it increases the executable's size.
Also, for recursive functions, even if you mark them inline, the compiler will ignore it.
inline acts as a hint only.
It was added to the C standard only relatively recently (in C99), so it works only with compilers that implement that standard.
Inline functions should be approximately 10 lines or less, give or take, depending on your compiler of choice.
You can tell your compiler that you want something inlined, but it's up to the compiler to do so. There is no -force-inline option that I know of which the compiler can't ignore. That is why you should look at the assembler output and see whether your compiler actually did inline the function, and if not, why not? Many compilers just silently say 'screw you!' in that respect.
so if:
static inline unsigned int foo(const char *bar)
.. does not improve things over plain static unsigned int foo(const char *bar), it's time to revisit your optimizations (and likely your loops), or to argue with your compiler. Take special care to argue with your compiler first, not with the people who develop it, or you're just in store for lots of unpleasant reading when you open your inbox the next day.
Meanwhile, when making something (or attempting to make something) inline, does doing so actually justify the bloat? Do you really want that function expanded every time it's called? Is the jump really so costly? Your compiler is usually correct 9 times out of 10; check the intermediate output (or asm dumps).
