As a rule, I declare and define static functions inside source files, by which I mean .c files.
However, in very rare situations I have seen people declare them in a header file.
Since static functions have internal linkage, we need to define the function in every file that includes the header where it is declared. This looks pretty odd and far from what we usually want when declaring something static.
On the other hand, if someone naively tries to use that function without defining it, the compiler will complain. So in some sense it is not really unsafe to do this, even though it sounds strange.
My questions are:
What is the problem of declaring static functions in header files?
What are the risks?
What is the impact on compilation time?
Is there any risk at runtime?
First I'd like to clarify my understanding of the situation you describe: The header contains (only) a static function declaration while the C file contains the definition, i.e. the function's source code. For example
some.h:
static void f();
// potentially more declarations
some.c:
#include <stdio.h>
#include "some.h"
static void f() { printf("Hello world\n"); }
// more code, some of it potentially using f()
If this is the situation you describe, I take issue with your remark
Since static functions have internal linkage, we need to define the function in every file that includes the header where it is declared.
If you declare the function but do not use it in a given translation unit, I don't think you have to define it. gcc accepts that with a warning; the standard does not seem to forbid it, unless I missed something. This may be important in your scenario because translation units which do not use the function but include the header with its declaration don't have to provide an unused definition.
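For instance (a minimal sketch; the file name other.c is mine), a translation unit that includes the header but neither uses nor defines f() still compiles; gcc -Wall typically just warns that f is declared 'static' but never defined:
other.c:
#include "some.h"
/* no definition of f(), and no use of it either */
int g(void)
{
    return 42;
}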
Now let's examine the questions:
What is the problem of declaring static functions in header files?
It is somewhat unusual. Typically, static functions are functions needed in only one file. They are declared static to make that explicit by limiting their visibility. Declaring them in a header therefore is somewhat antithetical. If the function is indeed used in multiple files with identical definitions it should be made external, with a single definition. If only one translation unit actually uses it, the declaration does not belong in a header.
One possible scenario therefore is to ensure a uniform function signature for different implementations in the respective translation units. The common header leads to a compile time error for different return types in C (and C++); different parameter types would cause a compile time error only in C (but not in C++, because of function overloading).
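A minimal sketch of that scenario (the file and function names here are mine): both translation units include the shared declaration, so a definition whose signature disagrees is rejected when that file is compiled.
compute.h:
static int compute(int x);
a.c:
#include "compute.h"
static int compute(int x) { return x + 1; }   /* fine: matches the declaration */
b.c:
#include "compute.h"
static long compute(int x) { return x - 1; }  /* compile time error: conflicting return type */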
What are the risks?
I do not see risks in your scenario. (As opposed to also including the function definition in a header which may violate the encapsulation principle.)
What is the impact on compilation time?
A function declaration is small and its complexity is low, so the overhead of having additional function declarations in a header is likely negligible. But if you create and include an additional header for the declaration in many translation units, the file handling overhead can be significant (i.e. the compiler idles a lot while it waits for the header I/O).
Is there any risk at runtime? I cannot see any.
This is not an answer to the stated questions, but hopefully shows why one might implement a static (or static inline) function in a header file.
I can personally only think of two good reasons to declare some functions static in a header file:
If the header file completely implements an interface that should only be visible in the current compilation unit
This is extremely rare, but might be useful in e.g. an educational context, at some point during the development of some example library; or perhaps when interfacing to another programming language with minimal code.
A developer might choose to do so if the library or interface implementation is trivial, or nearly so, and ease of use (to the developer using the header file) is more important than code size. In these cases, the declarations in the header file often use preprocessor macros, allowing the same header file to be included more than once, providing some sort of crude polymorphism in C.
Here is a practical example: a shoot-yourself-in-the-foot playground for linear congruential pseudorandom number generators. Because the implementation is local to the compilation unit, each compilation unit gets its own copy of the PRNG. This example also shows how crude polymorphism can be implemented in C.
prng32.h:
#if defined(PRNG_NAME) && defined(PRNG_MULTIPLIER) && defined(PRNG_CONSTANT) && defined(PRNG_MODULUS)
#define MERGE3_(a,b,c) a ## b ## c
#define MERGE3(a,b,c) MERGE3_(a,b,c)
#define NAME(name) MERGE3(PRNG_NAME, _, name)
static uint32_t NAME(state) = 0U;
static uint32_t NAME(next)(void)
{
NAME(state) = ((uint64_t)PRNG_MULTIPLIER * (uint64_t)NAME(state) + (uint64_t)PRNG_CONSTANT) % (uint64_t)PRNG_MODULUS;
return NAME(state);
}
#undef NAME
#undef MERGE3
#endif
#undef PRNG_NAME
#undef PRNG_MULTIPLIER
#undef PRNG_CONSTANT
#undef PRNG_MODULUS
An example using the above, example-prng32.c:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#define PRNG_NAME glibc
#define PRNG_MULTIPLIER 1103515245UL
#define PRNG_CONSTANT 12345UL
#define PRNG_MODULUS 2147483647UL
#include "prng32.h"
/* provides glibc_state and glibc_next() */
#define PRNG_NAME borland
#define PRNG_MULTIPLIER 22695477UL
#define PRNG_CONSTANT 1UL
#define PRNG_MODULUS 2147483647UL
#include "prng32.h"
/* provides borland_state and borland_next() */
int main(void)
{
int i;
glibc_state = 1U;
printf("glibc lcg: Seed %u\n", (unsigned int)glibc_state);
for (i = 0; i < 10; i++)
printf("%u, ", (unsigned int)glibc_next());
printf("%u\n", (unsigned int)glibc_next());
borland_state = 1U;
printf("Borland lcg: Seed %u\n", (unsigned int)borland_state);
for (i = 0; i < 10; i++)
printf("%u, ", (unsigned int)borland_next());
printf("%u\n", (unsigned int)borland_next());
return EXIT_SUCCESS;
}
The reason for marking both the _state variable and the _next() function static is that this way each compilation unit that includes the header file has its own copy of the variables and the functions -- here, its own copy of the PRNG. Each must be separately seeded, of course; and if seeded to the same value, each will yield the same sequence.
One should generally shy away from such polymorphism attempts in C, because it leads to complicated preprocessor macro shenanigans, making the implementation much harder to understand, maintain, and modify than necessary.
However, when exploring the parameter space of some algorithm -- like the types of 32-bit linear congruential generators here -- this lets us use a single implementation for each of the generators we examine, ensuring there are no implementation differences between them. Note that even this case is more like a development tool, and not something you ought to see in an implementation provided for others to use.
If the header implements simple static inline accessor functions
Preprocessor macros are commonly used to simplify code accessing complicated structure types. static inline functions are similar, except that they also provide type checking at compile time, and can refer to their parameters several times (with macros, that is problematic).
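A small illustration of that difference (the names SQUARE_MACRO and square are mine): the macro evaluates its argument twice, while the static inline function evaluates it exactly once and also type-checks it.
#include <stdio.h>
#define SQUARE_MACRO(x) ((x) * (x))
static inline int square(int x)
{
    return x * x; /* x is evaluated exactly once */
}
int main(void)
{
    int i = 3, j = 3;
    int a = SQUARE_MACRO(i++); /* i++ is expanded twice: undefined behavior, i is "incremented twice" */
    int b = square(j++);       /* j++ happens once: b == 9, j == 4 */
    printf("%d %d %d %d\n", a, i, b, j);
    return 0;
}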
One practical use case is a simple interface for reading files using low-level POSIX.1 I/O (using <unistd.h> and <fcntl.h> instead of <stdio.h>). I've done this myself when reading very large (dozens of megabytes to gigabytes range) text files containing real numbers (with a custom float/double parser), as the GNU C standard I/O is not particularly fast.
For example, inbuffer.h:
#ifndef INBUFFER_H
#define INBUFFER_H
#include <stddef.h> /* size_t */
typedef struct {
unsigned char *head; /* Next buffered byte */
unsigned char *tail; /* Next byte to be buffered */
unsigned char *ends; /* data + size */
unsigned char *data;
size_t size;
int descriptor;
unsigned int status; /* Bit mask */
} inbuffer;
#define INBUFFER_INIT { NULL, NULL, NULL, NULL, 0, -1, 0 }
int inbuffer_open(inbuffer *, const char *);
int inbuffer_close(inbuffer *);
int inbuffer_skip_slow(inbuffer *, const size_t);
int inbuffer_getc_slow(inbuffer *);
static inline int inbuffer_skip(inbuffer *ib, const size_t n)
{
if (ib->head + n <= ib->tail) {
ib->head += n;
return 0;
} else
return inbuffer_skip_slow(ib, n);
}
static inline int inbuffer_getc(inbuffer *ib)
{
if (ib->head < ib->tail)
return *(ib->head++);
else
return inbuffer_getc_slow(ib);
}
#endif /* INBUFFER_H */
Note that the above inbuffer_skip() and inbuffer_getc() do not check if ib is non-NULL; this is typical for such functions. These accessor functions are assumed to be "in the fast path", i.e. called very often. In such cases, even the function call overhead matters (and is avoided with static inline functions, since they are duplicated in the code at the call site).
Trivial accessor functions, like the above inbuffer_skip() and inbuffer_getc(), may also let the compiler avoid the register moves involved in function calls, because functions expect their parameters to be located in specific registers or on the stack, whereas inlined functions can be adapted (wrt. register use) to the code surrounding the inlined function.
Personally, I do recommend writing a couple of test programs using the non-inlined functions first, and comparing the performance and results with the inlined versions. Comparing the results ensures the inlined versions do not have bugs (off-by-one errors are common here!), and comparing the performance and generated binaries (size, at least) tells you whether inlining is worth it.
Why would you want a function that is both global and static? In C, functions are global by default. You only use static functions if you want to limit access to a function to the file in which it is declared. So you actively restrict access by declaring it static...
The only case that requires implementations in the header file is C++ template functions and template class member functions.
Related
In my understanding, inline can speed up code execution; is that right?
How much speed can we gain from it?
Ripped from here:
Yes and no. Sometimes. Maybe.
There are no simple answers. inline functions might make the code faster, they might make it slower. They might make the executable larger, they might make it smaller. They might cause thrashing, they might prevent thrashing. And they might be, and often are, totally irrelevant to speed.
inline functions might make it faster: As shown above, procedural integration might remove a bunch of unnecessary instructions, which might make things run faster.
inline functions might make it slower: Too much inlining might cause code bloat, which might cause "thrashing" on demand-paged virtual-memory systems. In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.
inline functions might make it larger: This is the notion of code bloat, as described above. For example, if a system has 100 inline functions each of which expands to 100 bytes of executable code and is called in 100 places, that's an increase of 1MB. Is that 1MB going to cause problems? Who knows, but it is possible that that last 1MB could cause the system to "thrash," and that could slow things down.
inline functions might make it smaller: The compiler often generates more code to push/pop registers/parameters than it would by inline-expanding the function's body. This happens with very small functions, and it also happens with large functions when the optimizer is able to remove a lot of redundant code through procedural integration — that is, when the optimizer is able to make the large function small.
inline functions might cause thrashing: Inlining might increase the size of the binary executable, and that might cause thrashing.
inline functions might prevent thrashing: The working set size (number of pages that need to be in memory at once) might go down even if the executable size goes up. When f() calls g(), the code is often on two distinct pages; when the compiler procedurally integrates the code of g() into f(), the code is often on the same page.
inline functions might increase the number of cache misses: Inlining might cause an inner loop to span across multiple lines of the memory cache, and that might cause thrashing of the memory-cache.
inline functions might decrease the number of cache misses: Inlining usually improves locality of reference within the binary code, which might decrease the number of cache lines needed to store the code of an inner loop. This ultimately could cause a CPU-bound application to run faster.
inline functions might be irrelevant to speed: Most systems are not CPU-bound. Most systems are I/O-bound, database-bound or network-bound, meaning the bottleneck in the system's overall performance is the file system, the database or the network. Unless your "CPU meter" is pegged at 100%, inline functions probably won't make your system faster. (Even in CPU-bound systems, inline will help only when used within the bottleneck itself, and the bottleneck is typically in only a small percentage of the code.)
There are no simple answers: You have to play with it to see what is best. Do not settle for simplistic answers like, "Never use inline functions" or "Always use inline functions" or "Use inline functions if and only if the function is less than N lines of code." These one-size-fits-all rules may be easy to write down, but they will produce sub-optimal results.
Copyright (C) Marshall Cline
Using inline asks the compiler to use the substitution model of evaluation (expanding the function body at the call site), but this is not guaranteed to happen. When it does, the generated code is usually longer and may be faster; with some optimizations active, the substitution model is not faster all the time.
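As a rough illustration of what that substitution means (a conceptual sketch, not literal compiler output):
static inline int twice(int x) { return 2 * x; }
int f(int a)
{
    /* With substitution (inlining), the compiler may generate code as if this
       line had been written as: return 2 * a + 1; with no call at all. */
    return twice(a) + 1;
}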
The reason I use inline function specifier (specifically, static inline), is not because of "speed", but because
static part tells the compiler the function is only visible in the current translation unit (the current file being compiled and included header files)
inline part tells the compiler it can include the implementation of the function at the call site, if it wants to
static inline tells the compiler that it can skip the function completely if it is not used at all in the current translation unit
(Specifically, the compiler that I use most with the options I use most, gcc -Wall, does issue a warning if a function marked static is unused; but will not issue a warning if a function marked static inline is unused.)
static inline tells us humans that the function is a macro-like helper function, while also adding type checking to the otherwise macro-like behavior.
Thus, in my opinion, the assumption that inline has anything to do with speed per se, is incorrect. Answering the stated question with a straight answer would be misleading.
In my code, you see them associated with some data structures, or occasionally global variables.
A typical example is when I want to implement a Xorshift pseudorandom number generator in my own C code:
#include <inttypes.h>
static uint64_t prng_state = 1; /* Any nonzero uint64_t seed is okay */
static inline uint64_t prng_u64(void)
{
uint64_t state;
state = prng_state;
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
prng_state = state;
return state * UINT64_C(2685821657736338717);
}
The static uint64_t prng_state = 1; means that prng_state is a variable of type uint64_t, visible only in the current compilation unit, and initialized to 1. The prng_u64() function returns an unsigned 64-bit pseudorandom integer. However, if you do not use prng_u64(), the compiler will not generate code for it either.
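A short usage sketch, assuming the snippet above lives in the same translation unit (or in a header it includes); any nonzero value works as the seed, and PRIu64 comes from the <inttypes.h> already included above:
#include <stdio.h>
int main(void)
{
    int i;
    prng_state = 20190101; /* seed: any nonzero value */
    for (i = 0; i < 5; i++)
        printf("%" PRIu64 "\n", prng_u64());
    return 0;
}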
Another typical use case is when I have data structures, and they need accessor functions. For example,
#ifndef GRID_H
#define GRID_H
#include <stdlib.h>
#include <stdio.h> /* FILE, for grid_load()/grid_save() below */
typedef struct {
int rows;
int cols;
unsigned char *cell;
} grid;
#define GRID_INIT { 0, 0, NULL }
#define GRID_OUTSIDE -1
static inline int grid_get(grid *const g, const int row, const int col)
{
if (!g || row < 0 || col < 0 || row >= g->rows || col >= g->cols)
return GRID_OUTSIDE;
return g->cell[row * (size_t)(g->cols) + col];
}
static inline int grid_set(grid *const g, const int row, const int col,
const unsigned char value)
{
if (!g || row < 0 || col < 0 || row >= g->rows || col >= g->cols)
return GRID_OUTSIDE;
return g->cell[row * (size_t)(g->cols) + col] = value;
}
static inline void grid_init(grid *g)
{
g->rows = 0;
g->cols = 0;
g->cell = NULL;
}
static inline void grid_free(grid *g)
{
free(g->cell);
g->rows = 0;
g->cols = 0;
g->cell = NULL;
}
int grid_create(grid *g, const int rows, const int cols,
const unsigned char initial_value);
int grid_load(grid *g, FILE *handle);
int grid_save(grid *g, FILE *handle);
#endif /* GRID_H */
That header file defines some useful helper functions, and declares the functions grid_create(), grid_load(), and grid_save(), that would be implemented in a separate .c file.
(Yes, those three functions could be implemented in the header file just as well, but it would make the header file quite large. If you had a large project, spread over many translation units (.c source files), each one including the header file would get its own local copy of the functions. The accessor functions defined as static inline above are short and trivial, so it is perfectly okay for them to be copied here and there. The three functions I omitted are much larger.)
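A usage sketch, assuming the header above is saved as grid.h; since grid_create() is only declared, the cells are allocated by hand here:
#include <stdio.h>
#include "grid.h"
int main(void)
{
    grid g = GRID_INIT;
    /* Set up a 2x3 grid manually instead of calling grid_create(). */
    g.rows = 2;
    g.cols = 3;
    g.cell = calloc((size_t)g.rows * (size_t)g.cols, 1);
    if (!g.cell)
        return 1;
    grid_set(&g, 1, 2, 7);
    printf("%d %d\n", grid_get(&g, 1, 2), grid_get(&g, 5, 5)); /* prints "7 -1" */
    grid_free(&g);
    return 0;
}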
I am trying to figure out how to split a program into multiple files, with 1 .h file and 2 .c files. I have a full program, but just for this example I would like the function that prints the output to be in a separate .c file. I know how to make a function for basic arithmetic like:
int Sum(int a, int b)
{
return a+b;
}
but how would I make a function out of a for loop, like just the code below?
for (i = 0; i < count; i++)
{
printf("\n%s", ptr[i]->name);
printf("%s", ptr[i]->street);
printf("%s", ptr[i]->citystate);
printf("%s", ptr[i]->zip);
free(ptr[i]);
}
I get that the way it works is like this; I just don't know how to turn a for loop into a function.
Functions.h:
#ifndef FUNCTIONS_H_INCLUDED
#define FUNCTIONS_H_INCLUDED
/* ^^ these are the include guards */
/* Prototypes for the functions */
/* Sums two ints */
int Sum(int a, int b);
#endif
Functions.c:
/* In general it's good to include also the header of the current .c,
to avoid repeating the prototypes */
#include "Functions.h"
int Sum(int a, int b)
{
return a+b;
}
Main.c
#include "stdio.h"
/* To use the functions defined in Functions.c I need to #include Functions.h */
#include "Functions.h"
int main(void)
{
int a, b;
printf("Insert two numbers: ");
if(scanf("%d %d", &a, &b)!=2)
{
fputs("Invalid input", stderr);
return 1;
}
printf("%d + %d = %d", a, b, Sum(a, b));
return 0;
}
At the simplest, you'd convert the loop fragment into:
void print_and_destroy(size_t count, SomeType *ptr[count])
{
for (size_t i = 0; i < count; i++)
{
printf("\n%s", ptr[i]->name);
printf("%s", ptr[i]->street);
printf("%s", ptr[i]->citystate);
printf("%s", ptr[i]->zip);
free(ptr[i]);
}
free(ptr);
}
The final free is there because the array now contains no useful pointers (they've all been freed). You could add ptr[i] = NULL; after the free(ptr[i]); instead. That indicates that there is no data there any more.
However, as noted in comments 1 and 2 by SergeyA, and as the clumsy but accurate function name suggests, this isn't a good breakdown of the code. You need two functions:
void destroy(size_t count, SomeType *ptr[count])
{
for (size_t i = 0; i < count; i++)
{
free(ptr[i]);
ptr[i] = NULL; // Option 1
}
free(ptr); // Option 2
}
You would not use option 1 if you use option 2 (though it would do no actual harm). If you do not use option 2, you should use option 1.
void print(size_t count, const SomeType *ptr[count])
{
for (size_t i = 0; i < count; i++)
{
printf("%s, ", ptr[i]->name);
printf("%s, ", ptr[i]->street);
printf("%s, ", ptr[i]->citystate);
printf("%s\n", ptr[i]->zip);
}
}
Note that you probably need space between the fields of the address. You might or might not want one line for name, one line for street address, and one line for city, state and zip code — choose your format to suit. Generally, output newlines at the end of output, not at the beginning unless you want double spacing.
So that would be my separate function.c file, and my header file would look something like …code omitted… right? How would the function call in the main program look?
The outline of the header would be:
#ifndef HEADER_H_INCLUDED
#define HEADER_H_INCLUDED
#include <stddef.h> // Smallest header that defines size_t
typedef struct SomeType
{
char name[32];
char street[32];
char citystate[32];
char zip[11]; // Allow for ZIP+4 and terminal null
} SomeType;
extern void print_and_destroy(size_t count, SomeType *ptr[count]);
#endif /* HEADER_H_INCLUDED */
Note that the header doesn't include <stdio.h>; no part of the interface depends on anything that's specific to <stdio.h>, such as a FILE * (though <stdio.h> is one of the headers that defines size_t). A header should be minimal, but self-contained and idempotent. Not including <stdio.h> is part of being minimal; including <stddef.h> is part of being self-contained; and the header guards are the key part of being idempotent. It means you can include the header and not have to worry about whether it has already been included before indirectly or is included again later, indirectly, and you don't have to worry about what other headers have to be included — the header is self-contained and deals with that.
In your main(), you'd have something like:
enum { MAX_ADDRESSES = 20 };
SomeType *data[MAX_ADDRESSES];
…memory allocation…
…data loading…
print_and_destroy(MAX_ADDRESSES, data);
There are two different issues involved (I recommend to address the first one, if so needed, before the second one):
how to split a monolithic translation unit in several ones, but keeping the same functions
how to refactor the code to make it more readable, made of "smaller" and "better" functions. In your case, this is the main issue.
The first question, for example splitting a single small program, say a myprog.c file of a few dozen thousand lines, is quite easy. The real issue is to organize that cleverly (and then it becomes harder, and opinion based). You just need to put mostly declarations in your header file, to put definitions in several translation units, and of course to improve your build process to use and link them together. So you would have first a single common header file myheader.h declaring your types, macros, functions, and global variables. You would also define some short static inline functions there, if you need them. Then you would have several C files (technically translation units) foo.c, bar.c, dingo.c, and you had better put several related functions in each of them. Each such C file has #include "myheader.h". You had better use some build automation tool, probably GNU make (or something else, e.g. ninja), which you would configure with your Makefile. You could later have several header files, but for a small project of only several dozen thousand source code lines that might not be needed. In some cases, you would generate some C file from a higher-level description (e.g. using simple metaprogramming techniques).
The second question (code refactoring) is really difficult, and has no simple universal answer. It really depends on the project. A simple (and very debatable, and over-simplifying) rule of thumb is that you need to have functions "doing only one thing" and of at most a few dozen lines each. So as soon as a function does more than one thing or has more than one or two dozen lines you should consider splitting and refactoring it (but you won't always do that). Obviously your main should be split into several functions.
At last, don't fall into the excessive habit of putting only one function per *.c file, or having lots of small *.c files of only a hundred lines each. This is generally useless, and could increase your build time (because the preprocessor would work a lot), and perhaps even slightly decrease the runtime performance of your executable (because your optimizing compiler won't be able to inline, unless you use link-time optimization). My recommendation (opinion-based, so debatable) is to have source files of several thousand lines each, containing several (dozens of) functions.
In your case, I believe your program (in your question) is so tiny that you don't need to split it into several translation units (unless your teacher asks you to). But indeed you need to refactor it, perhaps defining some abstract data type and routines supporting it (see this).
Study the source code of existing free software (e.g. on github) related to your project for inspiration, since you'll need to define and follow many coding conventions (and coding rules) which matter a lot in C programming. In your newbie case, I believe that studying the source of any small free software program (of a few dozen thousand lines, see this) -in a domain you understand or are interested in, and in the programming language you are practicing- will benefit you a lot.
Question
I am trying to find static (compile-time) asserts to ensure (as well as possible) the things below. As I use them in an auto code generation context (see "Background" below), they do not have to be neat; they only have to break compilation, at best with zero overhead. Elegant variants are welcomed though.
The following things shall be checked:
A Type Identity
typedef T T1;
typedef T T2;
typedef X T3;
T1 a;
T2 b;
T3 c;
SA_M1(T1,T2); /* compilation */
SA_M1(T1,T3); /* compilation error */
SA_M2(a,b); /* compilation */
SA_M2(a,c); /* compilation error */
where X and T are C Types (including structured, aggregated, object pointer, not so important function pointer). Note again, that a set of partly successful solutions also helps.
Some solutions that I assume will partly work:
comparing the sizes (see the sketch after this list)
checking if the type is a pointer as claimed by trying to dereference it.
for unsigned integers: compare a slightly too-big value, after the cast, with the expected wrap-around value.
for floats: compare an exactly representable double-precision value with the cast one (hoping for the best regarding platform-specific rounding).
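Picking up the first item (comparing the sizes), here is a minimal C89-style compile-time assert sketch; the macro names and the typedefs are only illustrative, and a size comparison is of course necessary but not sufficient for type identity:
/* A negative array size breaks compilation; zero runtime, stack and code cost. */
#define SA_CAT_(a, b) a##b
#define SA_CAT(a, b) SA_CAT_(a, b)
#define STATIC_ASSERT(cond) \
    typedef char SA_CAT(static_assert_, __LINE__)[(cond) ? 1 : -1]
typedef int T1;
typedef int T2;
typedef short T3;
static void CheckSizes(void)
{
    STATIC_ASSERT(sizeof(T1) == sizeof(T2)); /* compiles */
    STATIC_ASSERT(sizeof(T1) == sizeof(T3)); /* breaks compilation */
}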
B A Variable has global Scope
My solution here is at the moment simply to generate a static function that tries to take the address of the global variable. Assume that X is a global variable:
static void SA_IsGlobal_X() {(void) (&X == NULL); /* Dummy Operation */}
C A Function has the correct number of parameters
I have no idea yet.
D If the prototype of a function is as expected
I have no idea yet.
E If function or macro parameters are compile-time constants
This question is discussed here for macros:
Macro for use in expression while enforcing its arguments to be compile time constants
For functions, a wrapper macro could do.
Z Other things you might like to check considering the “background” part below
Preferred are answers that can be done with C89 and have zero cost in runtime, stack, and (with most compilers) code size. As the checks will be auto generated, readability is not so important, but I like to place the checks in static functions whenever possible.
Background:
I want to provide C functions as well as an interface generator to allow them to be smoothly integrated into different C frameworks (with C++ on the horizon). The user of the interface generator then only specifies where the inputs come from, and where each of the outputs shall go. Options are at least:
RAW (as it is implemented - and should be used)
from the interface functions parameter, which is of a type said to be the same as my input/output (and perhaps is a field of a structure or an array element)
from a getter/setter function
from a global variable
using a compile time constant
I will:
ask for a very detailed interface specification (including specification errors)
use parsers to check typedefs and declarations (including tool bugs and my tool usage errors)
But this happens at generation time. Besides everything else: if the user changes either the environment or takes a new major version of my function (this can be solved by macros checking versions) without running the interface generator again, I would like to have a last line of defense at compile time.
The generated code might, in a near worst case, be something like:
#include "IFMyFunc.h" /* contains all user headers for the target framework(s) */
#include "MyFunc.h"
RetType IFMYFunc(const T1 a, const struct T2 * const s, T3 * const c)
{
/* CHECK INTERFACE */
CheckIFMyFunc();
/* get d over a worst case parametrized getter function */
const MyD_type d = getD(s->dInfo);
/* do horrible call by value and reference stuff, f and g are global vars */
c->c1 = MyFunc(a, s->b, c->c1, d, f, &(c->c2), &e, &g);
set(e);
/* return something by return value */
return e;
}
(I am pretty sure I will restrict the combos though).
static void CheckIFMyFunc(void)
{
/* many many compile time checks of types and specifications */
}
or I will provide a piece of code (a local block) to be directly infused - which is horrible architecture, but might be necessary if we can't abandon some of the frameworks fast enough, supported by some legacy scripts.
For A, I would propose:
#define SA_M1(A, B) \
do { \
A ___a; \
B ___b = ___a; \
(void)___b; \
} while (0)
For D (and I would say that C is already covered by D):
typedef int (*myproto)(int a, char **c);
#define FN_SA(Ref, Challenger) \
do { \
Ref ___f = Challenger; \
(void) ___f; \
} while (0)
void test(int argc, char **argv);
int main(int argc, char **argv)
{
FN_SA(myproto, main);
FN_SA(myproto, test); /* Does not compile */
return 0;
}
Nevertheless, there are some remaining problems with void *:
any object pointer may be converted to/from void * in C, which will probably make the solution for A fail in some cases…
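For instance, this compiles cleanly in C even though the pointer types differ, so an assignment-based check such as SA_M1 does not catch it:
static void void_pointer_hole(void)
{
    int *ip = 0;
    void *vp = ip; /* fine: int * converts to void * implicitly */
    ip = vp;       /* also fine in C (this one would be an error in C++) */
    (void)ip;
}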
BTW, if you plan to use C++ in the meantime, you could just use C++ templates and so on to have these tests done. That would be far cleaner and more reliable IMHO.
In order to have clean code, using some OO concepts can be useful, even in C.
I often write modules made of a pair of .h and .c files. The problem is that the user of the module has to be careful, since private members don't exist in C. The use of the pimpl idiom or abstract data types is OK, but it adds some code and/or files, and requires heavier code. I hate using accessors when I don't need them.
Here is an idea which provides a way to make the compiler complain about invalid access to "private" members, with only a little extra code. The idea is to define the same structure twice, but with some extra 'const' added for the user of the module.
Of course, writing in "private" members is still possible with a cast. But the point is only to avoid mistakes from the user of the module, not to safely protect memory.
/*** 2dPoint.h module interface ***/
#ifndef H_2D_POINT
#define H_2D_POINT
/* POINT_2D_IMPL needs to be defined in implementation files before #include */
#ifdef POINT_2D_IMPL
#define _cst_
#else
#define _cst_ const
#endif
typedef struct Point2D
{
/* public members: read and write for user */
int x;
/* private members: read only for user */
_cst_ int y;
} Point2D;
Point2D *new_2dPoint(void);
void delete_2dPoint(Point2D **pt);
void set_y(Point2D *pt, int newVal);
#endif /* H_2D_POINT */
/*** 2dPoint.c module implementation ***/
#define POINT_2D_IMPL
#include "2dPoint.h"
#include <stdlib.h>
#include <string.h>
Point2D *new_2dPoint(void)
{
Point2D *pt = malloc(sizeof(Point2D));
pt->x = 42;
pt->y = 666;
return pt;
}
void delete_2dPoint(Point2D **pt)
{
free(*pt);
*pt = NULL;
}
void set_y(Point2D *pt, int newVal)
{
pt->y = newVal;
}
/*** main.c user's file ***/
#include "2dPoint.h"
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
Point2D *pt = new_2dPoint();
pt->x = 10; /* ok */
pt->y = 20; /* Invalid access, y is "private" */
set_y(pt, 30); /* accessor needed */
printf("pt.x = %d, pt.y = %d\n", pt->x, pt->y); /* no accessor needed for reading "private" members */
delete_2dPoint(&pt);
return EXIT_SUCCESS;
}
And now, here is the question: is this trick OK with the C standard?
It works fine with GCC, and the compiler doesn't complain about anything, even with some strict flags, but how can I be sure that this is really OK?
This is almost certainly undefined behavior.
Writing to or modifying an object declared const is prohibited and doing so results in UB. Furthermore, the approach you take declares struct Point2D as two technically different types, which is also not permitted.
Note that this (as undefined behavior in general) does not mean that it "certainly won't work" or that "it must crash". In fact, I find it quite logical that it works, because if one reads the source intelligently, one may easily find out what its purpose is and why it might be regarded as correct. However, the compiler is not intelligent - at best, it's a finite automaton which has no knowledge about what the code is supposed to do; it only obeys (more or less) the syntactic and semantic rules of the language.
This violates C 2011 6.2.7 1.
6.2.7 1 requires that two definitions of the same structure in different translation units have compatible type. It is not permitted to have const in one and not the other.
In one module, you may have a reference to one of these objects, and the members appear to be const to the compiler. When the compiler writes calls to functions in other modules, it may hold values from the const members in registers or other cache or in partially or fully evaluated expressions from later in the source code than the function call. Then, when the function modifies the member and returns, the original module will not have the changed value. Worse, it may use some combination of the changed value and the old value.
This is highly improper programming.
In Bjarne Stroustrup's words: C is not designed to support OOP, although it enables OOP, which means it is possible to write OOP programs in C, but it is very hard to do so. As such, if you have to write OOP code in C, there seems to be nothing wrong with using this approach, but it is preferable to use a language better suited to the purpose.
By trying to write OOP code in C, you have already entered a territory where "common sense" has to be overridden, so this approach is fine as long as you take responsibility for using it properly. You also need to ensure that it is thoroughly and rigorously documented and everyone concerned with the code is aware of it.
Edit: Oh, you may have to use a cast to get around the const. I fail to recall whether the C-style cast can be used like C++'s const_cast.
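For reference, a plain C cast does drop the qualifier; whether the subsequent write is well-defined depends on how the underlying object was actually defined, which is exactly what the other answers dispute. A tiny self-contained illustration:
#include <stdio.h>
static int backing = 1;       /* the underlying object is not const */
int main(void)
{
    const int *cp = &backing; /* a read-only view of a writable object */
    int *p = (int *)cp;       /* C has no const_cast; a plain cast removes the qualifier */
    *p = 5;                   /* OK only because 'backing' itself is not defined const */
    printf("%d\n", backing);  /* prints 5 */
    return 0;
}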
You can use a different approach: declare two structs, one for the user without private members (in the header) and one with private members for internal use in your implementation unit. All private members should be placed after the public ones.
You always pass around the pointer to the struct and cast it to the internal-use type when needed, like this:
/* user code */
struct foo {
int public;
};
int bar(void) {
struct foo *foo = new_foo();
foo->public = 10;
}
/* implementation */
struct foo_internal {
int public;
int private;
};
struct foo *new_foo(void) {
struct foo_internal *foo = malloc(sizeof(*foo));
foo->public = 1;
foo->private = 2;
return (struct foo*)foo; // to suppress warning
}
C11 allows anonymous structure members (GCC has supported this for some time, and also accepts a named struct type here as an extension), so when using GCC (or a C11-compliant compiler) you can declare the internal structure as:
struct foo_internal {
struct foo;
int private;
};
so no extra effort is required to keep the structure definitions in sync.
I'm wondering about the practical use of #undef in C. I'm working through K&R, and am up to the preprocessor. Most of this was material I (more or less) understood, but something on page 90 (second edition) stuck out at me:
Names may be undefined with #undef,
usually to ensure that a routine is
really a function, not a macro:
#undef getchar
int getchar(void) { ... }
Is this a common practice to defend against someone #define-ing a macro with the same name as your function? Or is this really more of a sample that wouldn't occur in reality? (EG, no one in his right, wrong nor insane mind should be rewriting getchar(), so it shouldn't come up.) With your own function names, do you feel the need to do this? Does that change if you're developing a library for others to use?
What it does
If you read Plauger's The Standard C Library (1992), you will see that the <stdio.h> header is allowed to provide getchar() and getc() as function-like macros (with special permission for getc() to evaluate its file pointer argument more than once!). However, even if it provides macros, the implementation is also obliged to provide actual functions that do the same job, primarily so that you can take a pointer to the getchar() or getc() function and pass it to other functions.
That is, by doing:
#include <stdio.h>
#undef getchar
extern int some_function(int (*)(void));
int core_function(void)
{
int c = some_function(getchar);
return(c);
}
As written, the core_function() is pretty meaningless, but it illustrates the point. You can do the same thing with the isxxxx() macros in <ctype.h> too, for example.
Normally, you don't want to do that - you don't normally want to remove the macro definition. But, when you need the real function, you can get hold of it. People who provide libraries can emulate the functionality of the standard C library to good effect.
Seldom needed
Also note that one of the reasons you seldom need to use the explicit #undef is because you can invoke the function instead of the macro by writing:
int c = (getchar)();
Because the token after getchar is not a (, it is not an invocation of the function-like macro, so it must be a reference to the function. Similarly, the first example above would compile and run correctly even without the #undef.
If you implement your own function with a macro override, you can use this to good effect, though it might be slightly confusing unless explained.
/* function.h */
…
extern int function(int c);
extern int other_function(int c, FILE *fp);
#define function(c) other_function(c, stdout)
…
/* function.c */
…
/* Provide function despite macro override */
int (function)(int c)
{
return function(c);
}
The function definition line doesn't invoke the macro because the token after function is not a (. The return line does invoke the macro.
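A hypothetical caller (assuming the function.h sketched above, with other_function() defined somewhere) showing the three ways the name can be used:
#include <stdio.h>
#include "function.h"
static int apply(int (*fn)(int), int c)
{
    return fn(c);
}
int demo(void)
{
    int a = function('x');        /* macro: expands to other_function('x', stdout) */
    int b = (function)('y');      /* parentheses defeat the macro: calls the real function */
    int c = apply(function, 'z'); /* no ( after the name, so this takes the function's address */
    return a + b + c;
}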
Macros are often used to generate bulk code. It's often a pretty localized usage, and it's safe to #undef any helper macros at the end of the particular header in order to avoid name clashes, so only the actual generated code gets imported elsewhere and the macros used to generate that code don't.
Edit: As an example, I've used this to generate structs for me. The following is an excerpt from an actual project:
#define MYLIB_MAKE_PC_PROVIDER(name) \
struct PcApi##name { \
many members …
};
MYLIB_MAKE_PC_PROVIDER(SA)
MYLIB_MAKE_PC_PROVIDER(SSA)
MYLIB_MAKE_PC_PROVIDER(AF)
#undef MYLIB_MAKE_PC_PROVIDER
Because preprocessor #defines are all in one global namespace, it's easy for namespace conflicts to result, especially when using third-party libraries. For example, if you wanted to create a function named OpenFile, it might not compile correctly, because the header file <windows.h> defines the token OpenFile to map to either OpenFileA or OpenFileW (depending on if UNICODE is defined or not). The correct solution is to #undef OpenFile before defining your function.
Although I think Jonathan Leffler gave you the right answer, here is a very rare case where I use an #undef. Normally a macro should be reusable inside many functions; that's why you define it at the top of a file or in a header file. But sometimes you have some repetitive code inside a function that can be shortened with a macro.
int foo(int x, int y)
{
#define OUT_OF_RANGE(v, vlower, vupper) \
if (v < vlower) {v = vlower; goto EXIT;} \
else if (v > vupper) {v = vupper; goto EXIT;}
/* do some calcs */
x += (x + y)/2;
OUT_OF_RANGE(x, 0, 100);
y += (x - y)/2;
OUT_OF_RANGE(y, -10, 50);
/* do some more calcs and range checks*/
...
EXIT:
/* undefine OUT_OF_RANGE, because we don't need it anymore */
#undef OUT_OF_RANGE
...
return x;
}
To show the reader that this macro is only useful inside of the function, it is undefined at the end. I don't want to encourage anyone to use such hackish macros. But if you have to, #undef them at the end.
I only use it when a macro in an #included file is interfering with one of my functions (e.g., it has the same name). Then I #undef the macro so I can use my own function.
Is this a common practice to defend against someone #define-ing a macro with the same name as your function? Or is this really more of a sample that wouldn't occur in reality? (EG, no one in his right, wrong nor insane mind should be rewriting getchar(), so it shouldn't come up.)
A little of both. Good code will not require use of #undef, but there's lots of bad code out there you have to work with. #undef can prove invaluable when somebody pulls a trick like #define bool int.
In addition to fixing problems with macros polluting the global namespace, another use of #undef is the situation where a macro might be required to have a different behavior in different places. This is not a really common scenario, but a couple that come to mind are:
the assert macro can have its definition changed in the middle of a compilation unit, for the case where you might want to perform debugging on some portion of your code but not others. In addition to assert itself needing to be #undef'ed to do this, the NDEBUG macro needs to be redefined to reconfigure the desired behavior of assert (see the sketch after the globals.h/globals.c example below).
I've seen a technique used to ensure that globals are defined exactly once by using a macro to declare the variables as extern, but the macro would be redefined to nothing for the single case where the header/declarations are used to define the variables.
Something like (I'm not saying this is necessarily a good technique, just one I've seen in the wild):
/* globals.h */
/* ------------------------------------------------------ */
#undef GLOBAL
#ifdef DEFINE_GLOBALS
#define GLOBAL
#else
#define GLOBAL extern
#endif
GLOBAL int g_x;
GLOBAL char* g_name;
/* ------------------------------------------------------ */
/* globals.c */
/* ------------------------------------------------------ */
#include "some_master_header_that_happens_to_include_globals.h"
/* define the globals here (and only here) using globals.h */
#define DEFINE_GLOBALS
#include "globals.h"
/* ------------------------------------------------------ */
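And a sketch of the assert reconfiguration from the first bullet: <assert.h> may be included any number of times, and each inclusion re-#undefs and redefines assert according to the current setting of NDEBUG.
#include <assert.h> /* NDEBUG not defined here: assert is active */
void checked_part(int x)
{
    assert(x >= 0); /* this check is compiled in */
}
#define NDEBUG
#include <assert.h> /* re-included: assert now expands to ((void)0) */
void unchecked_part(int x)
{
    assert(x >= 0); /* this check is compiled out */
    (void)x;
}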
If a macro can be def'ed, there must be a facility to undef.
A memory tracker I use defines its own new/delete macros to track file/line information. This macro breaks the SC++L (Standard C++ Library).
#pragma push_macro( "new" )
#undef new
#include <vector>
#pragma pop_macro( "new" )
Regarding your more specific question: namespaces are often emulated in C by prefixing library functions with an identifier.
Blindly undefining macros is going to add confusion, reduce maintainability, and may break things that rely on the original behavior. If you are forced to, at least use push_macro/pop_macro to preserve the original behavior everywhere else.