Unix 6th Edition source code. Very old.
conf.h
struct bdevsw {
int (*d_open)();
int (*d_close)();
int (*d_strategy)();
int *d_tab;
} bdevsw[];
conf.c
int (*bdevsw[])(){
&nulldev, &nulldev, &rkstrategy, &rktab,
&nodev, &nodev, &nodev, 0,
0
}
My question: why doesn't the initialization simply read,
bdevsw[] = {......}
All the information gathered for this answer is from my trusted copy of the Lions' Commentary book. It's a good resource for the very early UNIX code.
If you pump that monstrosity of a declaration into cdecl, you'll see that its purpose is to:
declare bdevsw as array of pointer to function returning int.
Hence it's not code at all(a), rather it's the array definition for the functions, one per device, pretty much the same as your suggestion would be.
The reason it's not in the header file is probably for the following reasons.
First, the conf.c file is actually auto-generated by mkconf, as that's the file that contains the device details (block and character devices, held in bdevsw and cdevsw) for a given UNIX system.
As an autogenerated file, it makes sense to break apart the declarations of the data types (which are consistent across different systems) and the definitions of the arrays (which do change per system). The comment at the top of this file states that it, and low.s, are the result of mkconf.
Second, there are quite a few C files that include conf.h. For example, bio.c (block I/O), sys3.c (filesystem calls), fio.c (file calls), and alloc.c (very early initialisation to read the root super block).
So, if the array were defined in the header file (presumably as static to prevent double definition), each source file would basically have its own copy, wasting precious space. By defining it in conf.c, there's one copy shared amongst everyone.
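The split described above can be sketched in modern C. This is only an illustration, not the V6 code: the parameter lists are assumed, the d_tab member is left out, the three device routines are hypothetical stand-ins for the kernel functions named in the question, and call_strategy is an invented helper.

```c
#include <stddef.h>

/* Shared type: this part would live in the header (conf.h in the source). */
struct bdevsw {
    int (*d_open)(int dev);
    int (*d_close)(int dev);
    int (*d_strategy)(int dev);
};

/* Hypothetical stand-ins for the kernel routines named in the question. */
static int nulldev(int dev)    { (void)dev; return 0; }   /* nothing to do */
static int nodev(int dev)      { (void)dev; return -1; }  /* no such device */
static int rkstrategy(int dev) { return dev + 100; }      /* placeholder work */

/* Per-system table: defined exactly once, as in the generated conf.c. */
struct bdevsw bdevsw[] = {
    { nulldev, nulldev, rkstrategy },  /* major 0: rk disk */
    { nodev,   nodev,   nodev      },  /* major 1: not configured */
};

/* Dispatch through the table by major device number (invented helper). */
int call_strategy(size_t major, int dev)
{
    if (major >= sizeof bdevsw / sizeof bdevsw[0])
        return -1;
    return bdevsw[major].d_strategy(dev);
}
```

Every file that includes the header sees only the struct type; the table itself exists in one object file.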
(a) Your comment that:
the second parenthesis in the initialization stmt makes it look like a function
is understandable and, in this case, the parentheses do denote a function type. But only insofar as they are part of declaring an array of pointers to functions; it is not an actual function definition.
In my C89 code, I have several units implementing a variety of abstract buffers which are to be treated by the user as if they were classes. That is, there is a public header defining the interfacing functions, and this is all the user ever sees. They are not intended to (need to) know what is going on behind the scenes.
However, at buffer creation, a raw byte-buffer is passed to the creation function, so the user must be able to know how much raw buffer space to allocate at compile time. This requires knowing how much space one item takes up in each abstract type. We are coding for a very limited embedded environment.
Currently, each buffer type has a private header in which a struct defines the format of the data. It is simple to add a macro for the size of the data element:
#define MY_ELEMENT_SIZE (sizeof(component_1_type) + sizeof(component_2_type))
However, component_x_type is intended to be hidden from the user, so this definition cannot go in the public header with the prototypes for the interfacing functions.
Our next idea was to have a const variable in the source:
const int MY_ELEMENT_SIZE = sizeof(component_1_type) + sizeof(component_2_type);
and an extern declaration in the public header:
extern const int MY_ELEMENT_SIZE;
But, because this is C89 and we have pedantry and MISRA and other requirements to fulfill, we cannot use variable-length arrays. In a "user" source file, to get a 50-element raw buffer, we write:
char rawBuffer[50 * MY_ELEMENT_SIZE] = {0u};
Using the extern const... method, this results in the compilation error:
error: variably modified ‘rawBuffer’ at file scope
This was not totally unexpected, but is disappointing in that sizeof(any_type) is genuinely constant and known at compile time.
Please advise me on how to expose the size of the data element in the public header without making the existence of component_x_type known to the user, in such a way that it can be used as an array length in C89.
Many, many thanks.
In my C89 code
It is 2020 now. Discuss with your manager or client the opportunity to use a less obsolete C standard. In practice, most hand-written C89 code can be reasonably ported to C11, and you could use, buy or develop code refactoring tools or services to help you with that (e.g. your own GCC plugin). Remind your manager or client that technical debt has a substantial cost (probably tens of thousands of US$ or €). Note that old C89 compilers in practice optimize much less than recent ones, and that most junior developers (your future colleagues) are not even familiar with C89 (so they would need some kind of training, which costs a lot).
How can I hide the contents of a user-exposed C preprocessor definition in non-user code?
As far as I know, you cannot (in theory). Check by reading the C11 standard n1570. Read also the documentation of GNU cpp then of GCC (or of your C compiler).
we have pedantry and MISRA and other requirements to fulfill
Be aware that these requirements have costs. Remind your client or manager of these costs.
(about hiding the content of a user-exposed C preprocessor #define)
However, in practice, C code (e.g. inside some internal header file #include-d in your translation unit) can be generated, and this is common practice (look into GNU bison or SWIG for well-known examples of C code generators, and also consider using GNU m4 or gpp or your own Guile or Python script, or your own C or C++ program emitting C code). You simply have to configure your build automation infrastructure (e.g. write your Makefile) for such a case.
If you have some script or utility generating things like #define MACRO_7oa7eIzzcxv03Tm (where MACRO_7oa7eIzzcxv03Tm is some pseudo-random or name mangled identifier) then the probability of an accidental collision with client code is quite small. A human programmer is very unlikely to think of such identifiers, and with enough care a C generating script usually won't emit identifiers colliding with that. See also this answer.
Perhaps your client or manager allows you to use (on your desktop) some generator of such "random-looking" identifiers. AFAIK, they are MISRA compatible (but my MISRA standard is at the office, and I am, as of May 2020, confined at home by Covid-19, near Paris, France).
we cannot use variable-length arrays.
You could (with approval from manager and client) consider using struct-s with flexible array members or else use arrays of dimension 0 or 1 as the last member of your struct-s. IIRC, that was common practice in SunOS3.2
Consider also using tools like Frama-C, Clang static analyzer, or -at end of 2020- my Bismon coupled with a recent GCC. Think of subcontracting the code review of your source code.
Additional to the other answers, this is a quite primitive proposal. But it is easy to understand.
Since presumably you will not publish your header files too often to your clients, and so will not change the sizes of the types, you can use a (manually or automatically) calculated definition:
#define OUR_LIB_TYPE_X_SIZE 23
In your private sources you can then check the correctness of this assumption for example by
typedef char assert_type_x_has_size[2 * (sizeof (TypeX) == OUR_LIB_TYPE_X_SIZE) - 1];
It will error on any decent compiler on unequal sizes, because the array's size will be -1 and illegal. On equal sizes, the array's size is 1 and legal.
Because you're just defining a type, no code or memory is allocated. You might need to mark this as "unused" for some compilers or code checkers.
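Put together, the public macro, the private check and the user's buffer could look like the sketch below. The layout of TypeX and the value 8 are assumptions chosen for illustration (two 4-byte members of unsigned char, which practical compilers do not pad); the three parts would normally sit in three separate files.

```c
/* ---- public header: the only thing the user sees ---- */
#define OUR_LIB_TYPE_X_SIZE 8   /* maintained by hand or by a build script */

/* ---- private source: the hidden layout (hypothetical) ---- */
typedef struct {
    unsigned char component_1[4];
    unsigned char component_2[4];
} TypeX;

/* Array size is 1 when the sizes match and -1 (a compile error) otherwise. */
typedef char assert_type_x_has_size[2 * (sizeof(TypeX) == OUR_LIB_TYPE_X_SIZE) - 1];

/* ---- user code: a 50-element raw buffer, legal at file scope in C89 ---- */
unsigned char rawBuffer[50 * OUR_LIB_TYPE_X_SIZE];
```

If someone later changes TypeX without updating the macro, the private source stops compiling, so the mismatch cannot reach the user silently.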
I've encountered this very problem too - unfortunately private encapsulation also makes the object size encapsulated. Sometimes it is sufficient to simply return the object size through a getter function, but not always.
I solved it exactly as KamilCuk showed in comments: give the caller a raw "magic number" through a #define in the .h file, then a static assert inside the .c implementation checking that the define is consistent with the object size.
If that's not elegant enough, then perhaps you could consider outsourcing the size allocation to a run-time API from the "class":
uint8_t* component1_get_raw_buffer (size_t n);
Where you return a pointer to a statically allocated buffer inside the encapsulated "class". The caller code would then have to be changed to:
uint8_t* raw_buffer;
raw_buffer = component1_get_raw_buffer(50);
This involves some internal trickery to keep track of how much memory has been allocated (and error handling - maybe return NULL on failure). You will need to reserve a fixed maximum size for the internal static buffer, to cover the worst-case scenario.
(Optionally: const qualify the returned pointer if the user isn't supposed to modify the data)
Advantages are: better OO design, no heap allocation, remain MISRA-C compliant. Disadvantages are function call overhead during initialization and the need to set aside "enough" memory in advance.
Also, this method isn't very safe in a multi-threading environment, but that's not usually an issue in embedded systems.
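A minimal sketch of such a getter follows, assuming a hidden element size of 8 bytes and a 1024-byte pool (both values are invented). It uses unsigned char rather than uint8_t so that it stays within C89, and it ignores alignment, which a real implementation would have to consider.

```c
#include <stddef.h>

#define ELEMENT_SIZE 8u     /* hidden per-element size (assumed value) */
#define POOL_BYTES   1024u  /* worst-case total reserved in advance (assumed) */

static unsigned char pool[POOL_BYTES];
static size_t pool_used = 0;

/* Hand out room for n elements from the static pool, or NULL if it is exhausted. */
unsigned char *component1_get_raw_buffer(size_t n)
{
    unsigned char *p;

    if (n > (POOL_BYTES - pool_used) / ELEMENT_SIZE)
        return NULL;            /* not enough space left */
    p = &pool[pool_used];
    pool_used += n * ELEMENT_SIZE;
    return p;
}
```

Memory handed out this way is never returned to the pool, which is usually acceptable when all buffers are created once at start-up.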
I'm a beginner to C, but I've had a bit of experience with some other programing languages like Ruby and Python. I would very much like to create some of my own functions in C that I could use in any of my programs that just make life easier, however I'm a little bit confused about how to do this.
From what I understand the first part of this process is to create a header file that contains all of your prototypes, and I understand that, however from what I understand it is frowned upon to include anything other than declarations in your header files, so would you also need to create a .c file that contained the actual code and then #include that in all your programs along with the header file? But if so, why would you need a header file in the first place, since defining a function also declares it?
Finally, what should you put in the main() function of your header file? Do you just leave it blank, or do you not include it?
Thanks!
The declaration of a function lets the compiler know that at link time such a function will be available. The definition of the function provides that implementation, and additionally it also serves as the declaration. There is no harm in having multiple declarations, but only one implementation can be provided. Also, at least one declaration (or the only implementation) must come before any use of the function - this alone makes forward declarations necessary in cases where two functions call one another (both cannot be before the other).
So, if you have the implementation:
int foo(int a, int b) {
return a * b;
}
The corresponding declaration is simply:
int foo(int a, int b);
(The argument names do not matter in the declaration, i.e., they can be omitted or different than in the implementation. In fact you could declare only int foo(); and it would work for the above function, but this is mainly a legacy thing and not recommended. Note that to declare a function that takes no arguments, put void in the argument list, e.g., int bar(void);)
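The mutual-call case mentioned earlier is the one where a forward declaration cannot be avoided. A small sketch, with is_even and is_odd as made-up example functions:

```c
/* Forward declaration: is_even() calls is_odd() before is_odd() is defined. */
int is_odd(unsigned int n);

int is_even(unsigned int n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned int n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```

Whichever of the two definitions comes first, the other one must already have been declared.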
There are a number of reasons why you would want to have separate headers with only the declaration:
The implementation may be in a separate file, which allows for organisation of code into manageable pieces, and may be compiled by itself and need not be recompiled unless that file has changed - in large projects where the total compilation time can be an hour it would be absurd to re-compile everything for a small change.
The implementation source may not be available, e.g., in case of a closed-source proprietary library.
The implementation may be in a different language with a compatible calling convention.
For practical details on how to write code in multiple files and how to use libraries, please consult a book or tutorial on C programming. As for main, you need not declare it in a header unless you are specifically calling main from another function - the convention of C programs is to call main as int main(int, char**) at start of the execution.
When compiling, each .c-file (or .cpp-file) is first compiled into its own object file.
If one binary file is using functions from another,
it just knows "there is something outside named xyz" at that time.
Then the linker will put them together in one file and rewrite the parts of each file
which are using functions of other files,
so that they actually know where to find the used functions.
What will happen if you put code in a .h file:
At compilation time, each included h-file in a c-file will be integrated in the c-file.
If you have code for xyz in a h-file and you're including it in more than one c-file,
each of these compiled c-files will have its own xyz. Then, the linker will be confused...
So, function code has to be in its own c-file.
Why use a h-file at all?
Because, if you call xyz in your code, how should the compiler know
if this is a function of another c-file (and which parameters...)
or an error because xyz does not exist?
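As a sketch, the usual arrangement is shown below in one listing, with comments marking where each file would begin. The file names and use_xyz are invented for illustration; in a real project the two .c files would each contain the #include line that is commented out here.

```c
/* ---- xyz.h: declarations only, safe to include from many .c files ---- */
#ifndef XYZ_H
#define XYZ_H
int xyz(int a, int b);   /* tells callers the name, parameters, return type */
#endif

/* ---- xyz.c: the one and only definition ---- */
/* #include "xyz.h" */
int xyz(int a, int b)
{
    return a + b;
}

/* ---- user.c: sees only the declaration from xyz.h ---- */
/* #include "xyz.h" */
int use_xyz(void)
{
    return xyz(2, 3);
}
```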
Header files in C are for when you need the same declarations in multiple source files. So if you are just repeating the same code within one file then yes, it would be easier to just use a function. Also, for header files, yes, you would need a .c file that contains the actual definitions.
I am studying on "reading code" by reading pieces of NetBSD source code.
(for whoever is interested, it's < Code Reading: The Open Source Perspective > I'm reading)
And I found this function:
/* convert IP address to a string, but not into a single buffer
*/
char *
naddr_ntoa(naddr a)
{
#define NUM_BUFS 4
static int bufno;
static struct {
char str[16]; /* xxx.xxx.xxx.xxx\0 */
} bufs[NUM_BUFS];
char *s;
struct in_addr addr;
addr.s_addr = a;
strlcpy(bufs[bufno].str, inet_ntoa(addr), sizeof(bufs[bufno].str));
s = bufs[bufno].str;
bufno = (bufno+1) % NUM_BUFS;
return s;
#undef NUM_BUFS
}
It introduces 4 different temporary buffers to wrap inet_ntoa function since inet_ntoa is not re-entrant.
But seems to me this naddr_ntoa function is also not re-entrant:
the static bufno variable can be manipulated by other so the temporary buffers do not seem work as expected here.
So is it a potential bug?
Yes, this is a potential bug. If you want a similar function that is most likely reentrant you could use e.g. inet_ntop (which incidentally handles IPv6 as well).
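A minimal sketch of the inet_ntop route on a POSIX system follows. The wrapper name format_ipv4 is invented; the point is that the caller supplies the buffer, so no static storage is shared between calls.

```c
#include <sys/socket.h>  /* AF_INET, socklen_t */
#include <netinet/in.h>  /* struct in_addr, INET_ADDRSTRLEN */
#include <arpa/inet.h>   /* inet_ntop, htonl */
#include <stdint.h>
#include <stddef.h>

/* Format an IPv4 address (network byte order) into a caller-supplied buffer.
 * Returns buf on success or NULL on failure. */
const char *format_ipv4(uint32_t addr_net, char *buf, size_t len)
{
    struct in_addr addr;

    addr.s_addr = addr_net;
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)len);
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any IPv4 address in dotted form.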
That code comes from src/sbin/routed/trace.c and it is not a general library routine, but just a custom hack used only in the routed program. The addrname() function in the same file makes use of the same trick, for the same reason. It's not even NetBSD code per se, but rather it comes from SGI originally, and is maintained by Vernon Schryver (see The Routed Page).
It's just a quick hack to allow use of multiple calls within the same expression, such as where the results are being used in one printf() call: E.g.:
printf("addr1->%s, addr2->%s, addr3->%s, addr4->%s\n",
naddr_ntoa(addr1), naddr_ntoa(addr2), naddr_ntoa(addr3), naddr_ntoa(addr4));
There are several examples of similar uses in the routed source files (if.c, input.c, rdisc.c).
There is no bug in this code. The routed program is not multi-threaded. Reentrancy is not being addressed at all in this hack. This trick has been done by design for a very specific purpose that has nothing to do with reentrancy. The Code Reading author(s) is wrong to associate this trick with reentrancy.
It's simply a way to hide the saving of multiple results in an array of static variables instead of having to individually copy those results from one static variable into separate storage in the calling function when multiple results are required for a single expression.
Remember that static variables have all the properties of global variables except for the limited scope of their identifier. It is of course true that unprotected use of global (or static) variables inside a function makes that function non-reentrant, but that's not the only problem global variables cause. Use of a fully-reentrant function would not be appropriate in routed because it would actually make the code more complex than necessary, whereas this hack keeps the calling code clean and simple. It would though have been better for the hack to be properly documented such that future maintainers would more easily spot when NUM_BUFS has to be adjusted.
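The same rotating-buffer trick can be sketched portably, without the BSD-specific strlcpy and inet_ntoa. The function int_to_str below is an invented example, not part of routed; it needs C99 for snprintf.

```c
#include <stdio.h>

#define NUM_BUFS 4

/* Format an int into one of NUM_BUFS static buffers, used in rotation, so up
 * to NUM_BUFS results can be alive in one expression. Not reentrant. */
const char *int_to_str(int value)
{
    static char bufs[NUM_BUFS][16];
    static int bufno;
    char *s = bufs[bufno];

    snprintf(s, sizeof bufs[0], "%d", value);
    bufno = (bufno + 1) % NUM_BUFS;
    return s;
}
```

Up to four results can appear in one printf() call; a fifth call reuses the first buffer and overwrites the first result, which is exactly the limit a maintainer has to keep in mind when adjusting NUM_BUFS.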
I'm following a guide to learn curses, and all of the C code within prototypes functions before main(), then defines them afterward. In my C++ learnings, I had heard about function prototyping but never done it, and as far as I know it doesn't make too much of a difference on how the code is compiled. Is it a programmer's personal choice more than anything else? If so, why was it included in C at all?
Function prototyping originally wasn't included in C. When you called a function, the compiler just took your word for it that it would exist and took the type of arguments you provided. If you got the argument order, number, or type wrong, too bad – your code would fail, possibly in mysterious ways, at runtime.
Later versions of C added function prototyping in order to address these problems. Your arguments are implicitly converted to the declared types under some circumstances or flagged as incompatible with the prototype, and the compiler could flag as an error the wrong order and number of types. This had the side effect of enabling varargs functions and the special argument handling they require.
Note that, in C (and unlike in C++), a function declared foo_t func() is not the same as a function declared as foo_t func(void). The latter is prototyped to have no arguments. The former declares a function without a prototype.
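The implicit conversion that a prototype enables can be seen in a short sketch; half and half_of_three are invented names.

```c
/* Prototype: the compiler knows half() takes a double. */
double half(double x);

double half_of_three(void)
{
    /* The int argument 3 is converted to 3.0 because a prototype is in scope. */
    return half(3);
}

double half(double x)
{
    return x / 2.0;
}
```

Without the prototype, a pre-standard compiler would have passed 3 as an int and the function would have read it as if it were a double.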
In C, prototyping is needed so that the compiler knows that you have a function called x() before it has reached the definition; that way y() can call x() even though x() is defined later. The short answer is that C compiles top down, so a function needs to be declared before it is used.
void x(void);
void y(void);
int main(void){
    return 0;
}
void y(void){
    x();
}
void x(void){
    /* ... more code ... */
    /* maybe even y(); */
}
I was under the impression that it was so customers could have access to the .h file for libraries and see what functions were available to them, without having to see the implementation (which would be in another file).
Useful to see what the function returns/what parameters.
Function prototyping is a remnant from the olden days of compiler writing. It used to be considered horribly inefficient for a compiler to have to make multiple passes over a source file to compile it.
In C, in certain contexts, referring to a function in one manner is syntactically equivalent to referring to a variable: consider taking a pointer to a function versus taking a pointer to a variable. In the compiler's intermediate representation, the two are semantically distinct, but syntactically, whether an identifier is a variable, a function name, or an invalid identifier cannot be determined from the context.
Since it's not determinable from the context, without function prototypes, the compiler would need to make an extra pass over each one of your source files each time one of them compiles. This would add an extra O(n) factor for any compilation (that is, if compilation were O(m), it would now be O(m*n)), where n is the number of files in your project. In large projects, where compilation is already on the order of hours, having a two-pass compiler is highly undesirable.
Forward declaring all your functions would allow the compiler to build a table of functions as it scanned the file, and be able to determine when it encountered an identifier whether it referred to a function or a variable.
As a result of this, C (and by extension, C++) compilers can be extremely efficient in compilation.
It allows you to have a situation in which say you can have an iterator class defined in a separate .h file which includes the parent container class. Since you've included the parent header in the iterator, you can't have a method like say "getIterator()" because the return type would have to be the iterator class and therefore it would require that you include the iterator header inside the parent header creating a cyclic loop of inclusions (one includes the other which includes itself which includes the other again, etc.).
If you put the iterator class prototype inside the parent container, you can have such a method without including the iterator header. It only works because you're simply saying that such an object exists and will be defined.
There are ways of getting around it like having a precompiled header, but in my opinion it's less elegant and comes with a slew of disadvantages. Of course this is C++, not C. However, in practice you might have a situation in which you'd like to arrange code in this fashion, classes aside.
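In C the same arrangement works with a forward-declared struct. The sketch below uses invented names (container, iterator and their functions) and keeps everything in one listing, although the two struct definitions would normally sit in separate headers.

```c
#include <stddef.h>

struct iterator;                       /* forward declaration: incomplete type */

struct container {
    int items[3];
    size_t count;
};

/* Only a pointer to the incomplete type is needed here. */
void container_begin(const struct container *c, struct iterator *it);

struct iterator {                      /* full definition can come later */
    const struct container *owner;
    size_t pos;
};

void container_begin(const struct container *c, struct iterator *it)
{
    it->owner = c;
    it->pos = 0;
}

/* Returns 1 and stores the next item, or 0 when the end is reached. */
int iterator_next(struct iterator *it, int *out)
{
    if (it->pos >= it->owner->count)
        return 0;
    *out = it->owner->items[it->pos++];
    return 1;
}
```

The container's header only has to say that struct iterator exists, so neither header needs to include the other.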
I have a C project where all code is organized in *.c/*.h file pairs, and I need to define a constant value in one file which will, however, also be used in other files. How should I declare and define this value?
Should it be as static const ... in the *.h file? As extern const ... in the *.h file and defined in the *.c file? In what way does it matter if the value is not a primitive datatype (int, double, etc), but a char * or a struct? (Though in my case it is a double.)
Defining stuff inside *.h files doesn't seem like a good idea generally; one should declare things in the *.h file, but define them in the *.c file. However, the extern const ... approach seems inefficient, as the compiler wouldn't be able to inline the value, it instead having to be accessed via its address all the time.
I guess the essence of this question is: Should one define static const ... values in *.h files in C, in order to use them in more than one place?
The rule I follow is to only declare things in H files and define them in C files. You can declare and define in a single C file, assuming it will only be used in that file.
By declaration, I mean notify the compiler of its existence but don't allocate space for it. This includes #define, typedef, extern int x, and so on.
Definitions assign values to declarations and allocate space for them, such as int x and const int x. This includes function definitions; including these in header files frequently leads to wasted code space.
I've seen too many junior programmers get confused when they put const int x = 7; in a header file and then wonder why they get a link error for x being defined more than once. I think at a bare minimum, you would need static const int x so as to avoid this problem.
I wouldn't be too worried about the speed of the code. The main issue with computers (in terms of speed and cost) long ago shifted from execution speed to ease of development.
If you need constants (real, compile time constants) you can do that three ways, putting them into header files (there is nothing bad with that):
/* 1. enumeration */
enum {
FOO_SIZE = 1234,
BAR_SIZE = 5678
};
/* 2. macros */
#define FOO_SIZE 1234
#define BAR_SIZE 5678
/* 3. static const objects */
static const int FOO_SIZE = 1234;
static const int BAR_SIZE = 5678;
In C++, I tend to use the enum way, since it can be scoped into a namespace. For C, I use the macro. This basically comes down to a matter of taste though. If you need floating point constants, you can't use the enumeration anymore. In C++ I use the last way, the static const double, in that case (note in C++ static would be redundant then; they would become static automatically since they are const). In C, I would keep using the macros.
It's a myth that using the third method will slow down your program in any way. I just prefer the enumeration since the values you get are rvalues - you can't get their address, which I regard as an added safety. In addition, there is much less boilerplate code written. The eye is concentrated on the constants.
Do you really have a need to worry about the advantage of inline? Unless you're writing embedded code, stick to readability. If it's really a magic number of something, I'd use a define; I think const is better for things like const version strings and modifying function call arguments. That said, the define in .c, declare in .h rule is definitely a fairly universally accepted convention, and I wouldn't break it just because you might save a memory lookup.
As a general rule, you do not define things as static in a header. If you do define static variables in a header, each file that uses the header gets its own private copy of whatever is declared static, which is the antithesis of the DRY principle: don't repeat yourself.
So, you should use an alternative. For integer types, using enum (defined in a header) is very powerful; it works well with debuggers too (though the better debuggers may be able to help with #define macro values too). For non-integer types, an extern declaration (optionally qualified with const) in the header and a single definition in one C file is usually the best way to go.
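A sketch of that combination is shown below, with comments marking the file boundaries. The header name, SCALE_FACTOR, foo_table and scaled_size are invented for illustration.

```c
/* ---- constants.h (hypothetical) ---- */
enum { FOO_SIZE = 1234 };          /* integer constant: usable as an array length */
extern const double SCALE_FACTOR;  /* non-integer constant: declared here */

/* ---- constants.c: the single definition ---- */
const double SCALE_FACTOR = 2.5;

/* ---- any user file ---- */
int foo_table[FOO_SIZE];           /* legal at file scope, unlike a const int length */

double scaled_size(void)
{
    return FOO_SIZE * SCALE_FACTOR;
}
```

The enum constant is a true integer constant expression, while SCALE_FACTOR exists once in the program and is read through its external name.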
I'd like to see more context for your question. The type of the value is critical, but you've left it out. The meaning of the const keyword in C is quite subtle; for example
const char *p;
does not mean that pointer p is a constant; you can write p all you like. What you cannot write is the memory that p points to, and this stays true even as p's value changes. This is about the only case I really understand; in general, the meaning of the subtle placement of const eludes me. But this special case is extremely useful for function parameters because it extracts a promise from the function that the memory the argument points to will not be mutated.
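The function-parameter case can be sketched as follows; count_char is an invented example.

```c
#include <stddef.h>

/* The function promises not to modify the characters s points to. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    while (*s != '\0') {
        if (*s == c)
            n++;
        s++;            /* fine: the pointer itself is not const */
        /* *s = 'x';       would not compile: the pointed-to chars are const */
    }
    return n;
}
```

To make the pointer itself constant the qualifier goes after the asterisk, as in char *const p, and both can be combined as const char *const p.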
There is one other special case everyone should know: integers. Almost always, constant, named integers should be defined in a .h file as enumeration literals. enum types not only allow you to group related constants together in a natural way, but also allow the names of those constants to be seen in the debugger, which is a huge advantage.
I've written tens of thousands of lines of C; probably hundreds if I try to track it down. (wc ~/src/c/*.c says 85 thousand, but some of that is generated, and of course there's a lot of C code lurking elsewhere). Aside from the two cases above, I've never found much use for const. I would be pleased to learn a new, useful example.
I can give you an indirect answer. In C++ (as opposed to C) const implies static. That is to say, in C++ static const is the same thing as const. So that tells you how the C++ standards body feels about the issue, i.e. all consts should be static.
for autoconf environment:
You can always define constants in the configure file as well. AC_DEFINE(), I guess, is the macro to define a value across the entire build.
To answer the essence of your question:
You generally do NOT want to define a static variable in a header file.
This would cause you to have duplicated variables in each translation unit (C file) that includes the header.
Variables in a header should really be declared extern since that is the implied visibility.
See this question for a good explanation.
Actually, the situation might not be so dire, as the compiler would probably convert a const type to a literal value. But you might not want to rely on that behavior, especially if optimizations are turned off.
In C++, you should always use
const int SOME_CONST = 17;
for constants and never
#define SOME_CONST 17
Defines will almost always come back and bite you later. Consts are in the language, and are strongly typed, so you won't get weird errors because of some hidden interaction. I would put the const in the appropriate header file. As long as it's #pragma once (or #ifndef x / #define x / #endif), you won't ever get any compile errors.
In vanilla C, you might have compatibility problems where you must use #defines.