Forward declare entities in C standard library? - c

Is it legal to forward declare structs and functions provided by the C standard library?
My background is C++ in which the answer is no. The primary reason for this is that a struct or class mandated by the C++ standard library can be a template behind the scenes and may have "secret" template parameters and so cannot be properly declared with a naive non-template declaration. Even if a user does figure out exactly how to forward declare a particular entity in a particular version of a particular implementation, the implementation is not obliged to not break that declaration in future versions.
I don't have a copy of any C standard at hand but obviously there are no templates in C.
So is it legal to forward declare entities in the C standard library?
Another reason that entities in the C++ standard library may not be forward declared is that headers provided by the implementation need not follow the normal rules. For example, in a recent question I asked if a C++ header provided by the implementation need be an actual file and the answer was no. I don't know if any of that applies to C.
The C standard library is used by both C and C++ but for this question I'm only asking about C.

Forward declarations of structs are always permissible in C. However, not very many types can be used this way. For example, you can't use a forward declaration for FILE simply because the tag name of the struct is not specified (and theoretically, it may not be a struct at all).
Section 7.1.4 paragraph 2 of n1570 gives you permission to do the same with functions:
Provided that a library function can be declared without reference to any type defined in a
header, it is also permissible to declare the function and use it without including its
associated header.
This used to be rather common. I think the reasoning here is that hard drives are slow, and fewer #include means faster compile times. But this isn't the 1980s any more, and we all have fast CPUs and fast hard drives, so a few #include aren't even noticed.
void *malloc(size_t);
void abort(void);
/* my code here */

yes you can this is perfectly valid.
this can be done with the standard library too.
double atof(const char *);
int main() {
double t = atof("13.37");
return 0;
}
#include <stdio.h>
Similiar things can be done with structs, variables etc.
I would recommend you read the wiki page which features some c examples:
http://en.wikipedia.org/wiki/Forward_declaration
this is specified in the c standard, Section 7.1.4 paragraph 2 of n1570
Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header.

Related

Providing helper functions when rolling out own structures

if I am developing a C shared library and I have my own structs. To make common operations on these struct instances easier for library consumers, can I provide function pointers to such functions inside the struct itself? Is it a good practice? Would there be issues with respect to multithreading where a utility function is called in parallel with different arguments and so on?
I know it goes a lot closer to C++ classes but I wish to stick to C and learn how it would be done in a procedural language as opposed to OOP.
To give an example
typedef struct tag tag;
typedef struct my_custom_struct my_custom_struct;
struct tag
{
// ...
};
struct my_custom_struct
{
tag *tags;
my_custom_struct* (*add_tag)(my_custom_struct* str, tag *tag);
};
my_custom_struct* add_tag(my_custom_struct* str, tag *tag)
{
// ...
}
where add_tag is a helper that manages to add the tag to tag list inside *str.
I saw this pattern in libjson-c like here- http://json-c.github.io/json-c/json-c-0.13.1/doc/html/structarray__list.html. There is a function pointer given inside array_list to help free it.
To make common operations on these struct instances easier for library
consumers, can I provide function pointers to such functions inside
the struct itself?
It is possible to endow your structures with members that are function pointers, pointing to function types whose parameters include pointers to your structure type, and that are intended to be used more or less like C++ instance methods, more or less as presented in the question.
Is it a good practice?
TL;DR: no.
The first problem you will run into is getting those pointer members initialized appropriately. Name correspondence notwithstanding, the function pointers in instances of your structure will not automatically be initialized to point to a particular function. Unless you make the structure type opaque, users can (and undoubtedly sometimes will) declare instances without calling whatever constructor-analog function you provide for the purpose, and then chaos will ensue.
If you do make the structure opaque (which after all isn't a bad idea), then you'll need non-member functions anyway, because your users won't be able to access the function pointers directly. Perhaps something like this:
struct my_custom_struct *my_add_tag(struct my_custom_struct *str, tag *tag) {
return str->add_tag(str, tag);
}
But if you're going to provide for that, then what's the point of the extra level of indirection? (Answer: the only good reason for that would be that in different instances, the function pointer can point to different functions.)
And similar applies if you don't make the structure opaque. Then you might suppose that users would (more) directly call
str->add_tag(str, tag);
but what exactly makes that a convenience with respect to simply
add_tag(str, tag);
?
So overall, no, I would not consider this approach a good practice in general. There are limited circumstances where it may make sense to do something along these lines, but not as a general library convention.
Would there be issues with
respect to multithreading where a utility function is called in
parallel with different arguments and so on?
Not more so than with functions designated any other way, except if the function pointers themselves are being modified.
I know it goes a lot closer to C++ classes but I wish to stick to C
and learn how it would be done in a procedural language as opposed to
OOP.
If you want to learn C idioms and conventions then by all means do so. What you are describing is not one. C code and libraries can absolutely be designed with use of OO principles such as encapsulation, and to some extent even polymorphism, but it is not conventionally achieved via the mechanism you describe. This answer touches on some of the approaches that are used for the purpose.
Is it a good practice?
TLDR; no.
Background:
I've been programming almost exclusively in embedded C on STM32 microcontrollers for the last year and a half (as opposed to using C++ or "C+", as I'll describe below). It's been very insightful for me to have to learn C at the architectural level, like I have. I've studied C architecture pretty hard to get to where I can say I "know C". It turns out, as we all know, C and C++ are NOT the same language. At the syntax level, C is almost exactly a subset of C++ (with some key differences where C supports stuff C++ does not), hence why people (myself included before this) frequently think/thought they are pretty much the same language, but at the architectural level they are VASTLY DIFFERENT ANIMALS.
Aside:
Note that my favorite approach to embedded is to use what some colloquially know as "C+". It is basically using a C++ compiler to write C-style embedded code. You basically just write C how you'd expect to write C, except you use C++ classes to vastly simplify the (otherwise pure C) architecture. In other words, "C+" is a pseudonym used to describe using a C++ compiler to write C-like code that uses classes instead of "object-based C" architecture (which is described below). You may also use some advanced C++ concepts on occasion, like operator overloading or templates, but avoid the STL for the most part to not accidentally use dynamic allocation (behind-the-scenes and automatically, like C++ vectors do, for example) after initialization, since dynamic memory allocation/deallocation in normal run-time can quickly use up scarce RAM resources and make otherwise-deterministic code non-deterministic. So-called "C+" may also include using a mix of C (compiled with the C compiler) and C++ (compiled with the C++ compiler), linked together as required (don't forget your extern "C" usage in C header files included in your C++ code, as required).
The core Arduino source code (again, the core, not necessarily their example "sketches" or example code for beginners) does this really well, and can be used as a model of good "C+" design. <== before you attack me on this, go study the Arduino source code for dozen of hours like I have [again, NOT the example "sketches", but their actual source code, linked-to below], and drop your "arduino is for beginners" pride right now.
The AVR core (mix of C and "C+"-style C++) is here: https://github.com/arduino/ArduinoCore-avr/tree/master/cores/arduino
Some of the core libraries ("C+"-style C++) are here: https://github.com/arduino/ArduinoCore-avr/tree/master/libraries
[aside over]
Architectural C notes:
So, regarding C architecture (ie: actual C, NOT "C+"/C-style C++):
C is not an OO language, as you know, but it can be written in an "object-based" style. Notice I say "object-based", NOT "object oriented", as that's how I've heard other pedantic C programmers refer to it. I can say I write object-based C architecture, and it's actually quite interesting.
To make object-based C architecture, here's a few things to remember:
Namespaces can be done in C simply by prepending your namespace name and an underscore in front of something. That's all a namespace really is after-all. Ex: mylibraryname_foo(), mylibraryname_bar(), etc. Apply this to enums, for example, since C doesn't have "enum classes" like C++. Apply it to all C class "methods" too since C doesn't have classes. Apply to all global variables or defines as well that pertain to a particular library.
When making C "classes", you have 2 major architectural options, both of which are very valid and widely used:
Use public structs (possibly hidden in headers named "myheader_private.h" to give them a pseudo-sense of privacy)
Use opaque structs (frequently called "opaque pointers" since they are pointers to opaque structs)
When making C "classes", you have the option of wrapping up pointers to functions inside of your structs above to give it a more "C++" type feel. This is somewhat common, but in my opinion a horrible idea which makes the code nearly impossible to follow and very difficult to read, understand, and maintain.
1st option, public structs:
Make a header file with a struct definition which contains all your "class data". I recommend you do NOT include pointers to functions (will discuss later). This essentially gives you the equivalent of a "C++ class where all members are public." The downside is you don't get data hiding. The upside is you can use static memory allocation of all of your C "class objects" since your user code which includes these library headers knows the full specification and size of the struct.
2nd option: opaque structs:
In your library header file, make a forward declaration to a struct:
/// Opaque pointer (handle) to C-style "object" of "class" type mylibrarymodule:
typedef struct mylibrarymodule_s *mylibrarymodule_h;
In your library .c source file, provide the full definition of the struct mylibrarymodule_s. Since users of this library include only the header file, they do NOT get to see the full implementation or size of this opaque struct. That is what "opaque" means: "hidden". It is obfuscated, or hidden away. This essentially gives you the equivalent of a "C++ class where all members are private." The upside is you get true data hiding. The downside is you can NOT use static memory allocation for any of your C "class objects" in your user code using this library, since any user code including this library doesn't even know how big the struct is, so it cannot be statically allocated. Instead, the library must do dynamic memory allocation at program initialization, one time, which is safe even for embedded deterministic real-time safety-critical systems since you are not allocating or freeing memory during normal program execution.
For a detailed and full example of Option 2 (don't be confused: I call it "Option 1.5" in my answer linked-to here) see my other answer on opaque structs/pointers here: Opaque C structs: how should they be declared?.
Personally, I think the Option 1, with static memory allocation and "all public members", may be my preferred approach, but I am most familiar with the opaque struct Option 2 approach, since that's what the C code base I work in the most uses.
Bullet 3 above: including pointers to functions in your structs.
This can be done, and some do it, but I really hate it. Don't do it. It just makes your code so stinking hard to follow. In Eclipse, for instance, which has an excellent indexer, I can Ctrl + click on anything and it will jump to its definition. What if I want to see the implementation of a function I'm calling on a C "object"? I Ctrl + click it and it jumps to the declaration of the pointer to the function. But where's the function??? I don't know! It might take me 10 minutes of grepping and using find or search tools, digging all around the code base, to find the stinking function definition. Once I find it, I forget where I was, and I have to repeat it all over again for every single function, every single time I edit a library module using this approach. It's just bad. The opaque pointer approach above works fantastic instead, and the public pointer approach would be easy too.
Now, to directly answer your questions:
To make common operations on these struct instances easier for library consumers, can I provide function pointers to such functions inside the struct itself?
Yes you can, but it only makes calling something easier. Don't do it. Finding the function to look at its implementation becomes really hard.
Is it a good practice?
No, use Option 1 or Option 2 above instead, where you now just have to call C "namespaced" "methods" on every C "object". You must simply pass the "members of the C class" into the function as the first argument for every call instead. This means instead of in C++ where you can do:
myclass.dosomething(int a, int b);
You'll just have to do in object-based C:
// Notice that you must pass the "guts", or member data
// (`mylibrarymodule` here), of each C "class" into the namespaced
// "methods" to operate on said C "class object"!
// - Essentially you're passing around the guts (member variables)
// of the C "class" (which guts are frequently referred to as
// "private data", or just `priv` in C lingo) to each function that
// needs to operate on a C object
mylibrarymodule_dosomething(mylibrarymodule_h mylibrarymodule, int a, int b);
Would there be issues with respect to multithreading where a utility function is called in parallel with different arguments and so on?
Yes, same as in any multithreaded situation where multiple threads are trying to access the same data. Just add a mutex to each C struct-based "object", and be sure each "method" acting on your C "objects" properly locks (takes) and unlocks (gives) the mutex as required before operating on any shared volatile members of the C "object".
Related:
Opaque C structs: how should they be declared? [use "Object-based" C architecture]
I would like to suggest you reading com specification, you will gain a lot. all these com, ole and dcom technology is based on a simple struct that incorporates its own data and methods.
https://www.scribd.com/document/45643943/Com-Spec
simplied more here
http://www.voidcn.com/article/p-fixbymia-beu.html

Why nested functions in C are against C standards

Nested functions(function declarations in block scope) are not allowed in C standards(ANSI[C89], C99, C11).
But I couldn't find stating it in C standards.
Edit :
Why a function definition cannot be in a function definition(compound statement)?
There is a difference between a function declaration and a function definition. A declaration merely declares the existence of a function and a definition defines the function.
int f(void) { /* ... */ } // function definition
int f(void); // function declaration
In 6.9.1 the syntax for a function is defined as
function-definition:
declaration-specifiers declarator declaration-listopt compound-statment
In 6.8.2, the things you can put in a compound statement are defined as a declaration or a statement. A function definition isn't considered to be either of these syntactically.
So yes, a function declaration is legal in a function but a function definition is not e.g.
int main(int argc, char*argv[])
{
int f(void); // legal
int g(void) { return 1; } ; // ILLEGAL
// blah blah
}
It may not be stated directly but if you work through the grammar for function-definition you'll find they're not accepted within the grammar.
Why? According to Dennis Richie (who is a bit of an authority on the matter) it appears they were excluded from the start:
"Procedures can be nested in BCPL, but may not refer to non-static objects defined in containing procedures. B and C avoid this restriction by imposing a more severe one: no nested procedures at all."
https://www.bell-labs.com/usr/dmr/www/chist.html
It's humor to avoid a restriction by imposing a more severe one. I read this as it being a simplifying maneuver. Nested procedures add complexity to the compiler (which Ritchie was very keen to limit on machines of the time) and add little value.
The standardization process was (wisely) never seen as an opportunity to extend C willy-nilly and (from the same document):
"From the beginning, the X3J11 committee took a cautious, conservative view of language extensions."
It's difficult to put a case that nested functions offer significant benefits so it's not surprising that even if some implementations were supporting them they weren't adopted as standard.
In general the standards efforts ever since have been at least equally conservative and again it's difficult to see a lot of support amongst implementers to add such a feature.
At the end of the day if you're worried that some function be used outside its intended purpose and is (logically) a sub-function of exactly one given function then give it static linkage and introduce another source file or even whole translation unit.
Nested or private functions are something that many C compilers used to allow, but are not part of the C standard, and now it's quite rare to find compilers that support them, certainly by default.
The standard is determined by a committee, and nested functions will be something that they have discussed, and there will be a rationale, but I don't know what it is offhand, nor do most C programmers. Nested functions aren't inherently a bad idea, but you can achieve virtually all of the benefits by writing a static file scope function, which is the method of creating private functions which was standardised.

C compiler - list __builtin_ types

here is a small (and working) C code:
typedef __builtin_va_list __va_list;
int main() {
return 0;
}
I found an answer, how gcc found the base type:
Pycparser not working on preprocessed code
But how can I list all of the __builtin_ "base" types, which not defined explicitly?
Thanks,
a.
how can I list all of the __builtin_ "base" types, which not defined explicitly?
TL;DR: there is no general-purpose way to do it.
The standard does not define any such types, so they can only be implementation-specific. In particular, C does not attribute any significance to the __builtin_ name prefix (though such identifiers are reserved), nor does it acknowledge that any types exist that are not derived from those it does define. Thus, for most purposes, the types you are asking about should be considered an implementation detail.
If there were a way to list implementation-specific built-in types, it would necessarily be implementation-specific itself. For example, you might be able to find a list such as you are after in the compiler's documentation. You could surely derive one from the compiler's own source code, if that's available to you. You could maybe extract strings from the compiler binary, and filter for a characteristic name pattern, such as strings starting with "__builtin_".
You could also consider parsing all the standard library headers (with the assumption that they are correct) to find undeclared types, though that's not guaranteed to find all the available types. Moreover, with some systems, for example GNU's, the C standard library (to which the headers belong) is separate from the compiler.

Writing prototypes instead of #include <stdio.h>

For example, here is a "hello, world" program without stdio.h included:
int puts(const char *str);
int main(void)
{
puts("hello, world");
}
I even think this can be a good programming style when my program gets longer and longer, for all functions called are listed at the beginning explicitly.
So my question is: Besides providing prototypes for standard library functions, what else does #include <stdio.h> do?
The (non-normative) Appendix J.2 of the C11 standard draft lists the following among examples of undefined behaviour:
— A function, object, type, or macro that is specified as being declared or defined by some standard header is used before any header that declares or defines it is included (7.1.2)
However, as pointed out by Keith Thompson, the 7.1.4p2 says:
2 Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header.
Thus using puts without including <stdio.h> can indeed be done in standard-conforming manner. However, you cannot declare fputs, since it requires a pointer-to-FILE as an argument, which you cannot do in a strictly conforming manner.
In addition, puts might also be a macro in presence of <stdio.h> and expand to something faster in the presence of the header.
All in all, the number of functions that can be declared properly without including headers is not that large. As for the functions that use some types from the headers - if you're asking with language-lawyer tag about C, the answer comes from the standard and the standard is rather outspoken about this: don't do it, or your program will not be strictly conforming, period.
<stdio.h> defines the type FILE, among other things. You can't portably call any function that takes a FILE* parameter or returns a FILE* result without #include <stdio.h>.
And there's really no good reason to declare any of the functions yourself rather than including the header.
When using proper program design, all prototypes of public functions are placed in header files and all function definitions are placed in c files. This is how you write C programs, period.
It is the industry de facto standard way of C programming and no professionals use any other design.
Your personal preference is not relevant here, nor are any loop-holes in the C standard that would allow you to make a different design. You should write your C programs in the same way as the rest of the world does.

Ansi C - programming language book of K&R - header file inclusion

Going through the K&R ansi C programming language book (second version), on page 82 an example is given for a programming files/folders layout.
What I don't understand is, while calc.h gets included in main (use of functions), getop.c (definition of getop) and stack.c (definition of push and pop), it does not get included into getch.c, even though getch and ungetch are defined there.
Although it's a good idea to include the header file it's not required as getch.c doesn't actually use the function declared in calc.h, it could even get by if it only used those already defined in getch.c.
The reason it's a good idea to include the header file anyway is because it would provide some safety if you use modern style prototypes and definitions. The compiler should namely complain if for example getop isn't defined in getop.c with the same signature as in calc.h.
calc.h contains the declaration of getch() and ungetch(). It is included by files that want to use these functions (and, therefore, need their signature).
getch.c, instead, contains the definition of getch() and ungetch(). Therefore, there is no need of including their declaration (which is implicitly defined in the definition).
The omission you have so aptly discovered can be a source of a real problem. In order to benefit fully from C's static type checking across a multi-translation-unit program (which is almost anything nontrivial), we must ensure that the site which defines an external name (such as a function) as well as all the sites which refer to the name, have the same declaration in scope, ideally from a single source: one header file where that name is declared.
If the definition doesn't have the declaration in scope, then it is possible to change the definition so that it no longer matches the declaration. The program will still translate and link, resulting in undefined behavior when the function is called or the object is used.
If you use the GNU compiler, you can guard against this problem using -Wmissing-prototypes. Straight from the gcc manual page:
-Wmissing-prototypes (C and Objective-C only)
Warn if a global function is defined without a previous prototype
declaration. This warning is issued even if the definition itself
provides a prototype. The aim is to detect global functions that
fail to be declared in header files.
Without diagnosis, this kind of thing, such as forgetting a header file, can happen to the best of us.
One possible reason why the header was forgotten is that the example project uses the "one big common header" convention. The "one big common header" approach lets the programmer forget all about headers. Everything just sees everything else and the #include "calc.h" which makes it work is just a tiny footnote that can get swallowed up in the amnesia. :)
The other aspect is that the authors had spent a lot of time programming in pre-ANSI "Classic" C without prototype declarations. In Classic C, header files are mainly for common type declarations and macros. The habit is that if a source file doesn't need some type or macros that are defined in some header, then it doesn't need to include that header. A resurgence of that habit could be what is going on here.

Resources