Related
I can understand that:
One of the origins of the UB is a performance increase (e.g. by removing never executed code, such as if (i+1 < i) { /* never_executed_code */ }; UPD: if i is a signed integer).
UB can be triggered at compile time because C does not clearly distinguish between compile time and run time. The "whole language is based on the (rather unhelpful) concept of an 'abstract machine'" (link).
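To make the first point concrete, here is a minimal sketch of the kind of check an optimizer is allowed to delete, next to a well-defined alternative. The function names are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on signed overflow. Because overflow of a signed int is undefined,
   an optimizing compiler may assume i + 1 < i is never true and drop the test. */
bool next_wraps_unreliable(int i) {
    return i + 1 < i;
}

/* Well-defined check for the same condition: no overflow is ever evaluated. */
bool next_would_overflow(int i) {
    return i == INT_MAX;
}
```

Only the second function can be relied on; what the first one returns for INT_MAX depends on the compiler and the optimization level.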
However, I cannot yet understand why the C preprocessor is subject to undefined behavior. It is known that preprocessing directives are executed at compile time.
Consider C11, 6.10.3.3 The ## operator, 3:
If the result is not a valid preprocessing token, the behavior is undefined.
Why not make it a constraint? For example:
The result shall be a valid preprocessing token.
The same question goes for all the other "the behavior is undefined" in 6.10 Preprocessing directives.
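For reference, a small sketch of the two sides of that rule. The PASTE macros and the names built with them are assumptions made for this example, not anything from the standard.

```c
/* Two-level paste so that arguments are macro-expanded before ## is applied. */
#define PASTE_RAW(a, b) a##b
#define PASTE(a, b) PASTE_RAW(a, b)

/* 4 ## 2 forms the single preprocessing token 42: well defined. */
int pasted_number(void) {
    return PASTE(4, 2);
}

/* foo ## bar forms the identifier foobar: well defined. */
int PASTE(foo, bar) = 7;

int pasted_identifier(void) {
    return foobar;
}

/* By contrast, PASTE(+, ;) would have to form a token spelled "+;", which is
   not a valid preprocessing token, so 6.10.3.3p3 leaves the behavior
   undefined. It is deliberately not written out as live code here. */
```

In practice some preprocessors reject the invalid paste with a diagnostic and others quietly emit two separate tokens, and both outcomes are permitted by the current wording.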
Why is the C preprocessor a subject of undefined behavior?
When the C standard was created, there were some existing C preprocessors and there was some imaginary ideal C preprocessor in the minds of standardization committee members.
So there were gray areas where committee members weren't completely sure what they wanted to do, and/or existing C preprocessor implementations differed from each other in behavior.
So these cases are left without defined behavior: the committee members were not sure what the behavior should actually be, so there is no requirement on what it should be.
One of the origins of the UB
Yes, one of.
UB may exist to make the language easier to implement. For example, in the case of the preprocessor, preprocessor writers don't have to care about what happens when ## produces an invalid preprocessing token.
Or UB may exist to reconcile existing implementations with different behaviors or as a point for extensions. So a preprocessor that segfaults in case of UB, a preprocessor that accepts and works in case of UB, and a preprocessor that formats your hard drive in case of UB, all can be standard conformant (but I wouldn't want to work on that one that formats your drive).
Suppose a file which is read in via include directive ends with the partial line:
#define foo bar
Depending upon the design of the preprocessor, it's possible that the partial token bar might be concatenated to whatever appears at the start of the line following the #include directive, or that whatever appears on that line will behave as though it were placed on the line with the #define directive, but with a whitespace separating it from the token bar, and it would hardly be inconceivable that a build script might rely upon such behaviors. It's also possible that implementations might behave as though a newline were inserted at the end of the included file, or might ignore the last partial line of such a file.
Any code which relied upon one of the former behaviors would clearly have been non-portable, but if code exploited such behavior to do something that would otherwise not be practical, such code would hardly be "erroneous", and the authors of the Standard would not have wanted to forbid an implementation that would process it usefully from continuing to do so.
When the Standard uses the phrase "non-portable or erroneous", that does not mean "non-portable, therefore erroneous". Prior to the publication of C89, C implementations defined many useful constructs, but none of them were defined by "the C Standard" since there wasn't one. If an implementation defined the behavior of some construct, some didn't, and the Standard left the construct as "Undefined", that would simply preserve the status quo where implementations that chose to define a useful behavior would do so, those that chose not to wouldn't, and programs that relied upon such behaviors would be "non-portable", working correctly on implementations that supported the behaviors, but not on those that didn't.
Without getting into specifics, my guess is, there exist several preprocessor implementations which have bugs, but the Standard doesn't want to declare them non-conforming, for compatibility reasons.
In human language: if you write a program which has X in it, preprocessor does weird stuff.
In standardese: the behavior of program with X is undefined.
If the standard says something like "The result shall be a valid preprocessing token", it might be unclear what "shall" means in this context.
The programmer shall write the program so this condition holds? If so, the wording with "undefined behavior" is clearer and more uniform (it appears in other places too)
The preprocessor shall make sure this condition holds? If so, this requires dedicated logic which checks the condition; may be impractical to implement.
So I'm completely new to programming. I currently study computer science and have just read the first 200 pages of my programming book, but there's one distinction I cannot seem to see, and which hasn't been clearly explained in the book: reserved words vs. standard identifiers. How can I tell from code whether a name is one or the other?
I know the reserved words are names that cannot be changed, while the standard identifiers can (though that is not recommended according to my book). The problem is that while my book says reserved words are always in pure lowercase like,
(int, void, double, return)
it kinda seems to be the very same for standard identifiers like,
(printf, scanf)
so how do I know when it is what, or do I have to learn all the reserved words from the ANSI C, which is the current language we are trying to learn, (or whatever future language I might work with) to know when it is when?
First off, you'll have to learn the rules for each language you learn as it is one of the areas that varies between languages. There's no universal rule about what's what.
Second, in C, you need to know the list of keywords; that seems to be what you're referring to as 'reserved words'. Those are important; they're immutable; they can't be abused because the compiler won't let you. You can't use int as a variable name; it is always a type.
Third, the C preprocessor can be abused to hijack anything; if you compile with #define double int in effect, you get what you deserve, but there's nothing much to stop you doing that.
Fourth, the only predefined variable name is __func__, the name of the current function.
Fifth, names such as printf() are defined by the standard library, but the standard library has to be implemented by someone using a C compiler; ask the maintainers of the GNU C library. For a discussion of many of the ideas behind the treaty between the standard and the compiler writers, and between the compiler writers and the programmers using a compiler, see the excellent book The Standard C Library by P J Plauger from 1992. Yes, it is old and the modern standard C library is somewhat bigger than the one from C90, but the background information is still valid and very helpful.
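A quick way to see the difference in practice is the sketch below. It assumes no library header is included in the file, and shadow_demo is a made-up name.

```c
/* "abs" is a standard library identifier, not a keyword. No library header
   is included here, so a block-scope variable may reuse the name. */
int shadow_demo(void) {
    int abs = 5;
    return abs + 1;
}

/* A keyword can never be used this way. The line below would not compile:
       int return = 5;
*/
```

Reusing a library name like this is legal but confusing, so it is shown only to illustrate the distinction.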
Reserved words are part of the language's syntax. C without int is not C, but something else. They are built into the language and are not and cannot be defined anywhere in terms of this particular language.
For example, if is a reserved keyword. You can't redefine it and even if you could, how would you do this in terms of the C language? You could do that in assembly, though.
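The standard library functions you're talking about are ordinary functions that have been included in the standard library, nothing more. They are defined in terms of the language's syntax. Also, you can redefine these functions, although it's not advised to do so as this may lead to all sorts of bugs and unexpected behavior. Yet compilers will typically accept something like:
#include <stdio.h>

int puts(const char *msg) {
    (void)msg; /* the original message is ignored */
    /* fputs rather than printf: some compilers turn printf of a constant
       string into a call to puts, which would recurse into this function */
    fputs("This has been monkey-patched!\n", stdout);
    return -1;
}
You may get a warning complaining about the redefinition of a standard library function, but the code compiles anyway. Strictly speaking, the standard reserves library names with external linkage, so the result is not guaranteed; the point is only that nothing in the language's syntax stops you.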
Now, imagine reimplementing return:
unknown_type return(unknown_type stuff) {
// what to do here???
}
I am a first year computer science student and my professor said #define is banned in the industry standards along with #if, #ifdef, #else, and a few other preprocessor directives. He used the word "banned" because of unexpected behaviour.
Is this accurate? If so why?
Are there, in fact, any standards which prohibit the use of these directives?
First I've heard of it.
No; #define and so on are widely used. Sometimes too widely used, but definitely used. There are places where the C standard mandates the use of macros — you can't avoid those easily. For example, §7.5 Errors <errno.h> says:
The macros are
EDOM
EILSEQ
ERANGE
which expand to integer constant expressions with type int, distinct positive values, and which are suitable for use in #if preprocessing directives; …
Given this, it is clear that not all industry standards prohibit the use of the C preprocessor macro directives. However, there are 'best practices' or 'coding guidelines' standards from various organizations that prescribe limits on the use of the C preprocessor, though none ban its use completely — it is an innate part of C and cannot be wholly avoided. Often, these standards are for people working in safety-critical areas.
One standard you could check is the MISRA C (2012) standard; that tends to proscribe things, but even that recognizes that #define et al are sometimes needed (section 8.20, rules 20.1 through 20.14 cover the C preprocessor).
The NASA GSFC (Goddard Space Flight Center) C Coding Standards simply say:
Macros should be used only when necessary. Overuse of macros can make code harder to read and maintain because the code no longer reads or behaves like standard C.
The discussion after that introductory statement illustrates the acceptable use of function macros.
The CERT C Coding Standard has a number of guidelines about the use of the preprocessor, and implies that you should minimize the use of the preprocessor, but does not ban its use.
Stroustrup would like to make the preprocessor irrelevant in C++, but that hasn't happened yet. As Peter notes, some C++ standards, such as the JSF AV C++ Coding Standards (Joint Strike Fighter, Air Vehicle) from circa 2005, dictate minimal use of the C preprocessor. Essentially, the JSF AV C++ rules restrict it to #include and the #ifndef XYZ_H / #define XYZ_H / … / #endif dance that prevents multiple inclusions of a single header. C++ has some options that are not available in C — notably, better support for typed constants that can then be used in places where C does not allow them to be used. See also static const vs #define vs enum for a discussion of the issues there.
It is a good idea to minimize the use of the preprocessor — it is often abused at least as much as it is used (see the Boost preprocessor 'library' for illustrations of how far you can go with the C preprocessor).
Summary
The preprocessor is an integral part of C and #define and #if etc cannot be wholly avoided. The statement by the professor in the question is not generally valid: "#define is banned in the industry standards along with #if, #ifdef, #else, and a few other macros" is an over-statement at best, but might be supportable with explicit reference to specific industry standards (but the standards in question do not include ISO/IEC 9899:2011, the C standard).
Note that David Hammen has provided information about one specific C coding standard, the JPL C Coding Standard, that prohibits a lot of things that many people use in C, including limiting the use of the C preprocessor (and limiting the use of dynamic memory allocation, and prohibiting recursion; read it to see why, and decide whether those reasons are relevant to you).
No, use of macros is not banned.
In fact, use of #include guards in header files is one common technique that is often mandatory and encouraged by accepted coding guidelines. Some folks claim that #pragma once is an alternative to that, but the problem is that #pragma once - by definition, since pragmas are a hook provided by the standard for compiler-specific extensions - is non-standard, even if it is supported by a number of compilers.
That said, there are a number of industry guidelines and encouraged practices that actively discourage all usage of macros other than #include guards because of the problems macros introduce (not respecting scope, etc). In C++ development, use of macros is frowned upon even more strongly than in C development.
Discouraging use of something is not the same as banning it, since it is still possible to legitimately use it - for example, by documenting a justification.
Some coding standards may discourage or even forbid the use of #define to create function-like macros that take arguments, like
#define SQR(x) ((x)*(x))
because a) such macros are not type-safe, and b) somebody will inevitably write SQR(x++), which is bad juju.
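The double evaluation can be observed without invoking undefined behavior by passing a function call instead of x++. This is only a sketch: the counter, next_value and square are names invented for the example.

```c
#define SQR(x) ((x) * (x))

int calls;                                   /* counts evaluations of next_value() */

int next_value(void) { return ++calls; }     /* stands in for an argument with a side effect */

int square(int x) { return x * x; }          /* function: argument evaluated once */

int evaluations_with_macro(void) {
    calls = 0;
    (void)SQR(next_value());                 /* expands to two calls */
    return calls;
}

int evaluations_with_function(void) {
    calls = 0;
    (void)square(next_value());
    return calls;
}
```

The macro version evaluates its argument twice and the function version once. With SQR(x++) the two modifications of x are additionally unsequenced, which is why that case is worse than merely surprising.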
Some standards may discourage or ban the use of #ifdefs for conditional compilation. For example, the following code uses conditional compilation to properly print out a size_t value. For C99 and later, you use the %zu conversion specifier; for C89 and earlier, you use %lu and cast the value to unsigned long:
#if __STDC_VERSION__ >= 199901L
# define SIZE_T_CAST
# define SIZE_T_FMT "%zu"
#else
# define SIZE_T_CAST (unsigned long)
# define SIZE_T_FMT "%lu"
#endif
...
printf( "sizeof foo = " SIZE_T_FMT "\n", SIZE_T_CAST sizeof foo );
Some standards may mandate that instead of doing this, you implement the module twice, once for C89 and earlier, once for C99 and later:
/* C89 version */
printf( "sizeof foo = %lu\n", (unsigned long) sizeof foo );
/* C99 version */
printf( "sizeof foo = %zu\n", sizeof foo );
and then let Make (or Ant, or whatever build tool you're using) deal with compiling and linking the correct version. For this example that would be ridiculous overkill, but I've seen code that was an untraceable rat's nest of #ifdefs that should have had that conditional code factored out into separate files.
However, I am not aware of any company or industry group that has banned the use of preprocessor statements outright.
Macros cannot be "banned". The statement is nonsense. Literally.
For example, section 7.5 Errors <errno.h> of the C Standard requires the use of macros:
1 The header <errno.h> defines several macros, all relating to the reporting of error conditions.
2 The macros are
EDOM
EILSEQ
ERANGE
which expand to integer constant expressions with type int, distinct
positive values, and which are suitable for use in #if preprocessing
directives; and
errno
which expands to a modifiable lvalue that has type int and thread
local storage duration, the value of which is set to a positive error
number by several library functions. If a macro definition is
suppressed in order to access an actual object, or a program defines
an identifier with the name errno, the behavior is undefined.
So, not only are macros a required part of C, in some cases not using them results in undefined behavior.
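As a rough illustration of code that cannot avoid these macros, the sketch below uses ERANGE both in an #if and at run time. The helper name out_of_long_range is made up for the example.

```c
#include <errno.h>
#include <stdlib.h>

/* The error macros are usable in #if, as the standard requires. */
#if EDOM == ERANGE
#error "EDOM and ERANGE are required to be distinct"
#endif

/* Returns 1 if the decimal text does not fit in a long, else 0. */
int out_of_long_range(const char *text) {
    errno = 0;                      /* errno is itself a macro */
    (void)strtol(text, NULL, 10);
    return errno == ERANGE;
}
```

Every line that mentions errno or ERANGE here goes through a macro expansion, whether or not the source contains a single #define of its own.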
No, #define is not banned. Misuse of #define, however, may be frowned upon.
For instance, you may use
#define DEBUG
in your code so that later on, you can designate parts of your code for conditional compilation using #ifdef DEBUG, for debug purposes only. I don't think anyone in his right mind would want to ban something like this. Macros defined using #define are also used extensively in portable programs, to enable/disable compilation of platform-specific code.
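A minimal sketch of that pattern, assuming DEBUG is supplied either by a #define as above or on the compiler command line. DEBUG_LOG and add_logged are illustrative names only.

```c
#include <stdio.h>

#ifdef DEBUG
#define DEBUG_LOG(msg) fprintf(stderr, "debug: %s\n", (msg))
#else
#define DEBUG_LOG(msg) ((void)0)   /* compiles to nothing in release builds */
#endif

int add_logged(int a, int b) {
    DEBUG_LOG("adding two values");
    return a + b;
}
```

The function returns the same value in both configurations; only the diagnostic output differs.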
However, if you are using something like
#define PI 3.141592653589793
your teacher may rightfully point out that it is much better to declare PI as a constant with the appropriate type, e.g.,
const double PI = 3.141592653589793;
as it allows the compiler to do type checking when PI is used.
Similarly (as mentioned by John Bode above), the use of function-like macros may be disapproved of, especially in C++ where templates can be used. So instead of
#define SQ(X) ((X)*(X))
consider using
double SQ(double X) { return X * X; }
or, in C++, better yet,
template <typename T> T SQ(T X) { return X * X; }
Once again, the idea is that by using the facilities of the language instead of the preprocessor, you allow the compiler to type check and also (possibly) generate better code.
Once you have enough coding experience, you'll know exactly when it is appropriate to use #define. Until then, I think it is a good idea for your teacher to impose certain rules and coding standards, but preferably they themselves should know, and be able to explain, the reasons. A blanket ban on #define is nonsensical.
That's completely false; macros are heavily used in C. Beginners often use them badly, but that's not a reason to ban them from industry. A classic bad usage is #define successor(n) n + 1. If you expect 2 * successor(9) to give 20, then you're wrong, because that expression will be translated as 2 * 9 + 1, i.e. 19, not 20. Use parentheses to get the expected result.
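Spelled out as code, with the macro names changed so that both variants can sit in one file (the names are just for this sketch):

```c
#define SUCCESSOR_UNSAFE(n) n + 1
#define SUCCESSOR(n) ((n) + 1)

int doubled_unsafe(void) { return 2 * SUCCESSOR_UNSAFE(9); }  /* 2 * 9 + 1 */
int doubled_safe(void)   { return 2 * SUCCESSOR(9); }         /* 2 * ((9) + 1) */
```

The parenthesized version wraps both the parameter and the whole expansion, which is the usual convention for expression-like macros.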
No. It is not banned. And truth to be told, it is impossible to do non-trivial multi-platform code without it.
No, your professor is wrong or you misheard something.
#define is a preprocessor directive, and preprocessor macros are needed for conditional compilation and for some conventions that simply aren't built into the C language. For example, C99 added support for booleans, but the names bool, true and false are not built into the language; they are provided by preprocessor #defines. See this reference to stdbool.h
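For instance, something as small as the following depends on that header. This is a sketch assuming a C99 or C11 compiler; later revisions of the language may provide these names differently.

```c
#include <stdbool.h>

/* In C99 and C11, bool, true and false come from macros in <stdbool.h>. */
bool is_even(int n) {
    return (n % 2 == 0) ? true : false;
}
```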
Macros are used pretty heavily in GNU land C, and without conditional preprocessor commands there'd be no way to properly handle multiple inclusions of the same source files, so that makes them seem like essential language features to me.
Maybe your class is actually on C++, which despite many people's failure to do so, should be distinguished from C as it is a different language, and I can't speak for macros there. Or maybe the professor meant he's banning them in his class. Anyhow I'm sure the SO community would be interested in hearing which standard he's talking about, since I'm pretty sure all C standards support the use of macros.
Contrary to all of the answers to date, the use of preprocessor directives is oftentimes banned in high-reliability computing. There are two exceptions to this, the use of which is mandated in such organizations. These are the #include directive, and the use of an include guard in a header file. These kinds of bans are more likely in C++ than in C.
Here's but one example: 16.1.1 Use the preprocessor only for implementing include guards, and including header files with include guards.
Another example, this time for C rather than C++: JPL Institutional Coding Standard for the C Programming Language . This C coding standard doesn't go quite so far as banning the use of the preprocessor completely, but it comes close. Specifically, it says
Rule 20 (preprocessor use)
Use of the C preprocessor shall be limited to file inclusion and simple macros. [Power of Ten Rule 8].
I'm neither condoning nor decrying those standards. But to say they don't exist is ludicrous.
If you want your C code to interoperate with C++ code, you will want to declare your externally visible symbols, such as function declarations, inside an extern "C" linkage block. This is often done using conditional compilation:
#ifdef __cplusplus
extern "C" {
#endif
/* C header file body */
#ifdef __cplusplus
}
#endif
Look at any header file and you will see something like this:
#ifndef FILE_NAME_H
#define FILE_NAME_H
/* Exported functions, structs, defines, etc. go here */
#endif /* FILE_NAME_H */
These defines are not only allowed, but critical, because the header's contents are included again each time the header is referenced. This means that without the guard you are redefining everything between the guard lines multiple times, which in the best case fails to compile and in the worst case leaves you scratching your head later about why your code doesn't work the way you want it to.
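The effect can be simulated in a single file by pasting a guarded header body twice, which is roughly what two #include lines would do. POINT_H and struct point are made-up names for this sketch.

```c
/* First "inclusion" of a hypothetical point.h */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Second "inclusion": POINT_H is already defined, so this copy is skipped. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

int point_sum(struct point p) {
    return p.x + p.y;
}
```

Removing the guard lines from both copies turns this into a redefinition of struct point, which a C compiler rejects.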
The compiler also predefines macros of its own, as seen here with gcc, that let you test for things like the version of the compiler, which is very useful. I'm currently working on a project that needs to compile with avr-gcc, but we have a testing environment that we also run our code through. To prevent the avr specific files and registers from keeping our test code from running we do something like this:
#ifdef __AVR__
//avr specific code here
#endif
Using this in the production code, the complementary test code can compile without using the avr-gcc and the code above is only compiled using avr-gcc.
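A slightly fuller sketch of the same idea, with an #else branch so that the host build gets a stand-in. BUILT_FOR_AVR and built_for_avr are names invented here; the only thing taken from the toolchain is __AVR__ itself.

```c
/* __AVR__ is predefined by avr-gcc; a host compiler normally leaves it undefined. */
#ifdef __AVR__
#define BUILT_FOR_AVR 1
#else
#define BUILT_FOR_AVR 0
#endif

int built_for_avr(void) {
    return BUILT_FOR_AVR;
}
```

On a desktop compiler the function reports 0, so test code can branch on it instead of touching hardware registers.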
If you had just mentioned #define, I would have thought maybe he was alluding to its use for enumerations, which are better off using enum to avoid stupid errors such as assigning the same numerical value twice.
Note that even for this situation, it is sometimes better to use #defines than enums, for instance if you rely on numerical values exchanged with other systems and the actual values must stay the same even if you add/delete constants (for compatibility).
However, adding that #if, #ifdef, etc. should not be used either is just weird. Of course, they should probably not be abused, but in real life there are dozens of reasons to use them.
What he may have meant could be that (where appropriate), you should not hardcode behaviour in the source (which would require re-compilation to get a different behaviour), but rather use some form of run-time configuration instead.
That's the only interpretation I could think of that would make sense.
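Under that reading, the contrast might look like the sketch below. VERBOSE_BUILD, verbose_enabled and the APP_VERBOSE environment variable mentioned afterwards are all hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Compile-time choice: changing it means rebuilding. */
#ifdef VERBOSE_BUILD
#define DEFAULT_VERBOSE 1
#else
#define DEFAULT_VERBOSE 0
#endif

/* Run-time choice: the same binary can behave either way. */
int verbose_enabled(const char *setting) {
    if (setting == NULL) {
        return DEFAULT_VERBOSE;   /* nothing configured: fall back to the build default */
    }
    return strcmp(setting, "1") == 0;
}
```

A caller could pass in something like getenv("APP_VERBOSE"), so the behaviour changes without recompiling.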
I'm currently using the __COUNTER__ macro in my C library code to generate unique integer identifiers. It works nicely, but I see two issues:
It's not part of any C or C++ standard.
Independent code that also uses __COUNTER__ might get confused.
I thus wish to implement an equivalent to __COUNTER__ myself.
Alternatives that I'm aware of, but do not want to use:
__LINE__ (because multiple macros per line wouldn't get unique ids)
BOOST_PP_COUNTER (because I don't want a boost dependency)
BOOST_PP_COUNTER proves that this can be done, even though other answers claim it is impossible.
In essence, I'm looking for a header file "mycounter.h", such that
#include "mycounter.h"
__MYCOUNTER__
__MYCOUNTER__ __MYCOUNTER__
__MYCOUNTER__
will be preprocessed by gcc -E to
(...)
0
1 2
3
without using the built-in __COUNTER__.
Note: Earlier, this question was marked as a duplicate of this, which deals with using __COUNTER__ rather than avoiding it.
You can't implement __COUNTER__ directly. The preprocessor is purely functional - no state changes. A hidden counter is inherently impossible in such a system. (BOOST_PP_COUNTER does not prove what you want can be done - it relies on #include and is therefore one-per-line only - may as well use __LINE__. That said, the implementation is brilliant, you should read it anyway.)
What you can do is refactor your metaprogram so that the counter could be applied to the input data by a pure function. e.g. using good ol' Order:
#include <order/interpreter.h>
#define ORDER_PP_DEF_8map_count \
ORDER_PP_FN(8fn(8L, 8rec_mc(8L, 8nil, 0)))
#define ORDER_PP_DEF_8rec_mc \
ORDER_PP_FN(8fn(8L, 8R, 8C, \
8if(8is_nil(8L), \
8R, \
8let((8H, 8seq_head(8L)) \
(8T, 8seq_tail(8L)) \
(8D, 8plus(8C, 1)), \
8if(8is_seq(8H), \
8rec_mc(8T, 8seq_append(8R, 8seq_take(1, 8L)), 8C), \
8rec_mc(8T, 8seq_append(8R, 8seq(8C)), 8D) )))))
ORDER_PP (
8map_count(8seq( 8seq(8(A)), 8true, 8seq(8(C)), 8true, 8true )) //((A))(0)((C))(1)(2)
)
(recurses down the list, leaving sublist elements where they are and replacing non-list elements, represented here by 8true, with an incrementing counter variable)
I assume you don't actually want to simply drop __COUNTER__ values at the program toplevel, so if you can place the code into which you need to weave __COUNTER__ values inside a wrapper macro that splits it into some kind of sequence or list, you can then feed the list to a pure function similar to the example.
Of course a metaprogramming library capable of expressing such code is going to be significantly less portable and maintainable than __COUNTER__ anyway. __COUNTER__ is supported by Intel, GCC, Clang and MSVC. (not everyone, e.g. pcc doesn't have it, but does anyone even use that?) Arguably if you demonstrate the feature in use in real code, it makes a stronger case to the standardisation committee that __COUNTER__ should become part of the next C standard.
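If portability to compilers without the extension is the main worry, one hedged option is to use __COUNTER__ where it exists and fall back to __LINE__ elsewhere. NEXT_ID is a made-up name, and the sketch assumes that #ifdef can detect the built-in macro, which I believe holds for GCC and Clang.

```c
#ifdef __COUNTER__
#define NEXT_ID() __COUNTER__   /* non-standard, distinct on every expansion */
#else
#define NEXT_ID() __LINE__      /* standard, distinct only across lines */
#endif

int ids_are_distinct(void) {
    int first = NEXT_ID();
    int second = NEXT_ID();     /* on its own line so the fallback also works */
    return first != second;
}
```

The fallback keeps the one-per-line limitation from the question, so it is a degradation path rather than a real replacement.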
You are confusing two different things:
1 - the preprocessor, which handles #define, #include and similar directives. It works only at the text level (meaning character sequences) and has very few computing capabilities. It is so limited that it cannot implement __COUNTER__. The preprocessor's work consists only of macro expansion and file replacement. The crucial point is that it occurs before compilation even starts.
2 - the C++ language, and in particular the template (meta)programming language, which can be used to compute things during the compilation phase. It is indeed Turing complete, but as I already said, compilation starts after preprocessing.
So what you are asking is not doable in standard C or C++. To solve this problem Boost implements its own preprocessor, which is not standard compliant and has much more computing capability. In particular, it is possible to build an analogue to __COUNTER__ with it.
This small header of mine contains my own implementation of a C preprocessor counter (it uses a slightly different syntax).
This is a nitpicky-details question with three parts. The context is that I wish to persuade some folks that it is safe to use <stddef.h>'s definition of offsetof unconditionally rather than (under some circumstances) rolling their own. The program in question is written entirely in plain old C, so please ignore C++ entirely when answering.
Part 1: When used in the same manner as the standard offsetof, does the expansion of this macro provoke undefined behavior per C89, why or why not, and is it different in C99?
#define offset_of(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
Note: All implementations of interest to the people whose program this is supersede the standard's rule that pointers may only be subtracted from each other when they point into the same array, by defining all pointers, regardless of type or value, to point into a single global address space. Therefore, please do not rely on that rule when arguing that this macro's expansion provokes undefined behavior.
Part 2: To the best of your knowledge, has there ever been a released, production C implementation that, when fed the expansion of the above macro, would (under some circumstances) behave differently than it would have if its offsetof macro had been used instead?
Part 3: To the best of your knowledge, what is the most recently released production C implementation that either did not provide stddef.h or did not provide a working definition of offsetof in that header? Did that implementation claim conformance with any version of the C standard?
For parts 2 and 3, please answer only if you can name a specific implementation and give the date it was released. Answers that state general characteristics of implementations that may qualify are not useful to me.
There is no way to write a portable offsetof macro. You must use the one provided by stddef.h.
Regarding your specific questions:
The macro invokes undefined behavior. You cannot subtract pointers except when they point into the same array.
The big difference in practical behavior is that the macro is not an integer constant expression, so it can't safely be used for static initializers, bitfield widths, etc. Also strict bounds-checking-type C implementations might completely break it.
There has never been any C standard that lacked stddef.h and offsetof. Pre-ANSI compilers might lack it, but they have much more fundamental problems that make them unusable for modern code (e.g. lack of void * and const).
Moreover, even if some theoretical compiler did lack stddef.h, you could just provide a drop-in replacement, just like the way people drop in stdint.h for use with MSVC...
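The constant-expression point is easy to check with a small sketch. struct packet and PAYLOAD_OFFSET are made-up names for the example.

```c
#include <stddef.h>

struct packet {
    char tag;
    int length;
    char payload[16];
};

/* An enumerator must be an integer constant expression, so this line only
   compiles because the standard offsetof is one. */
enum { PAYLOAD_OFFSET = offsetof(struct packet, payload) };

size_t payload_offset(void) {
    return PAYLOAD_OFFSET;
}
```

Swapping in the hand-rolled pointer-subtraction macro is not guaranteed to compile in that position, even where it produces the same number at run time.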
To answer #2: yes, gcc-4* (I'm currently looking at v4.3.4, released 4 Aug 2009, but it should hold true for all gcc-4 releases to date). The following definition is used in their stddef.h:
#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
where __builtin_offsetof is a compiler builtin like sizeof (that is, it's not implemented as a macro or run-time function). Compiling the code:
#include <stddef.h>
struct testcase {
char array[256];
};
int main (void) {
char buffer[offsetof(struct testcase, array[0])];
return 0;
}
would result in an error using the expansion of the macro that you provided ("size of array ‘buffer’ is not an integral constant-expression") but would work when using the macro provided in stddef.h. Builds using gcc-3 used a macro similar to yours. I suppose that the gcc developers had many of the same concerns regarding undefined behavior, etc that have been expressed here, and created the compiler builtin as a safer alternative to attempting to generate the equivalent operation in C code.
Additional information:
A mailing list thread from the Linux kernel developer's list
GCC's documentation on offsetof
A sort-of-related question on this site
Regarding your other questions: I think R's answer and his subsequent comments do a good job of outlining the relevant sections of the standard as far as question #1 is concerned. As for your third question, I have not heard of a modern C compiler that does not have stddef.h. I certainly wouldn't consider any compiler lacking such a basic standard header as "production". Likewise, if their offsetof implementation didn't work, then the compiler still has work to do before it could be considered "production", just like if other things in stddef.h (like NULL) didn't work. A C compiler released prior to C's standardization might not have these things, but the ANSI C standard is over 20 years old so it's extremely unlikely that you'll encounter one of these.
The whole premise of this problem raises a question: If these people are convinced that they can't trust the version of offsetof that the compiler provides, then what can they trust? Do they trust that NULL is defined correctly? Do they trust that long int is no smaller than a regular int? Do they trust that memcpy works like it's supposed to? Do they roll their own versions of the rest of the C standard library functionality? One of the big reasons for having language standards is so that you can trust the compiler to do these things correctly. It seems silly to trust the compiler for everything else except offsetof.
Update: (in response to your comments)
I think my co-workers behave like yours do :-) Some of our older code still has custom macros defining NULL, VOID, and other things like that since "different compilers may implement them differently" (sigh). Some of this code was written back before C was standardized, and many older developers are still in that mindset even though the C standard clearly says otherwise.
Here's one thing you can do to both prove them wrong and make everyone happy at the same time:
#include <stddef.h>
#ifndef offsetof
#define offsetof(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
#endif
In reality, they'll be using the version provided in stddef.h. The custom version will always be there, however, in case you run into a hypothetical compiler that doesn't define it.
Based on similar conversations that I've had over the years, I think the belief that offsetof isn't part of standard C comes from two places. First, it's a rarely used feature. Developers don't see it very often, so they forget that it even exists. Second, offsetof is not mentioned at all in Kernighan and Ritchie's seminal book "The C Programming Language" (even the most recent edition). The first edition of the book was the unofficial standard before C was standardized, and I often hear people mistakenly referring to that book as THE standard for the language. It's much easier to read than the official standard, so I don't know if I blame them for making it their first point of reference. Regardless of what they believe, however, the standard is clear that offsetof is part of ANSI C (see R's answer for a link).
Here's another way of looking at question #1. The ANSI C standard gives the following definition in section 4.1.5:
offsetof( type, member-designator)
which expands to an integral constant expression that has type size_t,
the value of which is the offset in bytes, to the structure member
(designated by member-designator ), from the beginning of its
structure (designated by type ).
Using the offsetof macro does not invoke undefined behavior. In fact, the behavior is all that the standard actually defines. It's up to the compiler writer to define the offsetof macro such that its behavior follows the standard. Whether it's implemented using a macro, a compiler builtin, or something else, ensuring that it behaves as expected requires the implementor to deeply understand the inner workings of the compiler and how it will interpret the code. The implementation may define it using a macro like the idiomatic version you provided, but only because its authors know how their compiler will handle the non-standard code.
On the other hand, the macro expansion you provided indeed invokes undefined behavior. Since you don't know enough about the compiler to predict how it will process the code, you can't guarantee that particular implementation of offsetof will always work. Many people define their own version like that and don't run into problems, but that doesn't mean that the code is correct. Even if that's the way that a particular compiler happens to define offsetof, writing that code yourself invokes UB while using the provided offsetof macro does not.
Rolling your own macro for offsetof can't be done without invoking undefined behavior (ANSI C section A.6.2 "Undefined behavior", 27th bullet point). Using stddef.h's version of offsetof will always produce the behavior defined in the standard (assuming a standards-compliant compiler). I would advise against defining a custom version since it can cause portability problems, but if others can't be persuaded then the #ifndef offsetof snippet provided above may be an acceptable compromise.
(1) The undefined behavior is already there before you do the subtraction.
First of all, (tp*)0 is not what you think it is. It is a null pointer, and such a beast is not necessarily represented with an all-zero bit pattern.
Then the member operator -> is not simply an offset addition. On a CPU with segmented memory this might be a more complicated operation.
Taking the address with the & operator is UB if its operand does not designate a valid object.
(2) For point 2, there are certainly still architectures out in the wild (embedded stuff) that use segmented memory. For point 3, the point that R makes about integer constant expressions has another drawback: if the code is badly optimized, the & operation might be done at run time and signal an error.
(3) Never heard of such a thing, but this is probably not enough to convince your colleagues.
I believe that nearly every optimizing compiler has broken that macro at multiple points in time. Your coworkers have apparently been lucky enough not to have been hit by it.
What happens is that some junior compiler engineer decides that because the zero page is never mapped on their platform of choice, any time anyone does anything with a pointer to that page, that's undefined behavior and they can safely optimize away the whole expression. At that point, everyone's homebrew offsetof macros break until enough people scream about it, and those of us who were smart enough not to roll our own go happily about our business.
I don't know of any compiler where this is the behavior in the current released version, but I think I've seen it happen at some point with every compiler I've ever worked with.