Reserved words vs standard identifier, how to see the difference - c

So I'm completely new to programming. I currently study computer science and have just read the first 200 pages of my programming book, but there's one thing I cannot seem to see the difference between and which havn't been clearly specified in the book and that's reserved words vs. standard identifiers - how can I see from code if it's one or the other.
I know the reserved words are some that cannot be changed, while the standard indentifiers can (though not recommended according to my book). The problem is while my book says reserved words are always in pure lowercase like,
(int, void, double, return)
it kinda seems to be the very same for standard indentifier like,
(printf, scanf)
so how do I know when it is what, or do I have to learn all the reserved words from the ANSI C, which is the current language we are trying to learn, (or whatever future language I might work with) to know when it is when?

First off, you'll have to learn the rules for each language you learn as it is one of the areas that varies between languages. There's no universal rule about what's what.
Second, in C, you need to know the list of keywords; that seems to be what you're referring to as 'reserved words'. Those are important; they're immutable; they can't be abused because the compiler won't let you. You can't use int as a variable name; it is always a type.
Third, the C preprocessor can be abused to hijack anything; if you compile with #define double int in effect, you get what you deserve, but there's nothing much to stop you doing that.
Fourth, the only predefined variable name is __func__, the name of the current function.
Fifth, names such as printf() are defined by the standard library, but the standard library has to be implemented by someone using a C compiler; ask the maintainers of the GNU C library. For a discussion of many of the ideas behind the treaty between the standard and the compiler writers, and between the compiler writers and the programmers using a compiler, see the excellent book The Standard C Library by P J Plauger from 1992. Yes, it is old and the modern standard C library is somewhat bigger than the one from C90, but the background information is still valid and very helpful.

Reserved words are part of the language's syntax. C without int is not C, but something else. They are built into the language and are not and cannot be defined anywhere in terms of this particular language.
For example, if is a reserved keyword. You can't redefine it and even if you could, how would you do this in terms of the C language? You could do that in assembly, though.
The standard library functions you're talking about are ordinary functions that have been included into the standard library, nothing more. They are defined in terms of the language's syntax. Also, you can redefine these functions, although it's not advised to do so as this may lead to all sorts of bugs and unexpected behavior. Yet it's perfectly valid to write:
int puts(const char *msg) {
printf("This has been monkey-patched!\n");
return -1;
}
You'd get a warning that'd complain about the redefinition of a standard library function, but this code is valid anyway.
Now, imagine reimplementing return:
unknown_type return(unknown_type stuff) {
// what to do here???
}

Related

Is the Syntax of C Language completely defined by CFGs?

I think the Question is self sufficient. Is the syntax of C Language completely defined through Context Free Grammars or do we have Language Constructs which may require non-Context Free definitions in the course of parsing?
An example of non CFL construct i thought was the declaration of variables before their use. But in Compilers(Aho Ullman Sethi), it is stated that the C Language does not distinguish between identifiers on the basis of their names. All the identifiers are tokenized as 'id' by the Lexical Analyzer.
If C is not completely defined by CFGs, please can anyone give an example of Non CFL construct in C?
The problem is that you haven't defined "the syntax of C".
If you define it as the language C in the CS sense, meaning the set of all valid C programs, then C – as well as virtually every other language aside from turing tarpits and Lisp – is not context free. The reasons are not related to the problem of interpreting a C program (e.g. deciding whether a * b; is a multiplication or a declaration). Instead, it's simply because context free grammars can't help you decide whether a given string is a valid C program. Even something as simple as int main() { return 0; } needs a more powerful mechanism than context free grammars, as you have to (1) remember the return type and (2) check that whatever occurs after the return matches the return type. a * b; faces a similar problem – you don't need to know whether it's a multiplication, but if it is a multiplication, that must be a valid operation for the types of a and b. I'm not actually sure whether a context sensitive grammar is enough for all of C, as some restrictions on valid C programs are quite subtle, even if you exclude undefined behaviour (some of which may even be undecidable).
Of course, the above notion is hardly useful. Generally, when talking grammars, we're only interested in a pretty good approximation of a valid program: We want a grammar that rules out as many strings which aren't C as possible without undue complexity in the grammar (for example, 1 a or (-)). Everything else is left to later phases of the compiler and called a semantic error or something similar to distinguish it from the first class of errors. These "approximate" grammars are almost always context free grammars (including in C's case), so if you want to call this approximation of the set of valid programs "syntax", C is indeed defined by a context free grammar. Many people do, so you'd be in good company.
The C language, as defined by the language standard, includes the preprocessor. The following is a syntactically correct C program:
#define START int main(
#define MIDDLE ){
START int argc, char** argv MIDDLE return 0; }
It seems to be really tempting to answer this question (which arises a lot) "sure, there is a CFG for C", based on extracting a subset of the grammar in the standard, which grammar in itself is ambiguous and recognizes a superset of the language. That CFG is interesting and even useful, but it is not C.
In fact, the productions in the standard do not even attempt to describe what a syntactically correct source file is. They describe:
The lexical structure of the source file (along with the lexical structure of valid tokens after pre-processing).
The grammar of individual preprocessor directives
A superset of the grammar of the post-processed language, which relies on some other mechanism to distinguish between typedef-name and other uses of identifier, as well as a mechanism to distinguish between constant-expression and other uses of conditional-expression.
There are many who argue that the issues in point 3 are "semantic", rather than "syntactic". However, the nature of C (and even more so its cousin C++) is that it is impossible to disentangle "semantics" from the parsing of a program. For example, the following is a syntactically correct C program:
#define base 7
#if base * 2 < 10
&one ?= two*}}
#endif
int main(void){ return 0; }
So if you really mean "is the syntax of the C language defined by a CFG", the answer must be no. If you meant, "Is there a CFG which defines the syntax of some language which represents strings which are an intermediate product of the translation of a program in the C language," it's possible that the answer is yes, although some would argue that the necessity to make precise what is a constant-expression and a typedef-name make the syntax necessarily context-sensitive, in a way that other languages are not.
Is the syntax of C Language completely defined through Context Free Grammars?
Yes it is. This is the grammar of C in BNF:
http://www.cs.man.ac.uk/~pjj/bnf/c_syntax.bnf
If you don't see other than exactly one symbol on the left hand side of any rule, then the grammar is context free. That is the very definition of context free grammars (Wikipedia):
In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).
Since ambiguity is mentioned by others, I would like to clarify a bit. Imagine the following grammar:
A -> B x | C x
B -> y
C -> y
This is an ambiguous grammar. However, it is still a context free grammar. These two are completely separate concepts.
Obviously, the semantics analyzer of C is context sensitive. This answer from the duplicate question has further explanations.
There are two things here:
The structure of the language (syntax): this is context free as you do not need to know the surroundings to figure out what is an identifier and what is a function.
The meaning of the program (semantics): this is not context free as you need to know whether an identifier has been declared and with what type when you are referring to it.
If you mean by the "syntax of C" all valid C strings that some C compiler accepts, and after running the pre-processor, but ignoring typing errors, then this is the answer: yes but not unambiguously.
First, you could assume the input program is tokenized according to the C standard. The grammar will describe relations among these tokens and not the bare characters. Such context-free grammars are found in books about C and in implementations that use parser generators. This tokenization is a big assumption because quite some work goes into "lexing" C. So, I would argue that we have not described C with a context-free grammar yet, if we have not used context-free grammars to describe the lexical level. The staging between the tokenizer and the parser combined with the ordering emposed by a scanner generator (prefer keywords, longest match, etc) are a major increase in computational power which is not easily simulated in a context-free grammar.
So, If you do not assume a tokenizer which for example can distinguish type names from variable names using a symbol table, then a context-free grammar is going to be harder. However: the trick here is to accept ambiguity. We can describe the syntax of C including its tokens in a context-free grammar fully. Only the grammar will be ambiguous and produce different interpretations for the same string . For example for A *a; it will have derivations for a multiplication and a pointer declaration both. No problem, the grammar is still describing the C syntax as you requested, just not unambiguously.
Notice that we have assumed having run the pre-processor first as well, I believe your question was not about the code as it looks before pre-processing. Describing that using a context-free grammar would be just madness since syntactic correctness depends on the semantics of expanding user-defined macros. Basically, the programmer is extending the syntax of the C language every time a macro is defined. At CWI we did write context-free grammars for C given a set of known macro definitions to extend the C language and that worked out fine, but that is not a general solution.

macro expansion order with included files

Let's say I have a macro in an inclusion file:
// a.h
#define VALUE SUBSTITUTE
And another file that includes it:
// b.h
#define SUBSTITUTE 3
#include "a.h"
Is it the case that VALUE is now defined to SUBSTITUTE and will be macro expanded in two passes to 3, or is it the case that VALUE has been set to the macro expanded value of SUBSTITUTE (i.e. 3)?
I ask this question in the interest of trying to understand the Boost preprocessor library and how its BOOST_PP_SLOT defines work (edit: and I mean the underlying workings). Therefore, while I am asking the above question, I'd also be interested if anyone could explain that.
(and I guess I'd also like to know where the heck to find the 'painted blue' rules are written...)
VALUE is defined as SUBSTITUTE. The definition of VALUE is not aware at any point that SUBSTITUTE has also been defined. After VALUE is replaced, whatever it was replaced by will be scanned again, and potentially more replacements applied then. All defines exist in their own conceptual space, completely unaware of each other; they only interact with one another at the site of expansion in the main program text (defines are directives, and thus not part of the program proper).
The rules for the preprocessor are specified alongside the rules for C proper in the language standard. The standard documents themselves cost money, but you can usually download the "final draft" for free; the latest (C11) can be found here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
For at-home use the draft is pretty much equivalent to the real thing. Most people who quote the standard are actually looking at copies of the draft. (Certainly it's closer to the actual standard than any real-world C compiler is...)
There's a more accessible description of the macro rules in the GCC manual: http://gcc.gnu.org/onlinedocs/cpp/Self_002dReferential-Macros.html
Additionally... I couldn't tell you much about the Boost preprocessor library, not having used it, but there's a beautiful pair of libraries by the same authors called Order and Chaos that are very "clean" (as macro code goes) and easy to understand. They're more academic in tone and intended to be pure rather than portable; which might make them easier reading.
(Since I don't know Boost PP I don't know how relevant this is to your question but) there's also a good introductory example of the kids of techniques these libraries use for advanced metaprogramming constructs in this answer: Is the C99 preprocessor Turing complete?

Does anyone know the reason why the variables have to be defined at the top of function

I have a question, does anyone know why the variables have to be defined initialized at the beginning of a function? Why can't you initialize or define variables in the middle of a function in C as in C++?
This is a tradition which comes from early C compilers, when compiler needs all local variable definitions before the actual code of function starts (to generate right stack pointer calculation). This was the only way of declaring local variables in early C language, both pre-standard (K&R) and first C standard, C90, published at 1989-1990 ( ANSI X3.159-1989, ISO/IEC 9899:1990 ).
C99 - the 1999 year ISO standard (ISO/IEC 9899:1999) of C allows declaraions in the middle of function.
C++ allows this because it is newer language than C. C++ standards are ISO/IEC 14882:1998 and ISO/IEC 14882:2003, so they are from 1998 and 2003 years.
You can initialize variable (give it a value: a=4;) at the definition point or any time later.
That is a left-over from early C. C99 allows variables to be defined anywhere in the function, including in loop structures.
for (int i = 0; i < 10; ++i) { int j; }
The throwback is from when compilers needed to know the size of the stack for the function before they instantiated the function's code. As compilers became better the requirement just became annoying.
You can do this in C so long as you use a C99 compiler. The restriction was lifted in C99.
Presumably the restriction was originally there because compilers were less capable, or perhaps simply because nobody thought of allowing such a convenience for the programmer.
When answering a question like this it is sometimes worth turning it around. Why would the original designers of C have done it the other way? They didn't have C++ to guide them!
In C++ and C99, you can define variables in the middle of a function. What you cannot do is refer to a variable that has not yet been defined.
From the point of view of object-oriented programming, it wouldn't make much sense otherwise: By defining a variable you bring an object to life (by calling its constructor). Before that, there is no object, so there's nothing to interact with until you pass the point of object construction.
Try writing a compiler with the primitive power of the old times and you'd realize flexibility is something you'd rather kill to get the software to work.
Putting variables first, then other statements simply made their parser/compiler code simpler.

Portability of using stddef.h's offsetof rather than rolling your own

This is a nitpicky-details question with three parts. The context is that I wish to persuade some folks that it is safe to use <stddef.h>'s definition of offsetof unconditionally rather than (under some circumstances) rolling their own. The program in question is written entirely in plain old C, so please ignore C++ entirely when answering.
Part 1: When used in the same manner as the standard offsetof, does the expansion of this macro provoke undefined behavior per C89, why or why not, and is it different in C99?
#define offset_of(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
Note: All implementations of interest to the people whose program this is supersede the standard's rule that pointers may only be subtracted from each other when they point into the same array, by defining all pointers, regardless of type or value, to point into a single global address space. Therefore, please do not rely on that rule when arguing that this macro's expansion provokes undefined behavior.
Part 2: To the best of your knowledge, has there ever been a released, production C implementation that, when fed the expansion of the above macro, would (under some circumstances) behave differently than it would have if its offsetof macro had been used instead?
Part 3: To the best of your knowledge, what is the most recently released production C implementation that either did not provide stddef.h or did not provide a working definition of offsetof in that header? Did that implementation claim conformance with any version of the C standard?
For parts 2 and 3, please answer only if you can name a specific implementation and give the date it was released. Answers that state general characteristics of implementations that may qualify are not useful to me.
There is no way to write a portable offsetof macro. You must use the one provided by stddef.h.
Regarding your specific questions:
The macro invokes undefined behavior. You cannot subtract pointers except when they point into the same array.
The big difference in practical behavior is that the macro is not an integer constant expression, so it can't safely be used for static initializers, bitfield widths, etc. Also strict bounds-checking-type C implementations might completely break it.
There has never been any C standard that lacked stddef.h and offsetof. Pre-ANSI compilers might lack it, but they have much more fundamental problems that make them unusable for modern code (e.g. lack of void * and const).
Moreover, even if some theoretical compiler did lack stddef.h, you could just provide a drop-in replacement, just like the way people drop in stdint.h for use with MSVC...
To answer #2: yes, gcc-4* (I'm currently looking at v4.3.4, released 4 Aug 2009, but it should hold true for all gcc-4 releases to date). The following definition is used in their stddef.h:
#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
where __builtin_offsetof is a compiler builtin like sizeof (that is, it's not implemented as a macro or run-time function). Compiling the code:
#include <stddef.h>
struct testcase {
char array[256];
};
int main (void) {
char buffer[offsetof(struct testcase, array[0])];
return 0;
}
would result in an error using the expansion of the macro that you provided ("size of array ‘buffer’ is not an integral constant-expression") but would work when using the macro provided in stddef.h. Builds using gcc-3 used a macro similar to yours. I suppose that the gcc developers had many of the same concerns regarding undefined behavior, etc that have been expressed here, and created the compiler builtin as a safer alternative to attempting to generate the equivalent operation in C code.
Additional information:
A mailing list thread from the Linux kernel developer's list
GCC's documentation on offsetof
A sort-of-related question on this site
Regarding your other questions: I think R's answer and his subsequent comments do a good job of outlining the relevant sections of the standard as far as question #1 is concerned. As for your third question, I have not heard of a modern C compiler that does not have stddef.h. I certainly wouldn't consider any compiler lacking such a basic standard header as "production". Likewise, if their offsetof implementation didn't work, then the compiler still has work to do before it could be considered "production", just like if other things in stddef.h (like NULL) didn't work. A C compiler released prior to C's standardization might not have these things, but the ANSI C standard is over 20 years old so it's extremely unlikely that you'll encounter one of these.
The whole premise to this problems begs a question: If these people are convinced that they can't trust the version of offsetof that the compiler provides, then what can they trust? Do they trust that NULL is defined correctly? Do they trust that long int is no smaller than a regular int? Do they trust that memcpy works like it's supposed to? Do they roll their own versions of the rest of the C standard library functionality? One of the big reasons for having language standards is so that you can trust the compiler to do these things correctly. It seems silly to trust the compiler for everything else except offsetof.
Update: (in response to your comments)
I think my co-workers behave like yours do :-) Some of our older code still has custom macros defining NULL, VOID, and other things like that since "different compilers may implement them differently" (sigh). Some of this code was written back before C was standardized, and many older developers are still in that mindset even though the C standard clearly says otherwise.
Here's one thing you can do to both prove them wrong and make everyone happy at the same time:
#include <stddef.h>
#ifndef offsetof
#define offsetof(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
#endif
In reality, they'll be using the version provided in stddef.h. The custom version will always be there, however, in case you run into a hypothetical compiler that doesn't define it.
Based on similar conversations that I've had over the years, I think the belief that offsetof isn't part of standard C comes from two places. First, it's a rarely used feature. Developers don't see it very often, so they forget that it even exists. Second, offsetof is not mentioned at all in Kernighan and Ritchie's seminal book "The C Programming Language" (even the most recent edition). The first edition of the book was the unofficial standard before C was standardized, and I often hear people mistakenly referring to that book as THE standard for the language. It's much easier to read than the official standard, so I don't know if I blame them for making it their first point of reference. Regardless of what they believe, however, the standard is clear that offsetof is part of ANSI C (see R's answer for a link).
Here's another way of looking at question #1. The ANSI C standard gives the following definition in section 4.1.5:
offsetof( type, member-designator)
which expands to an integral constant expression that has type size_t,
the value of which is the offset in bytes, to the structure member
(designated by member-designator ), from the beginning of its
structure (designated by type ).
Using the offsetof macro does not invoke undefined behavior. In fact, the behavior is all that the standard actually defines. It's up to the compiler writer to define the offsetof macro such that its behavior follows the standard. Whether it's implemented using a macro, a compiler builtin, or something else, ensuring that it behaves as expected requires the implementor to deeply understand the inner workings of the compiler and how it will interpret the code. The compiler may implement it using a macro like the idiomatic version you provided, but only because they know how the compiler will handle the non-standard code.
On the other hand, the macro expansion you provided indeed invokes undefined behavior. Since you don't know enough about the compiler to predict how it will process the code, you can't guarantee that particular implementation of offsetof will always work. Many people define their own version like that and don't run into problems, but that doesn't mean that the code is correct. Even if that's the way that a particular compiler happens to define offsetof, writing that code yourself invokes UB while using the provided offsetof macro does not.
Rolling your own macro for offsetof can't be done without invoking undefined behavior (ANSI C section A.6.2 "Undefined behavior", 27th bullet point). Using stddef.h's version of offsetof will always produce the behavior defined in the standard (assuming a standards-compliant compiler). I would advise against defining a custom version since it can cause portability problems, but if others can't be persuaded then the #ifndef offsetof snippet provided above may be an acceptable compromise.
(1) The undefined behavior is already there before you do the substraction.
First of all, (tp*)0 is not what you think it is. It is a null
pointer, such a beast is not necessarily represented with all-zero
bit pattern.
Then the member operator -> is not simply an offset addition. On a CPU with segmented memory this might be a more complicated operation.
Taking the address with a & operation is UB if the expression is
not a valid object.
(2) For the point 2., there are certainly still archictures out in the wild (embedded stuff) that use segmented memory. For 3., the point that R makes about integer constant expressions has another drawback: if the code is badly optimized the & operation might be done at runtime and signal an error.
(3) Never heard of such a thing, but this is probably not enough to convice your colleagues.
I believe that nearly every optimizing compiler has broken that macro at multiple points in time. Your coworkers have apparently been lucky enough not to have been hit by it.
What happens is that some junior compiler engineer decides that because the zero page is never mapped on their platform of choice, any time anyone does anything with a pointer to that page, that's undefined behavior and they can safely optimize away the whole expression. At that point, everyone's homebrew offsetof macros break until enough people scream about it, and those of us who were smart enough not to roll our own go happily about our business.
I don't know of any compiler where this is the behavior in the current released version, but I think I've seen it happen at some point with every compiler I've ever worked with.

What are the major differences between ANSI C and K&R C?

The Wikipedia article on ANSI C says:
One of the aims of the ANSI C standardization process was to produce a superset of K&R C (the first published standard), incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from the C++ programming language), and a more capable preprocessor. The syntax for parameter declarations was also changed to reflect the C++ style.
That makes me think that there are differences. However, I didn't see a comparison between K&R C and ANSI C. Is there such a document? If not, what are the major differences?
EDIT: I believe the K&R book says "ANSI C" on the cover. At least I believe the version that I have at home does. So perhaps there isn't a difference anymore?
There may be some confusion here about what "K&R C" is. The term refers to the language as documented in the first edition of "The C Programming Language." Roughly speaking: the input language of the Bell Labs C compiler circa 1978.
Kernighan and Ritchie were involved in the ANSI standardization process. The "ANSI C" dialect superceded "K&R C" and subsequent editions of "The C Programming Language" adopt the ANSI conventions. "K&R C" is a "dead language," except to the extent that some compilers still accept legacy code.
Function prototypes were the most obvious change between K&R C and C89, but there were plenty of others. A lot of important work went into standardizing the C library, too. Even though the standard C library was a codification of existing practice, it codified multiple existing practices, which made it more difficult. P.J. Plauger's book, The Standard C Library, is a great reference, and also tells some of the behind-the-scenes details of why the library ended up the way it did.
The ANSI/ISO standard C is very similar to K&R C in most ways. It was intended that most existing C code should build on ANSI compilers without many changes. Crucially, though, in the pre-standard era, the semantics of the language were open to interpretation by each compiler vendor. ANSI C brought in a common description of language semantics which put all the compilers on an equal footing. It's easy to take this for granted now, some 20 years later, but this was a significant achievement.
For the most part, if you don't have a pre-standard C codebase to maintain, you should be glad you don't have to worry about it. If you do--or worse yet, if you're trying to bring an old program up to more modern standards--then you have my sympathies.
There are some minor differences, but I think later editions of K&R are for ANSI C, so there's no real difference anymore.
"C Classic" for lack of a better terms had a slightly different way of defining functions, i.e.
int f( p, q, r )
int p, float q, double r;
{
// Code goes here
}
I believe the other difference was function prototypes. Prototypes didn't have to - in fact they couldn't - take a list of arguments or types. In ANSI C they do.
function prototype.
constant & volatile qualifiers.
wide character support and internationalization.
permit function pointer to be used without dereferencing.
Another difference is that function return types and parameter types did not need to be defined. They would be assumed to be ints.
f(x)
{
return x + 1;
}
and
int f(x)
int x;
{
return x + 1;
}
are identical.
The major differences between ANSI C and K&R C are as follows:
function prototyping
support of the const and volatile data type qualifiers
support wide characters and internationalization
permit function pointers to be used without dereferencing
ANSI C adopts c++ function prototype technique where function definition and declaration include function names,arguments' data types, and return value data types. Function prototype enable ANSI C compiler to check for function calls in user programs that pass invalid numbers of arguments or incompatible arguments data types. These fix major weakness of the K&R C compiler.
Example: to declares a function foo and requires that foo take two arguments
unsigned long foo (char* fmt, double data)
{
/*body of foo */
}
FUNCTION PROTOTYPING:ANSI C adopts c++ function prototype technique where function definaton and declaration include function names,arguments t,data types and return value data types.function prototype enable ANSI ccompilers to check for function call in user program that passes invalid number number of argument or incompatiblle argument data types.these fix a major weakness of the K&R C compilers:invalid call in user program often passes compilation but cause program to crash when they are executed
The difference is:
Prototype
wide character support and internationalisation
Support for const and volatile keywords
permit function pointers to be used as dereferencing
A major difference nobody has yet mentioned is that before ANSI, C was defined largely by precedent rather than specification; in cases where certain operations would have predictable consequences on some platforms but not others (e.g. using relational operators on two unrelated pointers), precedent strongly favored making platform guarantees available to the programmer. For example:
On platforms which define a natural ranking among all pointers to all objects, application of the relational operators to arbitrary pointers could be relied upon to yield that ranking.
On platforms where the natural means of testing whether one pointer is "greater than" another never has any side-effect other than yielding a true or false value, application of the relational operators to arbitrary pointers could likewise be relied upon never to have any side-effects other than yielding a true or false value.
On platforms where two or more integer types shared the same size and representation, a pointer to any such integer type could be relied upon to read or write information of any other type with the same representation.
On two's-complement platforms where integer overflows naturally wrap silently, an operation involving an unsigned values smaller than "int" could be relied upon to behave as though the value was unsigned in cases where the result would be between INT_MAX+1u and UINT_MAX and it was not promoted to a larger type, nor used as the left operand of >>, nor either operand of /, %, or any comparison operator. Incidentally, the rationale for the Standard gives this as one of the reasons small unsigned types promote to signed.
Prior to C89, it was unclear to what lengths compilers for platforms where the above assumptions wouldn't naturally hold might be expected to go to uphold those assumptions anyway, but there was little doubt that compilers for platforms which could easily and cheaply uphold such assumptions should do so. The authors of the C89 Standard didn't bother to expressly say that because:
Compilers whose writers weren't being deliberately obtuse would continue doing such things when practical without having to be told (the rationale given for promoting small unsigned values to signed strongly reinforces this view).
The Standard only required implementations to be capable of running one possibly-contrived program without a stack overflow, and recognized that while an obtuse implementation could treat any other program as invoking Undefined Behavior but didn't think it was worth worrying about obtuse compiler writers writing implementations that were "conforming" but useless.
Although "C89" was interpreted contemporaneously as meaning "the language defined by C89, plus whatever additional features and guarantees the platform provides", the authors of gcc have been pushing an interpretation which excludes any features and guarantees beyond those mandated by C89.
The biggest single difference, I think, is function prototyping and the syntax for describing the types of function arguments.
Despite all the claims to the contary K&R was and is quite capable of providing any sort of stuff from low down close to the hardware on up.
The problem now is to find a compiler (preferably free) that can give a clean compile on a couple of millions of lines of K&R C without out having to mess with it.And running on something like a AMD multi core processor.
As far as I can see, having looked at the source of the GCC 4.x.x series there is no simple hack to reactivate the -traditional and -cpp-traditional lag functionality to their previous working state without without more effor than I am prepered to put in. And simpler to build a K&R pre-ansi compiler from scratch.

Resources