C99 still isn't supported by many compilers, and much of the focus is now on C++ and its upcoming standard, C++1x.
I'm curious as to what C will "get" in its next standard, when it will get it, and how it will keep C competitive. C and C++ are known to feed on one another's improvements, will C be feeding on the C++1x standard?
What can I look forward to in C's future?
The ISO/IEC 9899:2011 standard, aka C11, was published in December 2011.
The latest draft is N1570; I'm not aware of any differences between it and the final standard. There's already a Technical Corrigendum fixing an oversight in the specification of __STDC_VERSION__ (now 201112L) and the optional __STDC_LIB_EXT1__ (now 201112L).
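If you want to detect C11 support at compile time, the updated __STDC_VERSION__ value gives you a hook; a minimal sketch:

#include <stdio.h>

int main(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    printf("C11 or later: __STDC_VERSION__ = %ld\n", (long)__STDC_VERSION__);
#else
    printf("pre-C11 compiler\n");
#endif
    return 0;
}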
I was typing up a list of features, but noticed the Wikipedia page on C1X has a pretty complete listing of all the proposed changes.
The ISO C working group posts 'after meeting' mailings on their website. One of the more interesting is this Editor's Report.
Here's a summary from the Wikipedia page (a short code sketch of a few of these follows the list):
Alignment specification (_Alignas specifier, _Alignof operator, aligned_alloc function)
Multithreading support (_Thread_local storage-class specifier, <threads.h> header including thread creation/management functions, mutex, condition variable and thread-specific storage functionality)
Improved Unicode support (char16_t and char32_t types for storing UTF-16/UTF-32 encoded data, including the corresponding u and U string literal prefixes and conversion functions in <uchar.h>)
Removal of the gets function
Bounds-checking interfaces (Annex K)
Analyzability features (Annex L)
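To give a flavor of a few of these in code, here is a minimal sketch; it assumes a C11 compiler, and all identifiers are made up for illustration:

#include <stdalign.h>
#include <stdio.h>
#include <uchar.h>

struct packet {
    union {                     /* anonymous union: members are accessed directly */
        unsigned char bytes[4];
        unsigned int word;
    };
};

int main(void)
{
    /* compile-time check with no runtime cost */
    _Static_assert(sizeof(int) >= 2, "int is too small");

    /* alignment specification via the alignas macro from <stdalign.h> */
    alignas(16) char buffer[64];

    /* UTF-16 string literal; char16_t comes from <uchar.h> */
    char16_t greeting[] = u"hello";

    struct packet p;
    p.word = 0;
    p.bytes[0] = 1;             /* anonymous-union members need no intermediate name */
    printf("%u\n", p.word);

    (void)buffer;
    (void)greeting;
    return 0;
}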
It looks like GCC, as of 4.6, is starting to look at C1x (a short sketch follows the list). They claim to have:
Static assertions (_Static_assert keyword)
Typedef redefinition
New macros in <float.h>
Anonymous structures and unions
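A tiny sketch exercising two of those (compile with gcc -std=c1x; the asserted values are only illustrative):

#include <float.h>

typedef int myint;
typedef int myint;   /* redefinition to the same type: allowed in C1x */

/* DBL_DECIMAL_DIG is one of the new <float.h> macros */
_Static_assert(DBL_DECIMAL_DIG >= 10, "unexpectedly low double precision");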
Probably the best place to find the current status would be to look at the latest draft of the new version of the C standard. Warning: though it's coming directly from the committee, the server behind that link isn't always the most responsive...
In 1990 P.J. Plauger wrote (emphasis added):
Standard C offers you an additional level of security, however. It is a level offered in no other language standard that I know. It promises that if you avoid certain sets of names, you will experience no collisions. Thus, Standard C makes it that much easier to write highly portable applications.
In C11, the keyword _Alignas (for example) was introduced with an accompanying macro alignas defined in <stdalign.h>. Here we see that the keyword is _Alignas, not alignas (since in pre-C11 code alignas is not reserved). Hence, there is no collision with a possible user-defined alignas.
However, in C2x alignas is a keyword and <stdalign.h> provides no content (and C2x says nothing about an __alignas_is_defined macro -- a defect?). This means that under C2x any pre-C2x code containing a user-defined alignas violates the semantics and, hence, backward compatibility is broken.
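For concreteness, here is the kind of pre-C2x code this is about; a made-up example, not taken from the standard:

/* Valid C11, provided <stdalign.h> is not included:
   alignas is just an ordinary identifier here. */
#include <stddef.h>

static size_t alignas = 8;   /* user-defined name: no collision in C11 */

int main(void)
{
    return (int)alignas;     /* once alignas is a keyword (C2x), this is a syntax error */
}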
Questions:
Does this mean that, since C2x, the "you will experience no collisions" promise no longer holds?
What is the rationale for alignas (for example) to be a keyword rather than a macro?
The proposal for this change (available at https://open-std.org/JTC1/SC22/WG14/www/docs/n2934.pdf) argues that the naming strategy for new keywords has already been inconsistent across previous standard versions:
some were integrated using non-reserved names (const, inline) others were integrated in an underscore-capitalized form. For some of them, the use of the lower-case form then is ensured via a set of library header files.
Further, using the same keyword naming as C++ (for compatibility purposes) is also mentioned in the proposal, since some of these keywords originated in that language and were later added to C.
Is there a complete online guide to C format specifiers for every type of data and for all cases? I have only found partial and conflicting references that don't explain all possible cases.
The definitive guide for this is the actual ISO standard itself. Any other source suffers from the potential flaw that it may be incorrect or incomplete. The standard is, by definition, both correct and complete(a).
And, while standards documents can sometimes be dry and difficult to read, the sections covering the format specifiers are reasonably clear, both in terms of what all the specifiers mean (including flags, width/precision specifiers, and length modifiers) and the data types you're allowed to use with those specifiers.
For example, C11(b) details all the format specifiers in 7.21.6.1 and 7.21.6.2 for the printf and scanf family of functions respectively. The last free draft of this iteration of the standard is the N1570 document.
That is, practically speaking, the C11 standard - officially, it is the latest draft of C11 and, to get the real standard, you need to buy it from the standards body of your country. However, the differences are minor and tend to be administrative in nature.
(a) I don't mean to imply the standard is totally coherent or bug-free, just that it is the standard. That means, pending authorised changes, implementations must follow said standard in order to be considered C. If an implementation does that, it's valid, regardless of what lunacy the standard may have in it :-)
(b) Although C11 (the iteration we use and are therefore most familiar with) may have been officially replaced by C18, the changes were only incorporations of TCs and defect fixes. There were no substantial changes to the "meat" of the standard, in particular for this question, the format specifiers.
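As a quick illustration of the ground those subclauses cover, here is a small sketch; the values and field widths are arbitrary:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    size_t n = 42;
    long long big = -1234567890123LL;
    unsigned mask = 0xFFu;
    double d = 3.14159;
    int64_t fixed = INT64_C(9000000000);

    printf("%zu\n", n);             /* z length modifier: size_t */
    printf("%lld\n", big);          /* ll length modifier: long long */
    printf("%#x\n", mask);          /* # flag: alternative (0x...) form */
    printf("%8.3f\n", d);           /* field width 8, precision 3 */
    printf("%" PRId64 "\n", fixed); /* <inttypes.h> macro for int64_t */
    return 0;
}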
According to the C Standard, subclause 6.10.2, paragraph 5 [ISO/IEC 9899:2011],
The implementation shall provide unique mappings for sequences consisting of one or more nondigits or digits (6.4.2.1) followed by a period (.) and a single nondigit. The first character shall not be a digit. The implementation may ignore distinctions of alphabetical case and restrict the mapping to eight significant characters before the period.
This would mean that if two include files have the same first 8 characters, the header it actually picks is undefined.
When I compile using clang or gcc, I haven't really faced this issue. However, is there a documented behavior for source file inclusion in GCC and Clang?
In the modern world, I would find it weird if any compiler really restricted header names to 8 characters.
Reference: C11 WG14 draft version N1570, CERT C Coding Standard
This would mean that if two include files have the same first 8 characters, the header it actually picks is undefined.
No, I'd argue against that: looking at the exact wording, we see that the standard uses:
[..] The implementation may ignore [..]
It's "may", not "shall". If the later was used it would indeed mean that the behavior was undefined (N1570 $4/2). Since "may" is used as-is, without exact declaration I think it's safe to assume the normal meaning of the word (source, emphasis mine):
used to express opportunity or permission
Thus, an implementation is allowed to only consider the first 8 characters, but it doesn't have to.
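To make that concrete, consider a hypothetical pair of includes (the file names are made up):

/* On an implementation that restricts the mapping to eight significant
   characters before the period, both of these directives could map to
   the same header, because the first eight characters ("configur") are
   identical. GCC and Clang treat all characters as significant, so the
   two files stay distinct. */
#include "configuration_a.h"
#include "configuration_b.h"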
Funny thing: I cannot find exact documentation for the "distinction limit" of this "sequence" in GCC's manual, meaning (N1570 §4/8, emphasis mine) ...
An implementation shall be accompanied by a document that defines all implementation-defined and locale-specific characteristics and all extensions.
... that GCC could (under some very pedantic point of view) be considered a nonconforming implementation. The practically relevant part of their manual, as @PaulGriffiths pointed out, is probably (source, point 4 in the list):
Significant initial characters in an identifier or macro name.
The preprocessor treats all characters as significant. The C standard requires only that the first 63 be significant.
Regarding the comment:
[..] I am actually trying to evaluate if this will bite me as long as I am using one of these compilers on a Linux platform. [..]
I really doubt that this will ever (again?) be an issue.
Is there a difference if I compile the following program using c89 vs c99? I get the same output. Is there really a difference between the two?
#include <stdio.h>
int main ()
{
// Print string to screen.
printf ("Hello World\n");
}
gcc -o helloworld -std=c99 helloworld.c
vs
gcc -o helloworld -std=c89 helloworld.c
// comments are not a part of C89 but are OK in C99, and falling off of main() without returning a value is equivalent to return 0; in C99, but not in C89. From N1256 (pdf), 5.1.2.2.3p1:
If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.
So your code has undefined behavior in C89, and well-defined behavior in C99.
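A version that is well-defined under both standards avoids // comments and returns explicitly:

#include <stdio.h>

int main(void)
{
    /* Print string to screen. */
    printf("Hello World\n");
    return 0;   /* explicit return: needed for defined behavior in C89 */
}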
In theory, there should be one difference. Using "//" to mark a comment isn't part of C89, so if the compiler enforced the C89 rules correctly, that would produce a compiler error (with -ansi -pedantic, it might do that, but I don't remember for sure).
That gives an idea of the general character though: if a program compiles as C89, it'll generally also compile as C99, and give exactly the same results. C99 mostly buys you some new features that aren't present in C89, so you can use (for example) variable length arrays, which aren't allowed in C89.
You may have to ask for pedantic rules enforcement to see all the differences though -- C99 is intended to standardize existing practice, and some of the existing practice is gcc extensions, some of which are enabled by default.
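For example, here is a sketch of C99-only constructs that gcc -std=c89 -pedantic rejects:

#include <stdio.h>

void print_squares(int n)
{
    int squares[n];                 /* variable length array: C99 only */
    for (int i = 0; i < n; i++)     /* declaration inside for: C99 only */
        squares[i] = i * i;
    for (int j = 0; j < n; j++)
        printf("%d ", squares[j]);
    printf("\n");
}

int main(void)
{
    print_squares(5);
    return 0;
}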
On this forum, http://www.velocityreviews.com/forums/t287495-p2-iso-c89-and-iso-c99.html, I found this:
Summary: C99 is standardized, has new keywords, new array features, complex numbers, new library functions, and such. More compilers are C89-complete, since they've had all this time to make them so.
A) ANSI X3.159-1989. This is the original 1989 C standard, dated December 1989, with Rationale. The main body of the language is described in section 3, and the "C library" -- stdio, <string.h> functions, and so on -- in section 4.

B) ISO 9899:1990. This is the original ISO C standard. "ANSI" is the American National Standards Institute, so the international crowd have to have their own standards with their own, different, numbering system. They simply adopted ANSI's 1989 standard, removed the Rationale, and renumbered the sections (calling them "clauses" instead). With very few exceptions you can just add three, so that most of the language is described in section -- er, "clause" -- 6, and the "C library" part in section 7.

C) ISO 9899:1999. This is the newfangled "C99" standard, with its Variable Length Arrays, Flexible Array Members, new keywords like "restrict" and "_Bool", new semantics for the "static" keyword, new syntax to create anonymous aggregates, new complex-number types, hundreds of new library functions, and so on.

The new ISO standard was immediately "back-adopted" by ANSI. I have not seen any official "ANSI-sanctioned" claim about this, but given the usual numbering systems, I would expect this to be ANSI Standard number X3.159-1999. (The numbering system is pretty obvious: a standard, once it comes out, gets a number -- X3.159 for ANSI, or just a number for ISO -- and a suffix indicating year of publication. An update to an existing standard reuses the number, with the new year.)

Although X3.159-1989 and 9899:1990 have different years and section numbering, they are effectively identical, so "C89" and "C90" really refer to the same language. Hence you can say either "C89" or "C90" and mean the same thing, even to those aware of all the subtleties.

There were also several small revisions to the original 1990 ISO standard: "Normative Addendum 1", and two "Technical Corrigenda" (numbered, giving Technical Corrigendum 1 and TC2). The two TCs are considered to be "bug fixes" for glitches in the wording of the standard, while NA1 is an actual "change". In practice, the TCs do not really affect users, while NA1 adds a whole slew of functions that people can use, so NA1 really is more significant. NA1 came out in 1994, so one might refer to "ISO 9899:1990 as modified by NA1" as "C94". I have seen it called "C95", too.
The Wikipedia article on ANSI C says:
One of the aims of the ANSI C standardization process was to produce a superset of K&R C (the first published standard), incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from the C++ programming language), and a more capable preprocessor. The syntax for parameter declarations was also changed to reflect the C++ style.
That makes me think that there are differences. However, I didn't see a comparison between K&R C and ANSI C. Is there such a document? If not, what are the major differences?
EDIT: I believe the K&R book says "ANSI C" on the cover. At least I believe the version that I have at home does. So perhaps there isn't a difference anymore?
There may be some confusion here about what "K&R C" is. The term refers to the language as documented in the first edition of "The C Programming Language." Roughly speaking: the input language of the Bell Labs C compiler circa 1978.
Kernighan and Ritchie were involved in the ANSI standardization process. The "ANSI C" dialect superseded "K&R C" and subsequent editions of "The C Programming Language" adopt the ANSI conventions. "K&R C" is a "dead language," except to the extent that some compilers still accept legacy code.
Function prototypes were the most obvious change between K&R C and C89, but there were plenty of others. A lot of important work went into standardizing the C library, too. Even though the standard C library was a codification of existing practice, it codified multiple existing practices, which made it more difficult. P.J. Plauger's book, The Standard C Library, is a great reference, and also tells some of the behind-the-scenes details of why the library ended up the way it did.
The ANSI/ISO standard C is very similar to K&R C in most ways. It was intended that most existing C code should build on ANSI compilers without many changes. Crucially, though, in the pre-standard era, the semantics of the language were open to interpretation by each compiler vendor. ANSI C brought in a common description of language semantics which put all the compilers on an equal footing. It's easy to take this for granted now, some 20 years later, but this was a significant achievement.
For the most part, if you don't have a pre-standard C codebase to maintain, you should be glad you don't have to worry about it. If you do--or worse yet, if you're trying to bring an old program up to more modern standards--then you have my sympathies.
There are some minor differences, but I think later editions of K&R are for ANSI C, so there's no real difference anymore.
"C Classic" for lack of a better terms had a slightly different way of defining functions, i.e.
int f( p, q, r )
int p, float q, double r;
{
// Code goes here
}
I believe the other difference was function prototypes. In K&R C, function declarations didn't have to - in fact, couldn't - specify a list of argument types. In ANSI C they do.
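A sketch of the contrast, using made-up function names:

/* K&R-style declaration: the empty parentheses mean "parameters
   unspecified", so the compiler cannot check calls at all */
int krfunc();

/* ANSI prototype: argument count and types are checked at call sites
   (note that int ansifunc(void) would mean "takes no arguments") */
int ansifunc(int p, float q, double r);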
function prototypes.
const and volatile qualifiers.
wide character support and internationalization.
permission to use function pointers without dereferencing.
Another difference is that, in K&R C, function return types and parameter types did not need to be declared. They would be assumed to be int.
f(x)
{
    return x + 1;
}

and

int f(x)
int x;
{
    return x + 1;
}

are identical.
The major differences between ANSI C and K&R C are as follows:
function prototyping
support for the const and volatile data type qualifiers
support for wide characters and internationalization
permission to use function pointers without dereferencing
ANSI C adopts the C++ function prototype technique, in which function definitions and declarations include the function name, the arguments' data types, and the return value's data type. Function prototypes enable ANSI C compilers to check for function calls in user programs that pass an invalid number of arguments or incompatible argument data types. These fix a major weakness of K&R C compilers.
Example: this declares a function foo and requires that foo take two arguments:
unsigned long foo (char* fmt, double data)
{
    /* body of foo */
}
FUNCTION PROTOTYPING: ANSI C adopts the C++ function prototype technique, in which function definitions and declarations include the function name, the arguments' data types, and the return value's data type. Function prototypes enable ANSI C compilers to check for function calls in user programs that pass an invalid number of arguments or incompatible argument data types. These fix a major weakness of K&R C compilers: invalid calls in user programs often passed compilation but caused the programs to crash when executed.
The differences are:
function prototypes
wide character support and internationalisation
support for the const and volatile keywords
permission to use function pointers without dereferencing
A major difference nobody has yet mentioned is that before ANSI, C was defined largely by precedent rather than specification; in cases where certain operations would have predictable consequences on some platforms but not others (e.g. using relational operators on two unrelated pointers), precedent strongly favored making platform guarantees available to the programmer. For example:
On platforms which define a natural ranking among all pointers to all objects, application of the relational operators to arbitrary pointers could be relied upon to yield that ranking.
On platforms where the natural means of testing whether one pointer is "greater than" another never has any side-effect other than yielding a true or false value, application of the relational operators to arbitrary pointers could likewise be relied upon never to have any side-effects other than yielding a true or false value.
On platforms where two or more integer types shared the same size and representation, a pointer to any such integer type could be relied upon to read or write information of any other type with the same representation.
On two's-complement platforms where integer overflows naturally wrap silently, an operation involving unsigned values smaller than "int" could be relied upon to behave as though the values were unsigned in cases where the result would be between INT_MAX+1u and UINT_MAX and it was not promoted to a larger type, nor used as the left operand of >>, nor as either operand of /, %, or any comparison operator. Incidentally, the rationale for the Standard gives this as one of the reasons small unsigned types promote to signed.
Prior to C89, it was unclear to what lengths compilers for platforms where the above assumptions wouldn't naturally hold might be expected to go to uphold those assumptions anyway, but there was little doubt that compilers for platforms which could easily and cheaply uphold such assumptions should do so. The authors of the C89 Standard didn't bother to expressly say that because:
Compilers whose writers weren't being deliberately obtuse would continue doing such things when practical without having to be told (the rationale given for promoting small unsigned values to signed strongly reinforces this view).
The Standard only required implementations to be capable of running one possibly-contrived program without a stack overflow; the authors recognized that an obtuse implementation could treat every other program as invoking Undefined Behavior, but didn't think it was worth worrying about obtuse compiler writers producing implementations that were "conforming" but useless.
Although "C89" was interpreted contemporaneously as meaning "the language defined by C89, plus whatever additional features and guarantees the platform provides", the authors of gcc have been pushing an interpretation which excludes any features and guarantees beyond those mandated by C89.
The biggest single difference, I think, is function prototyping and the syntax for describing the types of function arguments.
Despite all the claims to the contrary, K&R C was, and is, quite capable of providing anything from low-level, close-to-the-hardware work on up.
The problem now is to find a compiler (preferably free) that can give a clean compile on a couple of million lines of K&R C without having to mess with it, and that runs on something like an AMD multi-core processor.
As far as I can see, having looked at the source of the GCC 4.x.x series, there is no simple hack to reactivate the -traditional and -cpp-traditional flag functionality to their previous working state without more effort than I am prepared to put in. It would be simpler to build a K&R pre-ANSI compiler from scratch.