Is there a complete online guide to C format specifiers for every type of data and for all cases? I have only found partial and contradictory references that don't explain all possible cases.
The definitive guide for this is the actual ISO standard itself. Any other source suffers from the potential flaw that it may be incorrect or incomplete. The standard is, by definition, both correct and complete(a).
And, while standards documents can sometimes be dry and difficult to read, the sections covering the format specifiers are reasonably clear, both in terms of what all the specifiers mean (including flags, width/precision specifiers, and length modifiers) and the data types you're allowed to use with those specifiers.
For example, C11(b) details all the format specifiers in 7.21.6.1 and 7.21.6.2 for the printf and scanf families of functions, respectively. The last free draft of this iteration of the standard is the N1570 document.
That is, practically speaking, the C11 standard. Officially, it is the latest draft of C11 and, to get the real standard, you need to buy it from the standards body of your country. However, the differences are minor and tend to be administrative in nature.
(a) I don't mean to imply the standard is totally coherent or bug-free, just that it is the standard. That means, pending authorised changes, implementations must follow said standard in order to be considered C. If an implementation does that, it's valid, regardless of what lunacy the standard may have in it :-)
(b) Although C11 (the iteration we use and are therefore most familiar with) may have been officially replaced by C18, the changes were only incorporations of TCs and defect fixes. There were no substantial changes to the "meat" of the standard, in particular for this question, the format specifiers.
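As a taste of what those sections cover, here is a small sketch (the selection of pairings is mine, but each one is correct per C11 7.21.6.1):

#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    size_t n = 42;
    long long ll = -7;
    unsigned short us = 9;
    double d = 3.14;
    int64_t i64 = 1;

    printf("%zu\n", n);            /* z length modifier for size_t */
    printf("%lld\n", ll);          /* ll for long long */
    printf("%hu\n", us);           /* h for unsigned short */
    printf("%f\n", d);             /* f converts a double argument */
    printf("%" PRId64 "\n", i64);  /* <inttypes.h> macro for int64_t */
    return 0;
}

The length modifiers (h, l, ll, z, and friends) are exactly the part that partial tutorials tend to get wrong, and 7.21.6.1 spells out every valid combination.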
Related follow-up question for: If a "shall / shall not" requirement is violated, does it matter in which section (e.g. Semantics, Constraints) such a requirement is located?
ISO/IEC 9899:202x (E) working draft, December 11, 2020, N2596, 5.1.1.3 Diagnostics, paragraph 1:
A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined. Diagnostic messages need not be produced in other circumstances.
Consequence: semantics violation does not require diagnostics.
Question: what is the (possible) rationale for "semantics violation does not require diagnostics"?
A possible rationale is given by Rice's theorem: non-trivial semantic properties of programs are undecidable.
For example, division by zero is a semantics violation, and you cannot decide, by static analysis of the C source code alone, that it won't happen...
A standard cannot require total detection of such undefined behavior, even if some tools (e.g. Frama-C) are of course sometimes capable of detecting it.
See also the halting problem. You should not expect a C compiler to solve it!
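To make the undecidability concrete, here is a minimal sketch (mine, not from the standard): whether the division below divides by zero depends entirely on runtime input, so no static analysis of the source alone can decide it in general.

#include <stdio.h>

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1)
        return 1;
    printf("%d\n", 100 / n);  /* undefined behavior if and only if n == 0 */
    return 0;
}

A conforming compiler is not required to diagnose the potential division by zero here; doing so is left to quality-of-implementation warnings and external analyzers.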
The C99 rationale v5.10 gives this explanation:
5.1.1.3 Diagnostics
By mandating some form of diagnostic message for any program containing a syntax error or constraint violation, the Standard performs two important services. First, it gives teeth to the concept of erroneous program, since a conforming implementation must distinguish such a program from a valid one. Second, it severely constrains the nature of extensions permissible to a conforming implementation.

The Standard says nothing about the nature of the diagnostic message, which could simply be “syntax error”, with no hint of where the error occurs. (An implementation must, of course, describe what translator output constitutes a diagnostic message, so that the user can recognize it as such.) The C89 Committee ultimately decided that any diagnostic activity beyond this level is an issue of quality of implementation, and that market forces would encourage more useful diagnostics. Nevertheless, the C89 Committee felt that at least some significant class of errors must be diagnosed, and the class specified should be recognizable by all translators.
This happens because the grammar of the C language is context-sensitive, and for all languages defined by context-free or more complex grammars on the Chomsky hierarchy one must make a trade-off between the decidability of the language's semantics and its power.
C's designers chose to give the language a great deal of power, which is why the problem of undecidability is omnipresent in C.
There are languages, like Coq, that try to cut out the undecidable situations by restricting the semantics of recursive functions (allowing, roughly, only primitive recursion).
The question of whether an implementation provides any useful diagnostics in any particular situation is a Quality of Implementation issue outside the Standard's jurisdiction. If an implementation were to unconditionally output "Warning: this program does not output any useful diagnostics" or even "Warning: water is wet", such output would fully satisfy all of the Standard's requirements with regard to diagnostics even if the implementation didn't output any other diagnostics.
Further, the authors of the Standard characterized as "Undefined Behavior" many actions which they expected would be processed in a meaningful and useful fashion by many if not most implementations. According to the published Rationale document, Undefined Behavior among other things "identifies areas of conforming language extension", since implementations are allowed to specify how they will behave in cases that are not defined by the Standard.
Having implementations issue warnings about constructs which were non-portable but which they would process in a useful fashion would have been annoying.
Prior to the Standard, some implementations would usefully accept constructs like:
struct foo {
    int *p;
    char pad[4 - sizeof(int *)];  /* zero-sized array when pointers are exactly 4 bytes */
    int q, r;
};
for all sizes of pointer up to four bytes (8-byte pointers weren't a thing back then), rather than squawking when pointers were exactly four bytes. Some people on the Committee, however, were opposed to the idea of accepting declarations of zero-sized arrays. Thus, a compromise was reached where compilers would squawk about such things, programmers would ignore the useless warnings, and the useful constructs would remain usable on implementations that supported them.
While there was a vague attempt to distinguish between constructs that should produce warnings that programmers could ignore, versus constructs that might be used so much that warnings would be annoying, the fact that issuance of useful diagnostics was a Quality of Implementation issue outside the Standard's jurisdiction meant there was no real need to worry too much about such distinctions.
So I have come across the following definition in one of the Wikipedia articles (rough translation):
Modifier (programming): an element of source code, being a phrase of a given programming language construct, which changes the behavior of that construct.
Then, the article mentions modifiers in regard to ANSI C standard:
type modifiers (sign: signed, unsigned; constness: const; volatility: volatile)
Then it also mentions the term in regard to languages such as Turbo C, Borland, and Perl, but given there is no mention of modifier in ANSI/ISO 9899, this already puts the validity of the article into doubt.
Answers to this question draw similar conclusions.
However, when looking at some of the top search results on Google, you see the term modifier mentioned everywhere: in tutorial sections and even in example interview questions.
So the question is: Can the usage of the term modifier in this context be justified or rather requires correction when mentioned?
Can the usage of the term modifier in this context be justified or rather requires correction when mentioned?
The C spec does not use "modifier" with a specific definition. It does discuss how things are modifiable, and it details the term modifiable lvalue, but nothing that ties to the OP's concerns about signed, unsigned, const, volatile.
In C, const, volatile, and restrict are type qualifiers.
signed and unsigned are type specifiers; used on their own, they name two of the standard integer types (signed int and unsigned int).
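For illustration, a minimal sketch (the declarations are mine, not from the standard):

/* const and volatile are type qualifiers; unsigned is a type
   specifier that, on its own, names the type unsigned int. */
const volatile unsigned flags = 0u;  /* qualified unsigned int */
signed char delta = -1;              /* signed used as a specifier */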
So the authoritative reference is silent on "usage of the term modifier".
Lacking a standard reference answer, it does make sense, when using the term modifier, to justify its context to avoid quibbling corrections.
Like many terms that span multiple languages, it is used loosely when applied so broadly, and the reader needs to understand that. Each computer language has, and needs, very precise terms. When speaking C, it is best to avoid the term unless a generality is needed in context with other languages.
According to the C Standard, subclause 6.10.2, paragraph 5 [ISO/IEC 9899:2011],
The implementation shall provide unique mappings for sequences consisting of one or more nondigits or digits (6.4.2.1) followed by a period (.) and a single nondigit. The first character shall not be a digit. The implementation may ignore distinctions of alphabetical case and restrict the mapping to eight significant characters before the period.
This would mean that if two include files have their first 8 characters in common, which header actually gets picked is undefined.
When I compile using clang or gcc, I haven't really faced this issue. However, is there a documented behavior for source file inclusion in GCC and Clang?
In the modern world, I would find it weird if any compiler really restricts to 8 characters.
Reference: C11 WG14 draft version N1570, CERT C Coding Standard
This would mean that if two include files have their first 8 characters in common, which header actually gets picked is undefined.
No, I'd argue against that. Looking at the exact wording, we see that the standard uses:
[..] The implementation may ignore [..]
It's "may", not "shall". If the later was used it would indeed mean that the behavior was undefined (N1570 $4/2). Since "may" is used as-is, without exact declaration I think it's safe to assume the normal meaning of the word (source, emphasis mine):
used to express opportunity or permission
Thus, an implementation is allowed to only consider the first 8 characters, but it doesn't have to.
Funny thing: I cannot find exact documentation for the "distinction limit" of the "sequence" in GCC's manual, meaning (N1570 §4/8, emphasis mine) ...
An implementation shall be accompanied by a document that defines all implementation defined and locale-specific characteristics and all extensions.
... that GCC could (under some very pedantic point of view) be considered a nonconforming implementation. The practically relevant part of their manual, as @PaulGriffiths pointed out, is probably (source, point 4 in the list):
Significant initial characters in an identifier or macro name.
The preprocessor treats all characters as significant. The C standard requires only that the first 63 be significant.
Regarding the comment:
[..] I am actually trying to evaluate if this will bite me as long as I am using one of these compilers on a Linux platform. [..]
I really doubt that this will ever (again?) be an issue.
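To see what the permission would allow, consider this sketch (the file names are hypothetical): both header names share their first eight characters, "config_l", so an implementation that exercised the eight-character mapping permission could not tell them apart.

/* Hypothetical headers whose names agree in the first 8 characters: */
#include "config_limits.h"
#include "config_local.h"
/* GCC and Clang treat every character as significant, so in practice
   these two lines include two distinct files. */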
For this rule you have to go to ISO/IEC 9899:1990 Appendix G and study each case of implementation-defined behavior in order to document it.
It's a difficult task to determine which manual checks need to be done in the code.
Is there some kind of list of manual checks to do because of this rule?
MISRA-C is primarily concerned with avoiding unpredictable behavior in the C language: those “traps and pitfalls” (such as undefined and unspecified behavior) that all C developers should be aware of and that a compiler will not always warn you about. This includes implementation-defined behavior, where the C standard allows the behavior of certain constructs to vary between implementations. These tend to be less critical from a safety point of view, provided the compiler documentation describes its intended behavior as required by the standard.
That is, for each specific compiler the behavior is well-defined, but the concern is to assure the developers have verified this, including documenting language extensions, known bugs in the compiler (and build chain) and workarounds.
Although it is possible to manually check C code fully for MISRA-C compliance, it is not recommended. The guidelines were developed with static analysis tools in mind. Not all guidelines can be fully checked by tools, but the better MISRA-C tools (be careful in your evaluations, there are not many “good” ones) will at least assist by automatically identifying where code relies on implementation-specific behavior. This includes all the checks required by Rule 3.1; where implementation-defined behavior cannot be completely checked by a tool, a manual review will be required.
Also, if you are starting a new MISRA-C project, I highly recommend referring to MISRA-C:2012, even if you are required to be MISRA-C:2004 compliant. Having MISRA-C:2012 around helps, because it has clarified many of the guidelines, including additional rationale, explanations and examples. The standard (which can be obtained at misra-c.com) lists the C90 and C99 implementation-defined behaviors that are considered to have the potential to cause unintended behavior. This may or may not overlap with guidelines that address implementation-defined behaviors that MISRA-C is specifically concerned about.
First of all, the standard definition of implementation-defined behavior is: specific behavior which the compiler must document. So you can always refer to the compiler documentation whenever there is a need to document how a certain implementation-defined behavior is implemented.
What's left for you to do, then, is to document where the code relies on implementation-defined behavior. This is preferably done in source code comments (a sketch follows the list below).
Spontaneously, here are the most important things to look for in the code. The list does not include cases that are already covered by other MISRA rules (for example, the signedness of char).
The size of all the integer types. The size of int is the most important, as it determines the type given to integer literals, to C "boolean" expressions, to implicitly promoted integers, and so on.
Obscure integer formats that aren't standard two's complement.
Any reliance on endianness.
The enum type format.
The floating point format.
Pointer to integer conversions, in case they are obscure on the given system.
Behavior of function inlining and the register keyword, if these are used.
Alignment issues including struct padding. Reliance on the size of a struct/union.
#include paths, in case they are obscure. Particularly if they are absolute and not relative.
Bitwise operators mixed with signed types (in most cases this is a bug or design mistake).
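As promised above, here is a sketch of such source-level documentation (mine, assuming a C11 toolchain; the specific properties asserted are examples, not MISRA requirements):

#include <limits.h>
#include <stdint.h>

/* Implementation-defined: this module assumes an 8-bit char and a
   32-bit int, as documented by the compiler vendor. */
_Static_assert(CHAR_BIT == 8, "8-bit char assumed");
_Static_assert(sizeof(int) == 4, "32-bit int assumed");

/* Implementation-defined: round-tripping a pointer through uintptr_t
   (C11 7.20.1.4) is relied upon here. */
static uintptr_t addr_of(const void *p) { return (uintptr_t)p; }

Compile-time assertions like these turn silent assumptions into build failures when the code is moved to a different implementation.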
C99 still isn't supported by many compilers, and much of the focus is now on C++ and its upcoming standard, C++1x.
I'm curious as to what C will "get" in its next standard, when it will get it, and how it will keep C competitive. C and C++ are known to feed on one another's improvements, will C be feeding on the C++1x standard?
What can I look forward to in C's future?
The ISO/IEC 9899:2011 standard, aka C11, was published in December 2011.
The latest draft is N1570; I'm not aware of any differences between it and the final standard. There's already a Technical Corrigendum fixing an oversight in the specification of __STDC_VERSION__ (now 201112L) and the optional __STDC_LIB_EXT1__ (now 201112L).
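If you want to test for C11 support at compile time, the macro mentioned above can be checked directly; a minimal sketch:

#include <stdio.h>

int main(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    puts("compiled as C11 or later");
#else
    puts("compiled as an earlier C standard");
#endif
    return 0;
}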
I was typing a list of features, but noticed the Wikipedia page on C1X has a pretty complete listing of all proposed changes.
The ISO C working group posts 'after meeting' mailings on its website. One of the more interesting is this Editor's Report.
Here's a summary from the Wikipedia page (a small sketch exercising two of these features follows the list):
Alignment specification (_Alignas specifier, alignof operator, aligned_alloc function)
Multithreading support (_Thread_local storage-class specifier, <threads.h> header including thread creation/management functions, mutex, condition variable and thread-specific storage functionality)
Improved Unicode support (char16_t and char32_t types for storing UTF-16/UTF-32 encoded data, including the corresponding u and U string literal prefixes and conversion functions in <uchar.h>)
Removal of the gets function
Bounds-checking interfaces (Annex K)
Analyzability features (Annex L)
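Here is the promised sketch (mine), touching the alignment specification and the new Unicode character types:

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
#include <uchar.h>

int main(void)
{
    alignas(16) char buffer[64];  /* over-aligned local array */
    char16_t greeting[] = u"hi";  /* UTF-16 string literal */

    printf("alignof(max_align_t) = %zu\n", alignof(max_align_t));
    printf("buffer at %p, first code unit = %u\n",
           (void *)buffer, (unsigned)greeting[0]);
    return 0;
}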
It looks like GCC, as of 4.6, is starting to look at C1x. They claim to have (a short sketch follows the list):
Static assertions (_Static_assert keyword)
Typedef redefinition
New macros in <float.h>
Anonymous structures and unions
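Two of those features in a sketch (mine, not from the GCC documentation):

#include <stdint.h>

/* Compile-time check via the new _Static_assert keyword. */
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

struct packet {
    uint8_t tag;
    union {               /* anonymous union: members are accessed directly */
        uint32_t word;
        uint8_t bytes[4];
    };
};

static uint32_t packet_word(const struct packet *p) { return p->word; }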
Probably the best place to find the current status would be to look at the latest draft of the new version of the C standard. Warning: though it's coming directly from the committee, the server behind that link isn't always the most responsive...