So I have come across the following definition in one of the Wikipedia articles (rough translation):
Modifier (programming) - an element of source code that forms part of a given programming-language construct and that changes the behavior of that construct.
Then, the article mentions modifiers in regard to ANSI C standard:
type modifiers (sign: signed, unsigned; constness: const; volatility: volatile)
Then it also mentions the term in regard to languages such as Turbo C, Borland, and Perl, but given that there is no mention of "modifier" in ANSI/ISO 9899, this already puts the validity of the article into doubt.
Answers to this question draw similar conclusions.
However, looking at some of the top Google search results, the term "modifier" is mentioned all over the place in tutorials and even in example interview questions.
So the question is: Can the usage of the term modifier in this context be justified, or does it rather require correction when mentioned?
Can the usage of the term modifier in this context be justified, or does it rather require correction when mentioned?
The C spec does not use "modifier" with a specific definition. It does discuss how things are modifiable, etc., and defines the term modifiable lvalue, but nothing that ties to the OP's concerns about signed, unsigned, const, volatile.
In C, const, volatile, and restrict are type-qualifiers.
signed and unsigned are type specifiers; on their own they name the standard integer types int and unsigned int.
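For illustration, here is a minimal sketch of how the standard's own terms apply to these keywords in declarations (the variable names are arbitrary):
const volatile unsigned int status;  /* const, volatile: type qualifiers */
                                     /* unsigned, int: type specifiers   */
signed char c;                       /* signed: a type specifier         */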
So the authoritative reference is silent on "usage of the term modifier".
Lacking a standard reference answer, it does make sense, when using the term modifier, to justify its context to avoid quibbling corrections.
As with many terms that span multiple languages, the reader needs to understand that the term is used loosely when applied so broadly. Each computer language has and needs very precise terms. When speaking about C, it is best to avoid the term unless a generality is needed in the context of other languages.
The ANSI C grammar specifies:
declarator:
pointer_opt direct-declarator
direct-declarator:
identifier
( declarator )
direct-declarator [ constant-expression_opt ]
direct-declarator ( parameter-type-list )
direct-declarator ( identifier-list_opt )
According to this grammar, it would be possible to derive
func()()
as a declarator, and
int func()()
as a declaration, which is semantically illegal. Why does the C grammar allow such syntactically legal, but semantically illegal declarations?
These kinds of questions typically can't be answered for certain, because you're asking for information about the collective thoughts and deliberations of the C committee, in 1989. They've never conducted the work of language development wholly in public, the way, say, the people responsible for Python do, and thirty years ago they did that even less. And if you polled them personally, they probably wouldn't remember.
We can look at the C Rationale document (I'm linking to the edition corresponding to C1999, but as far as I know it didn't change very much since 1989) for clues, but on a quick skim, I don't see anything relevant to your question.
That leaves me making guesses based on general principles of programming language design. There is a general principle relevant to your question: Particularly for older languages, designers try to make the formal syntax be context-free as much as possible. This makes it much easier to write an efficient parser. Rules like "you can't have a function that returns a function" require context, and so they are left out of the syntax. It's straightforward to handle them as post-hoc constraints applied to the parse tree instead, so that's what designers do.
The C grammar has a whole bunch of places where this principle appears to have been used, not just the one you're asking about. For instance, the "maximal munch" rule for tokenization exists because it means the tokenizer does not need to be aware of the full parser context, even though it leads to inconvenient results, such as a-----b being interpreted as a -- -- - b instead of a -- - -- b, even though the parser will reject the former but accept the latter.
This design principle for programming languages is often surprising to beginners, because it's so different from how humans understand natural languages; we will go out of our way to "repair" some kind of contextually appropriate meaning from even the most nonsensical sentences, and we actually rely on this in conversation. It might help to contemplate the meta-principle that worse is better (to oversimplify, because you can get the first 90% of the work done quickly and put it out there and then iterate on the remaining 90%).
Why does the C grammar allow syntactically legal, but semantically illegal declarations like int func()()?
Your question basically answers itself:
Quite simply, it's because it's a grammar's whole job to accept syntactically legal constructs. If something is syntactically legal, but semantically meaningless or illegal, it's not the grammar's job to reject it -- it gets rejected later, during semantic analysis.
And if the question is, "Why wasn't the grammar written differently, so that semantically illegal constructs were also syntactically illegal (such that the grammar could reject them)?", the answer is that it's often a tradeoff whether to reject things during parsing or during semantic analysis. C's declaration syntax is pretty complicated, and there's an obvious desire to make the grammar which accepts it about as complicated as, but not significantly more complicated than, it has to be. Often, you can keep a grammar nicely simple by deferring certain checks to the semantic analysis phase.
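As a concrete sketch (the function name good is just illustrative): the grammar can derive the first declarator, but the constraint that a function cannot return a function type is enforced during semantic analysis; the legal way to get the intended effect is to return a pointer to a function.
/* int func()();          syntactically derivable, but rejected during
                          semantic analysis: a function may not return
                          a function type                                */

int (*good(void))(void); /* fine: function returning pointer to function */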
Why does the C grammar allow such syntactically legal, but semantically illegal declarations?
What makes you think it would be sensible to expect the language syntax to be unable to express any semantically incorrect statements?
Not all semantic problems can even be detected at compile time (example: y = 1 / x;, which is well-defined except when x is zero). Even formulating the syntax rules so that they do not accept any statements, declarations, or expressions that can be proven semantically wrong at compile time would be of little benefit. It would complicate the syntax rules tremendously for very little gain, as compilers have to do the semantic analysis either way.
Note well that the primary audience for the language standard is people, not machines. That's why it describes the language semantics with prose.
For this rule, you have to go to ISO/IEC 9899:1990 Appendix G and study each case of implementation-defined behavior in order to document it.
It's difficult to determine which manual checks this requires in the code.
Is there some kind of list of manual checks to do because of this rule?
MISRA-C is primarily concerned with avoiding unpredictable behavior in the C language, those “traps and pitfalls” (such as undefined and unspecified behavior) that all C developers should be aware of and that a compiler will not always warn you about. This includes implementation-defined behavior, where the C standard allows the behavior of certain constructs to vary between implementations. These tend to be less critical from a safety point of view, provided the compiler documentation describes its intended behavior as required by the standard.
That is, for each specific compiler the behavior is well-defined, but the concern is to ensure that the developers have verified this, including documenting language extensions, known bugs in the compiler (and build chain), and workarounds.
Although it is possible to manually check C code fully for MISRA-C compliance, it is not recommended. The guidelines were developed with static analysis tools in mind. Not all guidelines can be fully checked by tools, but the better MISRA-C tools (be careful in your evaluations, there are not many “good” ones) will at least assist by automatically identifying where code relies on implementation-specific behavior. This covers many of the checks required by Rule 3.1; where implementation-defined behavior cannot be completely checked by a tool, a manual review will be required.
Also, if you are starting a new MISRA-C project, I highly recommend referring to MISRA-C:2012, even if you are required to be MISRA-C:2004 compliant. Having MISRA-C:2012 around helps, because it has clarified many of the guidelines, including additional rationale, explanations and examples. The standard (which can be obtained at misra-c.com) lists the C90 and C99 implementation-defined behaviors that are considered to have the potential to cause unintended behavior. This may or may not overlap with the guidelines that address the implementation-defined behaviors MISRA-C is specifically concerned about.
First of all, the standard's definition of implementation-defined behavior is, roughly, behavior where the implementation must document the choice it makes. So you can always refer to the compiler documentation whenever there is a need to document how a certain implementation-defined behavior is implemented.
What's left to you to do then is to document where the code relies on implementation-defined behavior. This is preferably done in source code comments.
Spontaneously, here are the most important things you need to look for in the code. The list does not include cases that are already covered by other MISRA rules (for example, the signedness of char); a sketch of how such assumptions can be documented in the code follows this list.
The size of all the integer types. The size of int is the most important, as it determines which type is given to integer literals, C "boolean" expressions, implicitly promoted integers, and so on.
Obscure integer formats that aren't standard two's complement.
Any reliance on endianness.
The enum type format.
The floating point format.
Pointer to integer conversions, in case they are obscure on the given system.
Behavior of function inlining and the register keyword, if these are used.
Alignment issues including struct padding. Reliance on the size of a struct/union.
#include paths, in case they are obscure. Particularly if they are absolute and not relative.
Bitwise operators mixed with signed types (in most cases this is a bug or design mistake).
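As mentioned above, one way to document such reliance is directly in the source. Here is a minimal sketch; it assumes a C11 compiler for static_assert (under C90 a negative-array-size typedef can serve the same purpose), and the specific assumptions shown are only examples:
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT            */

/* Documented implementation-defined assumptions this module relies on. */
static_assert(CHAR_BIT == 8,    "code assumes 8-bit bytes");
static_assert(sizeof(int) == 4, "code assumes 32-bit int");
static_assert((-1 & 3) == 3,    "code assumes two's complement");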
I've finally taken an interest in some C99 features, and now I'm having trouble understanding the relevant sections of the C99 draft.
I know that restrict is a promise that two restrict qualified pointers will not point to the same object, but my quest to find a more verbose and concrete explanation of what is and is not allowed has turned up little.
So my question is:
Can someone provide a readable, understandable explanation of the details about restrict pointers, e.g. when I can and cannot use them, when it's UB, etc. The more verbose the better. I'm tired of making my head hurt looking at the C99 draft.
Thanks.
Here is an excerpt from http://en.wikipedia.org/wiki/Restrict regarding the 'restrict' modifier:
"In the C programming language, as of the C99 standard, restrict is a keyword that can be used in pointer declarations. The restrict keyword is a declaration of intent given by the programmer to the compiler. It says that for the lifetime of the pointer, only it or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points. This limits the effects of pointer aliasing, aiding optimizations. If the declaration of intent is not followed and the object is accessed by an independent pointer, this will result in undefined behavior. The use of the restrict keyword in C, in principle, allows non-obtuse C to achieve the same performance as the same program written in Fortran.[1]"
I have read several pages, including the wiki page http://en.wikipedia.org/wiki/Strong_and_weak_typing dealing with strongly and weakly typed languages. For the most part, I think I understand the difference. However, I would like a straight to the point answer differentiating the two.
From my understanding, in weakly typed languages, data types do not have to be explicitly called. This would be a language like Matlab where you can add 4 and 2.3 without having to typecast. Strongly typed languages require the programmer to declare a data type for each variable and/or value. For instance in C, you would need to do something like 4 + (int) 2.3 or (float)4 + 2.3 (can't remember if that is valid C type-casting).
Any information expanding or correcting my understanding of these concepts would be greatly appreciated.
The difference is not about declaring types on variables. It's a bit more subtle than that (and pace Eric Lippert, I think the term is reasonably well-defined). The distinction is that in a strongly-typed language, every expression has a type which can be determined at compile time, and only operations appropriate to that type are allowed.
In an untyped ("weakly typed" to critics, "dynamically typed" to fans) language, that is not the case. The language allows any operation to be performed on any type, with the rather substantial proviso that the operation may fail. That is, while the language may allow the operation, the runtime may not.
Note that it's possible to have a strongly-typed language without requiring type declarations everywhere. Indeed, no strongly-typed language does. Consider this bit of Java:
String s = "hello";
int l = s.getBytes().length;
How does the compiler decide that .length is legal there? It's legal because it's being used on a byte[]. But there is no declaration of anything as being a byte[] here. Rather, the compiler knows that s is a String, and that when you call getBytes() on a String, you get a byte[]. It infers from those facts that the type of s.getBytes() is a byte[], and so that it is legal to ask for its length.
Some languages whose type systems are more sophisticated than Java's allow the compiler to infer more than this. For example, in Scala, you can say:
val s = "hello"
val l = s.getBytes().length
And the compiler will infer the types of s and l, as well as of the intermediate expressions.
Languages which have strong typing but artificial limits on type inference which require redundant type declarations (like Java) are described as having manifest typing, because the types must be made manifest, which is a fancy way of saying explicitly brought into existence, which is a fancy way of saying written down.
Check Eric Lippert's blog out. There's an entry about just what you're looking for here.
From the looks of his blog, those terms are subjective, so "speak more precisely about type system features."
As you said...
...in weakly typed languages, data types do not have to be explicitly called.
Strongly typed languages require the programmer to declare a data type for each variable and/or value.
This is correct...
There is also a sort of paradigm in so-called "strongly" typed languages like C#, in which types can be declared explicitly if necessary or wanted by the programmer: C# has the var keyword (where the compiler infers the type), but also explicit types (Int32, String, Boolean, etc.), which many programmers that use this language prefer.
In this way a language can be both "strongly" and "weakly" typed.
I hope this helps further your understanding of this concept...
The Wikipedia article on ANSI C says:
One of the aims of the ANSI C standardization process was to produce a superset of K&R C (the first published standard), incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from the C++ programming language), and a more capable preprocessor. The syntax for parameter declarations was also changed to reflect the C++ style.
That makes me think that there are differences. However, I didn't see a comparison between K&R C and ANSI C. Is there such a document? If not, what are the major differences?
EDIT: I believe the K&R book says "ANSI C" on the cover. At least I believe the version that I have at home does. So perhaps there isn't a difference anymore?
There may be some confusion here about what "K&R C" is. The term refers to the language as documented in the first edition of "The C Programming Language." Roughly speaking: the input language of the Bell Labs C compiler circa 1978.
Kernighan and Ritchie were involved in the ANSI standardization process. The "ANSI C" dialect superseded "K&R C", and subsequent editions of "The C Programming Language" adopt the ANSI conventions. "K&R C" is a "dead language," except to the extent that some compilers still accept legacy code.
Function prototypes were the most obvious change between K&R C and C89, but there were plenty of others. A lot of important work went into standardizing the C library, too. Even though the standard C library was a codification of existing practice, it codified multiple existing practices, which made it more difficult. P.J. Plauger's book, The Standard C Library, is a great reference, and also tells some of the behind-the-scenes details of why the library ended up the way it did.
The ANSI/ISO standard C is very similar to K&R C in most ways. It was intended that most existing C code should build on ANSI compilers without many changes. Crucially, though, in the pre-standard era, the semantics of the language were open to interpretation by each compiler vendor. ANSI C brought in a common description of language semantics which put all the compilers on an equal footing. It's easy to take this for granted now, some 20 years later, but this was a significant achievement.
For the most part, if you don't have a pre-standard C codebase to maintain, you should be glad you don't have to worry about it. If you do--or worse yet, if you're trying to bring an old program up to more modern standards--then you have my sympathies.
There are some minor differences, but I think later editions of K&R are for ANSI C, so there's no real difference anymore.
"C Classic" for lack of a better terms had a slightly different way of defining functions, i.e.
int f( p, q, r )
int p;
float q;
double r;
{
    /* code goes here */
}
I believe the other difference was function prototypes. In K&R C, function declarations didn't have to - in fact they couldn't - list the argument types. In ANSI C they do.
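For illustration, a minimal sketch of the two declaration styles (the function name and parameter types are arbitrary):
/* K&R C: the declaration carries no parameter information,
   so the compiler cannot check the arguments in a call.    */
int f();

/* ANSI C: the prototype names the parameter types, so a call
   such as f("oops") can be diagnosed at compile time.        */
int f(int p, double q);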
function prototypes
constant & volatile qualifiers
wide character support and internationalization
permit function pointers to be used without explicit dereferencing (see the sketch below)
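On the last point, here is a small sketch (assuming a hosted implementation; the names are illustrative). ANSI C allows a function pointer to be called directly, where the older style required an explicit dereference:
#include <stdio.h>

int twice(int x) { return 2 * x; }

int main(void)
{
    int (*fp)(int) = twice;
    printf("%d\n", (*fp)(21));   /* explicit dereference, old style */
    printf("%d\n", fp(21));      /* direct call through the pointer */
    return 0;
}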
Another difference is that function return types and parameter types did not need to be specified. They would be assumed to be int.
f(x)
{
    return x + 1;
}

and

int f(x)
int x;
{
    return x + 1;
}
are identical.
The major differences between ANSI C and K&R C are as follows:
function prototyping
support for the const and volatile data type qualifiers
support for wide characters and internationalization
permit function pointers to be used without dereferencing
ANSI C adopts the C++ function prototype technique, where function definitions and declarations include the function name, the arguments' data types, and the return value's data type. Function prototypes enable the ANSI C compiler to check for function calls in user programs that pass an invalid number of arguments or incompatible argument data types. These fix a major weakness of the K&R C compiler.
Example: to declare a function foo that takes two arguments:
unsigned long foo (char* fmt, double data)
{
    /* body of foo */
}
FUNCTION PROTOTYPING: ANSI C adopts the C++ function prototype technique, where function definitions and declarations include the function name, the arguments' data types, and the return value's data type. Function prototypes enable ANSI C compilers to check for function calls in user programs that pass an invalid number of arguments or incompatible argument data types. These fix a major weakness of the K&R C compilers: an invalid call in a user program would often pass compilation but cause the program to crash when executed.
The differences are:
function prototypes
wide character support and internationalisation
support for the const and volatile keywords
permit function pointers to be used without dereferencing
A major difference nobody has yet mentioned is that before ANSI, C was defined largely by precedent rather than specification; in cases where certain operations would have predictable consequences on some platforms but not others (e.g. using relational operators on two unrelated pointers), precedent strongly favored making platform guarantees available to the programmer. For example:
On platforms which define a natural ranking among all pointers to all objects, application of the relational operators to arbitrary pointers could be relied upon to yield that ranking.
On platforms where the natural means of testing whether one pointer is "greater than" another never has any side-effect other than yielding a true or false value, application of the relational operators to arbitrary pointers could likewise be relied upon never to have any side-effects other than yielding a true or false value.
On platforms where two or more integer types shared the same size and representation, a pointer to any such integer type could be relied upon to read or write information of any other type with the same representation.
On two's-complement platforms where integer overflows naturally wrap silently, an operation involving unsigned values smaller than "int" could be relied upon to behave as though the values were unsigned in cases where the result would be between INT_MAX+1u and UINT_MAX and it was not promoted to a larger type, nor used as the left operand of >>, nor as either operand of /, %, or any comparison operator. Incidentally, the rationale for the Standard gives this as one of the reasons small unsigned types promote to signed; a sketch of this case follows below.
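A sketch of that last case, assuming 16-bit unsigned short and 32-bit int (the function name is only illustrative): both operands of the multiplication are promoted to signed int, and 65535 * 65535 exceeds INT_MAX, yet pre-standard practice on quietly wrapping two's-complement machines was that the result, converted back to unsigned, came out as if the arithmetic had been done unsigned.
unsigned int square16(unsigned short a)
{
    /* The promotion to int makes this overflow for large a; historically,
       the wrapped result matched the unsigned computation.                */
    return a * a;
}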
Prior to C89, it was unclear to what lengths compilers for platforms where the above assumptions wouldn't naturally hold might be expected to go to uphold those assumptions anyway, but there was little doubt that compilers for platforms which could easily and cheaply uphold such assumptions should do so. The authors of the C89 Standard didn't bother to expressly say that because:
Compilers whose writers weren't being deliberately obtuse would continue doing such things when practical without having to be told (the rationale given for promoting small unsigned values to signed strongly reinforces this view).
The Standard only required implementations to be capable of running one possibly-contrived program without a stack overflow, and recognized that an obtuse implementation could treat any other program as invoking Undefined Behavior, but didn't think it was worth worrying about obtuse compiler writers writing implementations that were "conforming" but useless.
Although "C89" was interpreted contemporaneously as meaning "the language defined by C89, plus whatever additional features and guarantees the platform provides", the authors of gcc have been pushing an interpretation which excludes any features and guarantees beyond those mandated by C89.
The biggest single difference, I think, is function prototyping and the syntax for describing the types of function arguments.
Despite all the claims to the contrary, K&R C was and is quite capable of providing any sort of code, from low down close to the hardware on up.
The problem now is to find a compiler (preferably free) that can give a clean compile on a couple of million lines of K&R C without having to mess with it, and that runs on something like an AMD multi-core processor.
As far as I can see, having looked at the source of the GCC 4.x.x series, there is no simple hack to reactivate the -traditional and -traditional-cpp flag functionality to its previous working state without more effort than I am prepared to put in. It would be simpler to build a K&R pre-ANSI compiler from scratch.