Oberon: How to resolve contradiction in Wirth's PIO re type guard

I am trying to figure out whether Oberon allows addressing a field in a record that is not present in that record's type declaration, but only in one of its extensions, and to do so without a type guard.
In PIO ("Programming in Oberon"), page 62, last sentence of the first paragraph, Wirth writes (1):
This concludes our brief introduction to the object-oriented paradigm of programming. We realize that almost no language features had to be added to Oberon to support it. Apart from the already present facilities of records and of procedural types, only the notion of type extension is both necessary and crucial. It allows to construct hierarchies of types and to build inhomogeneous data structures. As a consequence of abandoning the rule of strictly static typing, the introduction of dynamic type tests became necessary. The further facility of the type guard is merely one of convenience.
In PIO, page 59, first three sentences of the last paragraph before section 23.2, he writes (2):
The simple designator p.radius would not be acceptable, because p is of type Figure, which does not feature a field radius. With the type guard, the programmer can ascertain that in this case p is also of type Circle, in which case the field radius is indeed applicable. Whereas p is of base type Figure, p(Circle) is of type Circle.
On the one hand I interpret #2 such that the type guard is absolutely necessary in order to be able to address a field that is not in the designator's type declaration. Were it not for the type guard, addressing such a field should cause a compile time error.
On the other hand, if the type guard is merely a convenience as suggested by #1, then it could also be omitted. Its facility would simply be that of an assert and consequently the compiler could allow the addressing of a field that is not in the designator's type declaration.
Since the latter is not type safe I would be surprised if Wirth intended it that way.
I am therefore inclined to completely disregard #1 and implement #2.
Before I bother Wirth with an email, I'd appreciate it if Oberon practitioners (and compiler implementers) could share how this is interpreted in their respective Oberon compilers.
Thanks in advance.

I emailed Professor Wirth to ask for clarification.
It turns out that in the earlier Oberon language reports the statement "merely a convenience" was indeed misleading, because in those versions of Oberon the type guard syntax was necessary to address fields of extensions not present in the base type. There was no other way to do this.
However, as Wirth pointed out, in his latest revision of Oberon the semantics of the CASE statement have been extended to perform both type test and addressing of fields in extensions not present in their base type.
CASE msg OF
DrawMsg : msg.draw(self)
| MoveMsg : msg.move(self, msg.dx, msg.dy)
...
In this case, neither the IS type test nor the type guard syntax is strictly necessary. Thus, in the current Oberon version they are indeed merely a convenience.
The language report for the latest Oberon version can be found at:
https://www.inf.ethz.ch/personal/wirth/Oberon/Oberon07.Report.pdf
The CASE statement is described in section 9.5.
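For readers coming from C, the distinction can be sketched with a hand-rolled analogue of type extension. This is only an illustration under assumptions: C has no real type extension, and the names figure, circle, rect, as_circle and figure_width are hypothetical. as_circle plays the role of the type guard p(Circle), a dynamic check followed by access to the extension's fields, while the switch in figure_width plays the role of the Oberon-07 type CASE, where test and access happen in one construct. Unlike an Oberon compiler, a C compiler does not check that each cast matches the tested case.

```c
#include <assert.h>

/* Hypothetical analogue of Oberon type extension: each "extension" embeds
   the base record as its first member and stores its dynamic type in a tag. */
enum figure_kind { FIGURE_CIRCLE, FIGURE_RECT };

struct figure { enum figure_kind kind; };
struct circle { struct figure base; int radius; };
struct rect   { struct figure base; int w, h; };

/* Counterpart of the type guard p(Circle): check the dynamic type first,
   then allow access to the extension's fields. */
struct circle *as_circle(struct figure *p)
{
    assert(p->kind == FIGURE_CIRCLE);   /* the "guard": aborts on mismatch */
    return (struct circle *)p;          /* valid: base is the first member */
}

/* Counterpart of the Oberon-07 type CASE: test and access in one construct. */
int figure_width(struct figure *p)
{
    switch (p->kind) {
    case FIGURE_CIRCLE:
        return 2 * ((struct circle *)p)->radius;
    case FIGURE_RECT:
        return ((struct rect *)p)->w;
    }
    return 0; /* not reached for well-formed figures */
}
```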

Related

MISRA Rule 2.3: A project should not contain unused type declarations

What does "project" mean?
And in the following statement:
"If a type is declared but not used, then it is unclear to a reviewer if the type is redundant or it has been left unused by mistake."
what does "if the type is redundant" mean? What is a redundant type?
The MISRA document does not contain a strict definition of "project". Intuitively, a project can be defined as a collection of source files used to build a set of artifacts.
A redundant type in this context means a type definition that is not used anywhere in the project sources. Unused typedefs at block scope can easily be detected using the -Wunused-local-typedefs option in recent versions of gcc and clang.
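A minimal sketch of the distinction, with hypothetical names: sensor_reading_t is used and therefore not redundant, while scratch_t and struct legacy_packet are declared but never used, which is what Rule 2.3 targets. The comments reflect the assumption that -Wunused-local-typedefs only reports typedefs local to a function, so a file-scope leftover like legacy_packet needs a project-wide check, such as a MISRA-oriented static analyser.

```c
#include <stdint.h>

/* Used type: referenced by sum_readings, so it is not redundant. */
typedef uint16_t sensor_reading_t;

/* Redundant file-scope type: nothing refers to it. Rule 2.3 covers it,
   but -Wunused-local-typedefs does not, since it is not a local typedef. */
struct legacy_packet {
    uint8_t id;
    uint8_t payload[8];
};

uint32_t sum_readings(const sensor_reading_t *readings, uint32_t count)
{
    /* Redundant block-scope typedef: this is the kind of declaration that
       -Wunused-local-typedefs reports (gcc enables it as part of -Wall). */
    typedef int32_t scratch_t;

    uint32_t total = 0;
    for (uint32_t i = 0; i < count; i++)
        total += readings[i];
    return total;
}
```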
This is a family of MISRA-C:2012 rules (2.x) that, in plain English, say you should never declare any variables, types, macros, etc. that aren't actually used anywhere in the program. That is common sense: redundant simply means not used anywhere.
But note that these rules are mainly there for the benefit of the static analyser; this is the kind of check that you definitely want to automate. For mission-critical systems in general, we aren't allowed to have parts of the production code which are never actually executed. Not even code which is "commented out" is allowed.

Valid programs in C89, but not in C99

Are there features / semantics introduced, or removed, in C99 which would make a well-defined program written in C89 either invalid (i.e. not compiling anymore, according to the C99 standard), or compiling but with different semantics?
My findings so far, concerning plainly invalid programs:
implicit int (C89 §3.5.2)
implicit function declaration (C89 §3.3.2.2)
not returning from a function expecting a return value (C89 §3.6.6.4)
using new keywords as identifier (for example restrict, inline, etc)
hacks involving //, which is now treated as the start of a comment. However, such hacks are almost never encountered in production code.
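The // item can be made concrete with a small, deliberately contrived sketch (comment_hack is a hypothetical name). The same initializer is read as 10 / 2 by a strict C89 compiler, which has no line comments, but as plain 10 by a C99 compiler. Strictly speaking, a program like this is valid under both standards, so it also fits the "different semantics" category.

```c
/* Returns 10 when compiled as C99 or later, and 5 under a strict C89
   compiler. In C99 the "//" starts a line comment, so the initializer is
   just 10. C89 has no line comments, so it sees the tokens 10 and "/",
   then an empty block comment, then 2: that is, 10 / 2. */
int comment_hack(void)
{
    int x = 10 //**/ 2
        ;
    return x;
}
```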
Subtle changes, making the same code having different semantics:
Integer division has been made well defined, for example -3 / 2 now has to truncate towards zero (C99 §6.5.5/6), instead of being implementation defined (C89 §3.3.5/6)
strtod gained the ability to parse hexadecimal floating-point numbers in C99, recognizing the 0x or 0X prefix
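A small sketch of the two semantic changes just listed (div_neg, mod_neg and parse_hex are hypothetical helper names). Compiled as C99 or later, the values are fixed; under C89 an implementation could legitimately give -2 for -3 / 2 (and then 1 for -3 % 2), and strtod would stop after the leading 0 of a hexadecimal string.

```c
#include <stdlib.h>

/* C99 6.5.5p6: the quotient truncates toward zero, so -3 / 2 == -1, and
   because (a / b) * b + a % b == a, -3 % 2 == -1. C89 also allowed -2 and 1. */
int div_neg(void) { return -3 / 2; }
int mod_neg(void) { return -3 % 2; }

/* C99 strtod accepts hexadecimal input such as "0x10" (16.0) or "0x1p4"
   (1 * 2^4 = 16.0). A C89 strtod would parse only the leading "0". */
double parse_hex(const char *text, char **end)
{
    return strtod(text, end);
}
```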
What have I missed?
There are a lot of programs which would have been considered valid under C89, prior to the publication of C99, which some people insist were never valid. C89 includes a rule that requires that an object of any type may only be accessed using a pointer of that type, a related type, or a character type. Prior to the publication of C99, this rule was generally interpreted as applying only to "named" objects (variables of static or automatic duration which are accessed directly by name), and only in situations where the object in question didn't have its address taken immediately before it was used as a different pointer type. Such interpretation was motivated by a number of factors:
1. One of the stated goals of the Standard was to fit with what existing compilers and programs were doing, and while it would have been rare for existing programs to access discrete named variables using pointers of different types other than in cases where the variable's address was taken immediately before such use, many other usages of pointer type punning were quite common.
2. The rationale for the Standard includes as its sole example a function which receives a pointer of one primitive type to write a global variable of another primitive type in such a way that a compiler would have no particular reason to expect aliasing. Being able to keep global variables in registers is clearly a useful optimization, and the stated purpose of the rule is to allow such optimizations in cases where a compiler would have no reason to expect aliasing to occur. Outlawing constructs like *(int*)&foo = 23; does nothing to aid such optimizations, since the fact that code is taking foo's address and dereferencing it should make it abundantly clear to any compiler that isn't being deliberately obtuse that the code is going to modify foo.
3. There are many kinds of code which semantically require the ability to use memory bits as various types, and nothing in the Standard indicates that the rules were intended to make programmers jump through hoops (e.g. by using memcpy) to achieve semantics that could have been easily obtained in the absence of the rules, especially considering that using memcpy would prevent the compiler from keeping global variables in registers across the pointer accesses (thus defeating the purpose for which the rules were written in the first place).
4. If structure types V and W have a common initial sequence, U is any union type containing both, and p is a V* which identifies the V within a U, then (W*)(U*)p may be used to access those common members, and will be equivalent to (W*)p. Unless a compiler could show that p couldn't possibly be a pointer to a member of some union containing W, it would be required to allow (W*)p to access the common members; it was more helpful to simply treat such common member access as being legitimate regardless of whether or where U might exist than to search for excuses to deny it.
5. Nothing in the C89 rules makes clear how the "type" of a region of allocated storage is defined, or how storage which holds things of one type that are no longer needed might be re-purposed to hold things of another.
6. Keeping track of registers allocated to named variables was easier than keeping track of registers allocated to other pointer expressions, and code which was interested in minimizing the number of loads and stores via pointers would often copy things to named variables and work on them there.
C99 added "effective type" rules which are explicitly applicable to allocated storage. Some people insist those were merely "clarifications" of rules which already existed in C89, but for the above reasons I find that viewpoint untenable. It's fashionable to claim that the only reasons compilers didn't apply aliasing rules to unnamed objects are #5 and #6, but objections #1-#4 are equally significant (and continue to apply to C99 just as much as C89). Still, since C99 added the effective type rules, many constructs which would have been treated as legitimate by most common interpretations of the C89 rules are clearly forbidden.
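To illustrate the memcpy route mentioned above, here is a minimal sketch (float_bits is a hypothetical name) of reading a float's bit pattern in a way that is well defined under the C99 effective-type rules, in contrast to the pointer cast *(uint32_t *)&f, which accesses a float object through an incompatible lvalue type. It assumes float is a 32-bit IEEE 754 type, which holds on mainstream platforms but is not required by the standard.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float occupies exactly 32 bits (IEEE 754 binary32). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* *(uint32_t *)&f would read a float object through a uint32_t lvalue,
   which the aliasing rules do not permit. Copying the bytes is well defined. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```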
As an element of contrast and comparison, the git/git codebase still largely conforms to C89 and, apart from a few documented exceptions, does not use C99 initializers or features from newer C standards.
This is detailed in Git 2.23 (Q3 2019), in the Git Coding Guidelines.
This answer illustrates post-C89 features that might still be compatible with a C89-oriented codebase.
See commit cc0c429 (16 Jul 2019) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit fe9dc6b, 25 Jul 2019)
CodingGuidelines: spell out post-C89 rules
Even though we have been sticking to C89, there are a few handy features we borrow from more recent C language in our codebase after trying them in weather balloons and saw that nobody screamed.
Spell them out.
While at it, extend the existing variable declaration rule a bit to read better with the newly spelled out rule for the for loop.
The coding guidelines now include:
You should not use features from newer C standard, even if your compiler groks them.
There are a few exceptions to this guideline:
since early 2012 with e1327023ea (Git v1.7.9.2), we have been using an enum definition whose last element is followed by a comma.
This, like an array initializer that ends with a trailing comma, can be used to reduce the patch noise when adding a new identifer at the end.
since mid 2017 with cbc0f81d (Git v2.15.0-rc0), we have been using designated initializers for struct (e.g. "struct t v = { .val = 'a' };")
There are certain C99 features that might be nice to use in our code base, but we've hesitated to do so in order to avoid breaking compatibility with older compilers.
But we don't actually know if people are even using pre-C99 compilers these days.
If this patch can survive a few releases without complaint, then we can feel more confident that designated initializers are widely supported by our user base.
It also is an indication that other C99 features may be supported, but not a guarantee (e.g., gcc had designated initializers before C99 existed).
since mid 2017 with 512f41cf (Git v2.15.0-rc0), we have been using designated initializers for array (e.g. "int array[10] = { [5] = 2 }").
This is another test balloon to see if we get complaints from people whose compilers do not support designated initializer for arrays.
These used to be forbidden, but we have not heard any breakage report, and they are assumed to be safe.
Variables have to be declared at the beginning of the block, before the first statement (i.e. -Wdeclaration-after-statement).
Declaring a variable in the for loop "for (int i = 0; i < 10; i++)" is still not allowed in this codebase.
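A hypothetical snippet (all names invented) written in the style these guidelines describe: the trailing enum comma and both kinds of designated initializer are the C99 features Git has adopted, while the loop variable is still declared at the top of the block rather than inside the for statement.

```c
enum log_level {
    LOG_ERROR,
    LOG_WARNING,
    LOG_INFO, /* trailing comma: adding a level later touches one line */
};

struct option_entry {
    char short_name;
    const char *long_name;
};

/* Designated initializers for a struct and for an array. */
const struct option_entry verbose_opt = { .short_name = 'v', .long_name = "verbose" };
const int weights[10] = { [5] = 2 };

int count_nonzero(const int *values, int n)
{
    int i;          /* declared before the first statement ... */
    int count = 0;

    for (i = 0; i < n; i++)     /* ... not "for (int i = 0; ...)" */
        if (values[i])
            count++;
    return count;
}
```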

Difference between strongly and weakly typed languages?

I have read several pages, including the wiki page http://en.wikipedia.org/wiki/Strong_and_weak_typing dealing with strongly and weakly typed languages. For the most part, I think I understand the difference. However, I would like a straight-to-the-point answer differentiating the two.
From my understanding, in weakly typed languages, data types do not have to be explicitly called. This would be a language like Matlab where you can add 4 and 2.3 without having to typecast. Strongly typed languages require the programmer to declare a data type for each variable and/or value. For instance in C, you would need to do something like 4 + (int) 2.3 or (float)4 + 2.3 (can't remember if that is valid C type-casting).
Any information expanding or correcting my understanding of these concepts would be greatly appreciated.
The difference is not about declaring types on variables. It's a bit more subtle than that (and, pace Eric Lippert, I think the term is reasonably well-defined). The distinction is that in a strongly-typed language, every expression has a type which can be determined at compile time, and only operations appropriate to that type are allowed.
In an untyped ("weakly typed" to critics, "dynamically typed" to fans) language, that is not the case. The language allows any operation to be performed on any type, with the rather substantial proviso that the operation may fail. That is, while the language may allow the operation, the runtime may not.
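As a small C illustration of that definition, and of the cast question raised above (the function names are hypothetical): in C the type of 4 + 2.3 is fixed at compile time as double by the usual arithmetic conversions, so no cast is needed, and (float)4 + 2.3 is a double as well. The _Static_assert checks built on _Generic (both C11) confirm this before the program ever runs.

```c
/* 4 is converted to double before the addition, so the sum is a double. */
_Static_assert(_Generic(4 + 2.3, double: 1, default: 0),
               "4 + 2.3 has type double");

/* float + double also yields double. */
_Static_assert(_Generic((float)4 + 2.3, double: 1, default: 0),
               "(float)4 + 2.3 has type double");

double mixed_sum(void)
{
    return 4 + 2.3;          /* no cast required */
}

int truncated_sum(void)
{
    return 4 + (int)2.3;     /* (int)2.3 truncates to 2, giving 6 */
}

/* An operation the operand type does not support is rejected at compile
   time; for example, this line would not compile:
       double bad = "abc" * 2;                                          */
```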
Note that it's possible to have a strongly-typed language without requiring type declarations everywhere. Indeed, no strongly-typed language requires them everywhere. Consider this bit of Java:
String s = "hello";
int l = s.getBytes().length;
How does the compiler decide that .length is legal there? It's legal because it's being used on a byte[]. But there is no declaration of anything as being a byte[] here. Rather, the compiler knows that s is a String, and that when you call getBytes() on a String, you get a byte[]. It infers from those facts that the type of s.getBytes() is a byte[], and so that it is legal to ask for its length.
Some languages whose type systems are more sophisticated than Java's allow the compiler to infer more than this. For example, in Scala, you can say:
val s = "hello"
val l = s.getBytes().length
And the compiler will infer the types of s and l, as well as of the intermediate expressions.
Languages which have strong typing but artificial limits on type inference which require redundant type declarations (like Java) are described as having manifest typing, because the types must be made manifest, which is a fancy way of saying explicitly brought into existence, which is a fancy way of saying written down.
Check Eric Lippert's blog out. There's an entry about just what you're looking for here.
From the looks of his blog, those terms are subjective, so "speak more precisely about type system features."
As you said...
...in weakly typed languages, data types do not have to be explicitly called.
Strongly typed languages require the programmer to declare a data type for each variable and/or value.
This is correct...
There is also a sort of paradigm in so-called "strongly" typed languages like C#, in which types can be declared if necessary or wanted by the programmer. E.g. C# has the "var" keyword, but also has explicitly named types (Int32, String, Boolean, etc.), which many programmers who use this language prefer.
In this way a language can be both "strongly" and "weakly" typed.
I hope this helps further your understanding of this concept...

Why does/did C allow implicit function and typeless variable declarations?

Why is it sensible for a language to allow implicit declarations of functions and typeless variables? I get that C is old, but allowing declarations to be omitted and defaulting to int() (or int in the case of variables) doesn't seem so sane to me, even back then.
So, why was it originally introduced? Was it ever really useful? Is it actually (still) used?
Note: I realise that modern compilers give you warnings (depending on which flags you pass them), and you can suppress this feature. That's not the question!
Example:
int main() {
static bar = 7; // defaults to "int bar"
return foo(bar); // defaults to a "int foo()"
}
int foo(int i) {
return i;
}
See Dennis Ritchie's "The Development of the C Language": http://web.archive.org/web/20080902003601/http://cm.bell-labs.com/who/dmr/chist.html
For instance,
In contrast to the pervasive syntax variation that occurred during the creation of B, the core semantic content of BCPL—its type structure and expression evaluation rules—remained intact. Both languages are typeless, or rather have a single data type, the 'word', or 'cell', a fixed-length bit pattern. Memory in these languages consists of a linear array of such cells, and the meaning of the contents of a cell depends on the operation applied. The + operator, for example, simply adds its operands using the machine's integer add instruction, and the other arithmetic operations are equally unconscious of the actual meaning of their operands. Because memory is a linear array, it is possible to interpret the value in a cell as an index in this array, and BCPL supplies an operator for this purpose. In the original language it was spelled rv, and later !, while B uses the unary *. Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.
This typelessness persisted in C until the authors started porting it to machines with different word lengths:
The language changes during this period, especially around 1977, were largely focused on considerations of portability and type safety, in an effort to cope with the problems we foresaw and observed in moving a considerable body of code to the new Interdata platform. C at that time still manifested strong signs of its typeless origins. Pointers, for example, were barely distinguished from integral memory indices in early language manuals or extant code; the similarity of the arithmetic properties of character pointers and unsigned integers made it hard to resist the temptation to identify them. The unsigned types were added to make unsigned arithmetic available without confusing it with pointer manipulation. Similarly, the early language condoned assignments between integers and pointers, but this practice began to be discouraged; a notation for type conversions (called 'casts' from the example of Algol 68) was invented to specify type conversions more explicitly. Beguiled by the example of PL/I, early C did not tie structure pointers firmly to the structures they pointed to, and permitted programmers to write pointer->member almost without regard to the type of pointer; such an expression was taken uncritically as a reference to a region of memory designated by the pointer, while the member name specified only an offset and a type.
Programming languages evolve as programming practices change. In modern C and the modern programming environment, where many programmers have never written assembly language, the notion that ints and pointers are interchangeable may seem nearly unfathomable and unjustifiable.
It's the usual story — hysterical raisins (aka 'historical reasons').
In the beginning, the big computers that C ran on (DEC PDP-11) had 64 KiB for data and code (later 64 KiB for each). There was a limit to how complex you could make the compiler and still have it run. Indeed, there was scepticism that you could write an O/S using a high-level language such as C, rather than needing to use assembler. So, there were size constraints. Also, we are talking a long time ago, in the early to mid 1970s. Computing in general was not as mature a discipline as it is now (and compilers specifically were much less well understood). Also, the languages from which C was derived (B and BCPL) were typeless. All these were factors.
The language has evolved since then (thank goodness). As has been extensively noted in comments and down-voted answers, in strict C99, implicit int for variables and implicit function declarations have both been removed from the language. However, most compilers still recognize the old syntax and permit its use, with more or fewer warnings, to retain backwards compatibility, so that old source code continues to compile and run as it always did. C89 largely standardized the language as it was, warts (gets()) and all. This was necessary to make the C89 standard acceptable.
There is still old code around using the old notations — I spend quite a lot of time working on an ancient code base (circa 1982 for the oldest parts) which still hasn't been fully converted to prototypes everywhere (and that annoys me intensely, but there's only so much one person can do on a code base with multiple millions of lines of code). Very little of it still has 'implicit int' for variables; there are too many places where functions are not declared before use, and a few places where the return type of a function is still implicitly int. If you don't have to work with such messes, be grateful to those who have gone before you.
Probably the best explanation for "why" comes from here:
Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax. They are also among its most frequently criticized features, and often serve as stumbling blocks to the beginner. In both cases, historical accidents or mistakes have exacerbated their difficulty. The most important of these has been the tolerance of C compilers to errors in type. As should be clear from the history above, C evolved from typeless languages. It did not suddenly appear to its earliest users and developers as an entirely new language with its own rules; instead we continually had to adapt existing programs as the language developed, and make allowance for an existing body of code. (Later, the ANSI X3J11 committee standardizing C would face the same problem.)
Systems programming languages don't necessarily need types; you're mucking around with bytes and words, not floats and ints and structs and strings. The type system was grafted onto it in bits and pieces, rather than being part of the language from the very beginning. As C has moved from being primarily a systems programming language to a general-purpose programming language, it has become more rigorous in how it handles types. But, even though paradigms come and go, legacy code is forever. There's still a lot of code out there that relies on that implicit int, and the standards committee is reluctant to break anything that's working. That's why it took almost 30 years to get rid of it.
A long, long time ago, back in the K&R, pre-ANSI days, functions looked quite different than they do today.
add_numbers(x, y)
{
return x + y;
}
int ansi_add_numbers(int x, int y); // modern, ANSI C
When you call a function like add_numbers, there is an important difference in the calling conventions: the arguments undergo the "default argument promotions" when the function is called. So if you do this:
// no prototype for add_numbers
short x = 3;
short y = 5;
short z = add_numbers(x, y);
What happens is x is promoted to int, y is promoted to int, and the return type is assumed to be int by default. Likewise, if you pass a float it is promoted to double. These rules ensured that prototypes weren't necessary, as long as you got the right return type, and as long as you passed the right number and type of arguments.
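Those default argument promotions are still easy to observe in modern C, because they apply to the arguments matched by the ... of a variadic function (unprototyped declarations like add_numbers(), by contrast, are gone as of C23). A minimal sketch with hypothetical function names: shorts arrive as int and floats arrive as double, so va_arg must name the promoted type.

```c
#include <stdarg.h>

/* Each short argument is promoted to int, so it must be read back as int;
   va_arg(ap, short) would be undefined behaviour. */
int sum_shorts(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Each float argument is promoted to double. */
double sum_floats(int count, ...)
{
    va_list ap;
    double total = 0.0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```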
Note that the syntax for declarations is different:
// K&R style function
// number of parameters is UNKNOWN, but fixed
// return type is known (int is default)
add_numbers();
// ANSI style function
// number of parameters is known, types are fixed
// return type is known
int ansi_add_numbers(int x, int y);
A common practice back in the old days was to avoid header files for the most part, and just stick the prototypes directly in your code:
void *malloc();
char *buf = malloc(1024);
if (!buf) abort();
Header files are accepted as a necessary evil in C these days, but just as modern C derivatives (Java, C#, etc.) have gotten rid of header files, old-timers didn't really like using header files either.
Type safety
From what I understand about the very early days of pre-C, there wasn't always much of a static type system. Everything was an int, including pointers. In this old language, the only point of function prototypes would be to catch arity errors.
So if we hypothesize that functions were added to the language first, and then a static type system was added later, this theory explains why prototypes are optional. This theory also explains why arrays decay to pointers when used as function arguments -- since in this proto-C, arrays were nothing more than pointers which get automatically initialized to point to some space on the stack. For example, something like the following may have been possible:
function()
{
auto x[7];
x += 1;
}
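The array-to-pointer adjustment survives in today's C. A minimal sketch (hypothetical names): a parameter written as int arr[7] really has type int *, so sizeof inside the function measures a pointer, while in the caller the array is still a full array. Compilers commonly warn about sizeof applied to an array parameter, precisely because it is so often a mistake.

```c
#include <stddef.h>

/* The parameter is written as an array, but C adjusts its type to int *,
   so sizeof yields the size of a pointer, not of seven ints. */
size_t param_size(int arr[7])
{
    return sizeof arr;
}

/* A local array has not decayed, so sizeof sees all seven elements. */
size_t local_size(void)
{
    int local[7] = { 0 };
    return sizeof local;
}
```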
Citations
The Development of the C Language, Dennis M. Ritchie
On typelessness:
Both languages [B and BCPL] are typeless, or rather have a single data type, the 'word,' or 'cell,' a fixed-length bit pattern.
On the equivalence of integers and pointers:
Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.
Evidence for the theory that prototypes were omitted due to size constraints:
During development, he continually struggled against memory limitations: each language addition inflated the compiler so it could barely fit, but each rewrite taking advantage of the feature reduced its size.
Some food for thought. (It's not an answer; we actually know the answer — it's permitted for backward compatibility.)
And people should look at COBOL code bases or f66 libraries before asking why it hasn't been cleaned up in 30 years or so!
gcc in its old default mode (gnu89, the default before gcc 5) does not emit any warnings.
With -Wall, or with -std=c99, gcc does emit the expected warnings:
main.c:2: warning: type defaults to ‘int’ in declaration of ‘bar’
main.c:3: warning: implicit declaration of function ‘foo’
The lint functionality built into modern gcc is showing its colors.
Interestingly, the modern clone of lint, splint (Secure Programming Lint), gives only one warning by default.
main.c:3:10: Unrecognized identifier: foo
Identifier used in code has not been declared. (Use -unrecog to inhibit
warning)
The LLVM C compiler clang, which like gcc has a static analyser built in, emits both warnings by default:
main.c:2:10: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
static bar = 7; // defaults to "int bar"
~~~~~~ ^
main.c:3:10: warning: implicit declaration of function 'foo' is invalid in C99
[-Wimplicit-function-declaration]
return foo(bar); // defaults to a "int foo()"
^
People used to think we don't need backward compatibility for 1980s stuff, and that all the code must be cleaned up or replaced. But it turns out that's not the case: a lot of production code still lives in prehistoric, non-standard times.
EDIT:
I didn't look through other answers before posting mine. I may have misunderstood the intention of the poster. But the thing is, there was a time when you hand-compiled your code and used toggle switches to put the binary pattern into memory. They didn't need a "type system". Nor did the PDP machine in front of which Ritchie and Thompson posed in the well-known photograph.
Don't look at the beard; look at the "toggles", which I heard were used to bootstrap the machine.
Also look at how they used to boot UNIX in this paper, which is from the Unix 7th Edition manual:
http://wolfram.schneider.org/bsd/7thEdManVol2/setup/setup.html
The point of the matter is that they didn't need so many software layers to manage a machine with KB-sized memory. Knuth's MIX has 4000 words of memory. You don't need all these types to program a MIX computer. You can happily compare an integer with a pointer on a machine like this.
I thought the reason they did this was quite self-evident, so I focused on how much is left to be cleaned up.

Get type name as string in C, GCC

Is there some 'builtin' extension in GCC to get type name of expression in C? (As a string, i.e. 'const char*').
First, you want to obtain the type of a C expression at runtime. The problem is that types are erased during compilation and the machine code is almost typeless: it contains nothing other than 8/16/32/64-bit integers and 32/64/80-bit floating-point numbers (in the case of x86). Types are a compile-time entity in C. (C++ may retain some type information at runtime, because its object-oriented nature associates types with classes, but it's hard to track PODs and primitive types that way.)
Second, you want the type of a C expression, and sometimes it's hard to say what a given C expression will be at runtime.
Thus there's no way to obtain a C expression's type at runtime.
Maybe you could have a look at the TYPE_NAME macro, which seems to be a good starting point.
Since you said you want the name at runtime, that is a definitive "no". In C, data is just bytes in memory and doesn't have an intrinsic type at all. It is only the type declaration that tells the compiler what the compiled code should expect the type to be.
It would make sense, however, for a C compiler to be able to recognize the type of a variable at compile time, and that would be great for implementing things like equality assertions with friendly output in a unit testing framework. I can't see that C has anything like that either though.
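Since C11, _Generic gets close to the compile-time type identification wished for above. A minimal sketch, with the macro names type_name and HAS_TYPE as hypothetical choices: the selection happens at compile time based on the type of the controlling expression, and only the listed types are recognised; anything else yields "other". This still does not recover types at runtime.

```c
#include <string.h>

/* Maps the compile-time type of x to a string literal naming it. */
#define type_name(x) _Generic((x),                                       \
    _Bool: "_Bool",                                                      \
    char: "char", signed char: "signed char",                            \
    unsigned char: "unsigned char",                                      \
    short: "short", unsigned short: "unsigned short",                    \
    int: "int", unsigned int: "unsigned int",                            \
    long: "long", unsigned long: "unsigned long",                        \
    long long: "long long", unsigned long long: "unsigned long long",    \
    float: "float", double: "double", long double: "long double",        \
    char *: "char *", const char *: "const char *", void *: "void *",    \
    default: "other")

/* The kind of friendly equality check a unit testing framework could use. */
#define HAS_TYPE(x, name) (strcmp(type_name(x), (name)) == 0)
```

For example, type_name('a') yields "int", because character constants have type int in C.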
Does anyone know if new versions of the ANSI C spec are still being developed? Compile-time type identification would be a great thing to add. Maybe integer constants for intrinsic types and a type equality test for either intrinsic or defined types?

Resources