Was there a very early C "standard" where the following was legal for the definition of a two-dimensional array?
int array[const_x, const_y];
int array2[2, 10];
I've stumbled upon some old legacy code which uses this (and only this) notation for multi-dimensional arrays. The code is, except for this oddity, perfectly valid C (and surprisingly well-designed for the time).
As I didn't find any macros which convert between [,] and [][], and I assume it's not a form of practical joke, it seems that once upon a time there hath been thy olde C compiler which accepted this notation. Or did I miss something?
Edit: If it helps, it's for embedded microcontrollers (atmel). From experience I can tell, that embedded compilers are not that well-known for standard-compliance.
The code on current compilers works as intended (as far as it can be guessed from the function names, descriptions and variables) if I change all [,] to [][].
The first formal standard was ANSI X3.159-1989, and the first informal standard would generally be agreed to be the first edition of Kernighan & Ritchie. Neither of these allowed the comma to be used to declare a two-dimensional array.
It appears to be an idiosyncracy of your particular compiler (one that renders it non-standard-conforming, since it would change the semantics of some conforming programs).
Take a look at this forum post.
the comma operator evaluates the left hand side, discards
the result, then evaluates the right hand side. Thus "2,5" is the same
as "5", and "5,2" is the same as "2".
This could be what is happening, although the why of it is beyond me.
Note that comma cannot be used in indexing multidimensional array: the code A[i, j] evaluates to A[j] with the i discarded, instead of the correct A[i][j]. This differs from the syntax in Pascal, where A[i, j] is correct, and can be a source of errors.
From Wikipedia
Related
K&R says:
The unary + is new with the ANSI standard. It was added for symmetry with the unary -.
What is this symmetry in regards to (i.e. it's clearly not actual, geometric symmetry) and what is the significance for? Is it important in programming?
It's just something that means "in the pattern of", or "as you would reasonably expect given the established pattern". In this particular case - is the counterpart to +, though there are other pairs like this throughout C.
It means you can do +2 as well as -2 and both work. It would be odd, or asymmetric, if +2 was somehow a syntax error. In fact, in K&R C there are a lot of odd things that were later ironed out in the standardization process. This appears to have been one of them.
You don't really need a unary + operator, you can just omit the + and the code compiles fine, but by the same logic you don't need a unary - either, you can always do 0 - 5 instead of -5, though an oversight like this would seem ridiculous.
I'm not trying to replicate the usual question about C not being able to return arrays but to dig a bit more deeply into it.
We cannot do this:
char f(void)[8] {
char ret;
// ...fill...
return ret;
}
int main(int argc, char ** argv) {
char obj_a[10];
obj_a = f();
}
But we can do:
struct s { char arr[10]; };
struct s f(void) {
struct s ret;
// ...fill...
return ret;
}
int main(int argc, char ** argv) {
struct s obj_a;
obj_a = f();
}
So, I was skimming the ASM code generated by gcc -S and seems to be working with the stack, addressing -x(%rbp) as with any other C function return.
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
Extra data: I am using Linux and gcc on a x64 Intel.
First of all, yes, you can encapsulate an array in a structure, and then do anything you want with that structure (assign it, return it from a function, etc.).
Second of all, as you've discovered, the compiler has little difficulty emitting code to return (or assign) structures. So that's not the reason you can't return arrays, either.
The fundamental reason you cannot do this is that, bluntly stated, arrays are second-class data structures in C. All other data structures are first-class. What are the definitions of "first-class" and "second-class" in this sense? Simply that second-class types cannot be assigned.
(Your next question might be, "Other than arrays, are there any other second-class data types?", and I think the answer is "Not really, unless you count functions".)
Intimately tied up with the fact that you can't return (or assign) arrays is that there are no values of array type, either. There are objects (variables) of array type, but whenever you try to take the value of one, you immediately get a pointer to the array's first element. [Footnote: more formally, there are no rvalues of array type, although an object of array type can be thought of as an lvalue, albeit a non-assignable one.]
So, quite aside from the fact that you can't assign to an array, you can't even generate a value to try to assign. If you say
char a[10], b[10];
a = b;
it's as if you had written
a = &b[0];
So we've got an array on the left, but a pointer on the right, and we'd have a massive type mismatch even if arrays somehow were assignable. Similarly (from your example) if we try to write
a = f();
and somewhere inside the definition of function f() we have
char ret[10];
/* ... fill ... */
return ret;
it's as if that last line said
return &ret[0];
and, again, we have no array value to return and assign to a, merely a pointer.
(In the function call example, we've also got the very significant issue that ret is a local array, perilous to try to return in C. More on this point later.)
Now, part of your question is probably "Why is it this way?", and also "If you can't assign arrays, why can you assign structures containing arrays?"
What follows is my interpretation and my opinion, but it's consistent with what Dennis Ritchie describes in his paper The Development of the C Language.
The non-assignability of arrays arises from three facts:
C is intended to be syntactically and semantically close to the machine hardware. An elementary operation in C should compile down to one or a handful of machine instructions taking one or a handful of processor cycles.
Arrays have always been special, especially in the way they relate to pointers; this special relationship evolved from and was heavily influenced by the treatment of arrays in C's predecessor language B.
Structures weren't initially in C.
Due to point 2, it's impossible to assign arrays, and due to point 1, it shouldn't be possible anyway, because a single assignment operator = shouldn't expand to code that might take N thousand cycles to copy an N thousand element array.
And then we get to point 3, which really ends up leading to a contradiction.
When C got structures, they initially weren't fully first-class either, in that you couldn't assign or return them. But the reason you couldn't was simply that the first compiler wasn't smart enough, at first, to generate the code. There was no syntactic or semantic roadblock, as there was for arrays.
And the goal all along was for structures to be first-class, and this was achieved relatively early on. The compiler caught up, and learned how to emit code to assign and return structures, shortly around the time that the first edition of K&R was going to print.
But the question remains, if an elementary operation is supposed to compile down to a small number of instructions and cycles, why doesn't that argument disallow structure assignment? And the answer is, yes, it's a contradiction.
I believe (though this is more speculation on my part) that the thinking was something like this: "First-class types are good, second-class types are unfortunate. We're stuck with second-class status for arrays, but we can do better with structs. The no-expensive-code rule isn't really a rule, it's more of a guideline. Arrays will often be large, but structs will usually be small, tens or hundreds of bytes, so assigning them won't usually be too expensive."
So a consistent application of the no-expensive-code rule fell by the wayside. C has never been perfectly regular or consistent, anyway. (Nor, for that matter, are the vast majority of successful languages, human as well as artificial.)
With all of this said, it may be worth asking, "What if C did support assigning and returning arrays? How might that work?" And the answer will have to involve some way of turning off the default behavior of arrays in expressions, namely that they tend to turn into pointers to their first element.
Sometime back in the '90's, IIRC, there was a fairly well-thought-out proposal to do exactly this. I think it involved enclosing an array expression in [ ] or [[ ]] or something. Today I can't seem to find any mention of that proposal (though I'd be grateful if someone can provide a reference). At any rate, I believe we could extend C to allow array assignment by taking the following three steps:
Remove the prohibition of using an array on the left-hand side of an assignment operator.
Remove the prohibition of declaring array-valued functions. Going back to the original question, make char f(void)[8] { ... } legal.
(This is the biggie.) Have a way of mentioning an array in an expression and ending up with a true, assignable value (an rvalue) of array type. For the sake of argument I'll posit a new operator or pseudofunction called arrayval( ... ).
[Side note: Today we have a "key definition" of array/pointer correspondence, namely that:
A reference to an object of array type which appears in an expression decays (with three exceptions) into a pointer to its first element.
The three exceptions are when the array is the operand of a sizeof operator, or a & operator, or is a string literal initializer for a character array. Under the hypothetical modifications I'm discussing here, there would be a fourth exception, namely when the array was an operand of this new arrayval operator.]
Anyway, with these modifications in place, we could write things like
char a[8], b[8] = "Hello";
a = arrayval(b);
(Obviously we would also have to decide what to do if a and b were not the same size.)
Given the function prototype
char f(void)[8];
we could also do
a = f();
Let's look at f's hypothetical definition. We might have something like
char f(void)[8] {
char ret[8];
/* ... fill ... */
return arrayval(ret);
}
Note that (with the exception of the hypothetical new arrayval() operator) this is just about what Dario Rodriguez originally posted. Also note that — in the hypothetical world where array assignment was legal, and something like arrayval() existed — this would actually work! In particular, it would not suffer the problem of returning a soon-to-be-invalid pointer to the local array ret. It would return a copy of the array, so there would be no problem at all — it would be just about perfectly analogous to the obviously-legal
int g(void) {
int ret;
/* ... compute ... */
return ret;
}
Finally, returning to the side question of "Are there any other second-class types?", I think it's more than a coincidence that functions, like arrays, automatically have their address taken when they are not being used as themselves (that is, as functions or arrays), and that there are similarly no rvalues of function type. But this is mostly an idle musing, because I don't think I've ever heard functions referred to as "second-class" types in C. (Perhaps they have, and I've forgotten.)
Footnote: Because the compiler is willing to assign structures, and typically knows how to emit efficient code for doing so, it used to be a somewhat popular trick to co-opt the compiler's struct-copying machinery in order to copy arbitrary bytes from point a to point b. In particular, you could write this somewhat strange-looking macro:
#define MEMCPY(b, a, n) (*(struct foo { char x[n]; } *)(b) = \
*(struct foo *)(a))
that behaved more or less exactly like an optimized in-line version of memcpy(). (And in fact, this trick still compiles and works under modern compilers today.)
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
It has nothing to do with capability per se. Other languages do provide the ability to return arrays, and you already know that in C you can return a struct with an array member. On the other hand, yet other languages have the same limitation that C does, and even more so. Java, for instance, cannot return arrays, nor indeed objects of any type, from methods. It can return only primitives and references to objects.
No, it is simply a question of language design. As with most other things to do with arrays, the design points here revolve around C's provision that expressions of array type are automatically converted to pointers in almost all contexts. The value provided in a return statement is no exception, so C has no way of even expressing the return of an array itself. A different choice could have been made, but it simply wasn't.
For arrays to be first-class objects, you would expect at least to be able to assign them. But that requires knowledge of the size, and the C type system is not powerful enough to attach sizes to any types. C++ could do it, but doesn't due to legacy concerns—it has references to arrays of particular size (typedef char (&some_chars)[32]), but plain arrays are still implicitly converted to pointers as in C. C++ has std::array instead, which is basically the aforementioned array-within-struct plus some syntactic sugar.
Bounty hunting.
The authors of C did not aspire to be language or type system designers. They were tool designers. C was a tool to make system programming easier.
ref: B Kernighan on Pascal Ritchie on C
There was no compelling case for C to do anything unexpected; especially as UNIX and C were ushering in the era of least surprise. Copying arrays, and making complex syntax to do so when it was the metaphorical equivalent of having a setting to burn the toast did not fit the C model.
Everything in C, the language, is effectively constant time, constant size. C, the standard, seems preoccupied with doing away with this core feature which made C so popular; so expect the, uh, standard C/2023.feb07 to feature a punctuation nightmare that enables arrays as r-values.
The decision of the C authors makes eminent sense if you view the programming world pragmatically. If you view it as a pulpit for treasured beliefs, then get onboard for C/2023.feb07 before C/2023.feb08 nullifies it.
I'm afraid in my mind it's not so much a debate of first or second class objects, it's a religious discussion of good practice and applicable practice for deep embedded applications.
Returning a structure either means a root structure being changed by stealth in the depths of the call sequence, or a duplication of data and the passing of large chunks of duplicated data. The main applications of C are still largely concentrated around the deep embedded applications. In these domains you have small processors that don't need to be passing large blocks of data. You also have engineering practice that necessitates the need to be able to operate without dynamic RAM allocation, and with minimal stack and often no heap. It could be argued the return of the structure is the same as modification via pointer, but abstracted in syntax... I'm afraid I'd argue that's not in the C philosophy of "what you see is what you get" in the same way a pointer to a type is.
Personally, I would argue you have found a loop hole, whether standard approved or not. C is designed in such a way that allocation is explicit. You pass as a matter of good practice address bus sized objects, normally in an aspirational one cycle, referring to memory that has been allocated explicitly at a controlled time within the developers ken. This makes sense in terms of code efficiency, cycle efficiency, and offers the most control and clarity of purpose. I'm afraid, in code inspection I'd throw out a function returning a structure as bad practice. C does not enforce many rules, it's a language for professional engineers in many ways as it relies upon the user enforcing their own discipline. Just because you can, doesn't mean you should... It does offer some pretty bullet proof ways to handle data of very complex size and type utilising compile time rigour and minimising the dynamic variations of footprint and at runtime.
First, I know I know. This question has kind of been asked some times before, but most of the answers got on other topics only partly answer my question.
I'm doing something which can parse C like expressions.
That includes expressions for example like (some examples)
1) struct1.struct2.structarray[283].shd->_var
2) *((*array_dptr)[2][1] + 5)
3) struct1.struct2.struct3.var + b * c / 3 % 5
Problem is... I need to be fast on this. The fastest possible, even if it makes the code ugly - well, obviously, the speed improvement must be tangible. The reason is that it is interpreted. It needs to be fast...
I have many questions, and I will probably ask some more depending on your answers. But anyways...
First, I'm aware of "operator priorities". For example algorithms implemented in C compilers will assign to operators a priority number and evaluate the expression based on that.
I've consulted this table : http://en.wikipedia.org/wiki/Operators_in_C_and_C++#Operator_precedence
Now, this is cool but... I wonder a few things.
My principal question is... how would you implement this to be the fastest possible?
I have thought about for example... (please note the program I'm speaking about actually parses a file containing these expressions, and not all C operators will be supported)
1) Stocking the expression string into an array, storing each operator position inside an array, and then starting to parse all this crap, starting from the highest priority operator. For example if I had str = "2*1+3", then after checking all the operators present, I would check for the position at str[1], and the check at right and left, do the operation (here multiply) and then substitude the expression with the result and evaluate again.
The problem I see there is... say two operators in the expr are the same priority
for example : var1 * var2 / var3 / var4
since * and / have both the same precedence, how to know on which position to start the parsing? Of course this example is rather intuitive, but I can the problem growing on enormous expressions.
2) Is this even possible to do it non recursive? Usually recursive means slower due to multiple function call setting their own stack frames, re-initializing stuff etc etc.
3) How to distinguish unary operators from non unaries?
For example : 2 + *a + b * c
There is the dereferencing op and the multiplication one. Intuitively I have an idea on how to do it, but I ain't sure. I'd rather have your advices on this (i think : check if one of the right or left members are operators, if so, then it's unary?)
4) I don't get expressions being evaluated right-to-left. Seems so unnatural to be. More that I don't unterstand what does it means. Would you show an example? Why do it that way?!?
5) Do you have better algorithms in head? Better ideas of achieving it?
For now, that sums pretty much what I'm thinking about.
This ain't an homework by the way. It's practical stuff.
Thanks!
In K&R Section 5.10, in their sample implementation of a grep-like function, there are these lines:
while (--argc > 0 && (*++argv)[0] == '-')
while (c = *++argv[0])
Understanding the syntax there was one of the most challenging things for me, and even now a couple weeks after viewing it for the first time, I still have to think very slowly through the syntax to make sense of it. I compiled the program with this alternate syntax, but I'm not sure that the second line is allowable. I've just never seen *'s and ++'s interleaved like this, but it makes sense to me, it compiles, and it runs. It also requires no parentheses or brackets, which is maybe part of why it seems more clear to me. I just read the operators in one direction only (right to left) rather than bouncing back and forth to either side of the variable name.
while (--argc > 0 && **++argv == '-')
while (c = *++*argv)
Well for one, that's one way to make anyone reading your code to go huh?!?!?!
So, from a readability standpoint, no, you probably shouldn't write code like that.
Nevertheless, it's valid code and breaks down as this:
*(++(*p))
First, p is dereferenced. Then it is incremented. Then it is dereferenced again.
To make thing worse, this line:
while (c = *++*argv)
has an assignment in the loop-condition. So now you have two side-effects to make your reader's head spin. YAY!!!
Seems valid to me. Of course, you should not read it left to right, that's not how C compiler parses the source, and that's not how C language grammatics work. As a rule of thumb, you should first locate the object that's subject to operating upon (in this case - argv), and then analyze the operators, often, like in this case, from inside (the object) to outside. The actual parsing (and reading) rules are of course more complicated.
P. S. And personally, I think this line of code is really not hard to understand (and I'm not a C programming guru), so I don't think you should surround it with parentheses as Mysticial suggests. That would only make the code look big, if you know what I mean...
There's no ambiguity, even without knowledge of the precedence rules.
Both ++ and * are prefix unary operators; they can only apply to an operand that follows them. The second * can only apply to argv, the ++ to *argv, and the first * to ++*argv. So it's equivalent to *(++(*argv)). There's no possible relationship between the precedences of ++ and * that could make it mean anything else.
This is unlike something like *argv++, which could conceivably be either (*argv)++ or *(argv++), and you have to apply precedence rules to determine which (it's *(argv++)` because postfix operators bind more tightly than prefix unary operators).
There's a constraint that ++ can only be applied to an lvalue; since *argv is an lvalue, that's not a problem.
Is this code valid? Yes, but that's not what you asked.
Is this code acceptable? That depends (acceptable to who?).
I wouldn't consider it acceptable - I'd consider it "harder to read than necessary" for a few different reasons.
First; lots of programmers have to work with several different languages, potentially with different operator precedence rules. If your code looks like it relies on a specific language's operator precedence rules (even if it doesn't) then people have to stop and try to remember which rules apply to which language.
Second; different programmers have different skill levels. If you're ever working in a large team of developers you'll find that the best programmers write code that everyone can understand, and the worst programmers write code that contains subtle bugs that half of the team can't spot. Most C programmers should understand "*++*argv", but a good programmer knows that a small number of "not-so-good" programmers either won't understand it or will take a while to figure it out.
Third; out of all the different ways of writing something, you should choose the variation that expresses your intent the best. For this code you're working with an array, and therefore it should look like you intend to be working with an array (and not a pointer). Note: For the same reason, "uint32_t foo = 0x00000002;" is better than "uint32_t foo = 0x02;".
I'm learning C and I'm reading about type equivalence.
I'm curious, does anyone have an opinion why they used structural equivalence for arrays and pointers but they used declaration equivalence for structs and unions?
Why the disparity there? What is the benefit of doing declaration equivalence with structs/unions and structural equivalence for everything else?
I'm sure others will present C specific information, but I'll mention that type-equivalence is one of the classic main problems in programming languages theory. Determining whether two types are actually equivalent is a much trickier problem than it may seem.
For example, here are some overview slides from an academic course just to give you a taste of the headache.
I'm sure it was because it looked like a good idea at the time.
Don't try to overthink the design of C. Kernighan & Ritchie were designing a system implementation language that might be good for other things, and wound up with decisions they regretted later (operator precedence being the best documented). Some of the issues were cleaned up by the Standards Committee, and others were too deeply ingrained.
As Uri points out in his answer, type equivalence is a difficult problem, and is therefore one of those likely to have been punted by K&R in a desire to get a working compiler soon rather than a clean language design later.
In a lot of ways c is a clean expression of assembly idiom from the 1970s. Arrays on these processors are implemented directly using pointers, and c simply copies that fact.
Structural equivalence is difficult to check. Pointers and arrays are pretty simple, but structs and union types are more complex. Testing for structural equivalence is very difficult on these complex types, but easier for pointers and arrays.
EDIT
Originally I had written an answer that dealt with value equivalence instead of type equivalnce, so it wasn't really n answer to this question. I did receive a few upvotes on it, though, so I decided to keep in here.
What I know about [value equivalence], is that pointers and arrays always have a pretty simple layout in memory. This which makes it easy to do a simple byte-for-byte comparison.
For structs and union, this memory layout isn't necessarily this simple. You could for example, have a struct with an int (32 bit) and a double (64 bit). Such a struct requires 128 bits of memory, 32 of which aren't actually relevant for comparison. So, byte-forbyte comparison is out of the question. So, declaration equivalence is just easier to implement.
Don't underrate Dennis Ritchie. Every statically typed language should have a way to create an abstract type which it is impossible for a user to forge. For this you need a type construct or declaration construct that is generative, i.e., every instance of the construct generates a fresh type, distinct from any other. Such a construct is vital if you want to keep other people's mitts off your data. (For plenty of examples, see Dave Hanson's book C Interfaces and Implementations.)
So here values p1 and p2 have different types but the same representation:
struct { float x, y } p1;
struct { float x, y } p2;
Why pick struct to be generative? Because it's general enough to allow a wide range of convenient representations. union is a bit of a stretch, but I suspect it's "collateral design"; in C's type system, union behaves as much list struct as possible, which simplifies the compiler.
Incidentally, "declaration equivalence" is a term I have never heard before. 25 years ago terms like "name equivalence", "structural equivalence", and "occurence equivalence" were popular. Today type systems are much more formal, and equivalence is typically defined by logical rules rather than informal English. When it is helpful to resort to informal English, I usually find that the idea of "generativity" has more explanatory power than inventing a new name for the equivalence rules of each new language.