K&R says:
The unary + is new with the ANSI standard. It was added for symmetry with the unary -.
What is this symmetry in regard to (it's clearly not actual, geometric symmetry), and what is its significance? Is it important in programming?
It's just something that means "in the pattern of", or "as you would reasonably expect given the established pattern". In this particular case - is the counterpart to +, though there are other pairs like this throughout C.
It means you can do +2 as well as -2 and both work. It would be odd, or asymmetric, if +2 was somehow a syntax error. In fact, in K&R C there are a lot of odd things that were later ironed out in the standardization process. This appears to have been one of them.
You don't really need a unary + operator: you can just omit the + and the code compiles fine. By the same logic you don't need a unary - either, since you can always write 0 - 5 instead of -5, though leaving it out would seem ridiculous.
In C (and some other C-like languages) we have 2 unary operators for working with pointers: the dereference operator (*) and the 'address of' operator (&). They are left unary operators, which introduces an uncertainty in order of operations, for example:
*ptr->field
or
*arr[id]
The order of operations is strictly defined by the standard, but from a human perspective, it is confusing. If the * operator was a right unary operator, the order would be obvious and wouldn't require extra parentheses:
ptr*->field vs ptr->field*
and
arr*[id] vs arr[id]*
So is there a good reason why the operators are left unary instead of right? One thing that comes to mind is the declaration of types. Left operators stay near the type name (char *a vs char a*), but there are type declarations that already break this rule (char a[num], char (*a)(char), etc.), so why bother?
Obviously, there are some problems with this approach too, like the
val*=2
This would be either the *= shorthand for val = val * 2, or a dereference followed by an assignment, val* = 2.
However this can be easily solved by requiring a white space between the * and = tokens in case of dereferencing. Once again, nothing groundbreaking, since there is a precedent of such a rule (- -a vs --a).
So why are they left instead of right operators?
Edit:
I want to point out that I asked this question because many of the weirder aspects of C have interesting explanations for why they are the way they are, like the existence of the -> operator, the type declarations, or indexing starting from 0. The reasons may no longer be valid, but they are still interesting in my opinion.
There indeed is an authoritative source: "The Development of the C Language" by the creator of the language, Dennis M. Ritchie:
An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing. For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes *fp() and (*pf)() respectively. The style used in expressions carries through to declarations, so the names might be declared
int *fp();
int (*pf)();
In more ornate but still realistic cases, things become worse:
int *(*pfp)();
is a pointer to a function returning a pointer to an integer. There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C—Algol 68, for example—describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an `inside-out' style that many find difficult to grasp [Anderson 80]. Sethi [Sethi 81] observed that many of the nested declarations and expressions would become simpler if the indirection operator had been taken as a postfix operator instead of prefix, but by then it was too late to change.
Thus the reason why * is on the left in C is because it was on the left in B.
B was partially based on BCPL, where the dereferencing operator was !.
This was on the left; the binary ! was an array indexing operator:
a!b
is equivalent to !(a+b).
!a
is the content of the cell whose address is given by a; it can appear on the left of an assignment.
Yet the 50 year old BCPL manual doesn't even contain mentions of the ! operator - instead, the operators were words: unary lv and rv. Since these were understood as if they were functions, it was natural that they preceded the operand; later the longish rv a could then be replaced with syntactic sugar !a.
Many of the current C operator practices can be traced via this route. B likewise had a[b] equivalent to *(a + b), which equals *(b + a) and therefore b[a], just as in BCPL a!b was interchangeable with b!a.
Notice that in B variables were untyped, so certainly similarity with declarations could not have been the reason to use * on the left there.
So the reason for unary * being on the left in C is as boring as "there wasn't any problem in the simpler programs with the unary * being on the left, in the position that everyone was accustomed to have the dereferencing operator in other languages, that no one really thought that some other way would have been better until it was too late to change it".
For my compiler class, we are gradually creating a pseudo-PASCAL compiler. It does, however, follow the same precedence as C. That being said, in the section where we create prefix and postfix operators, I get 0 for
int a = 1;
int b = 2;
++a - b++ - --b + a--
when C returns a 1. What I don't understand is how you can even get a 1. By doing straight prefix first, the answer should be 2. And by doing postfix first, the answer should be -2. By doing everything left to right, I get zero.
My question is, what should my precedence of my operators be to return a 1?
Operator precedence tells you, for example, whether ++a - b means (++a) - b or ++(a - b). Clearly it should be the former, since the latter isn't even valid. Your implementation evidently uses the former (or you wouldn't be getting a result at all), so you implemented operator precedence correctly.
Operator precedence has nothing to do with the order in which subexpressions are evaluated. In fact, the order in which the operands of + and - are evaluated is unspecified in C, and any code that modifies the same variable twice without a sequence point in between invokes undefined behavior. So whichever order you choose is fine, and 0 is as valid a result as any other value.
It is illegal to change variables several times in a row like that (roughly, between assignments; the standard talks about sequence points). Technically, this is what the C standard calls undefined behaviour. The compiler has no obligation to detect that you are writing nonsense, and can assume you never will. Anything whatsoever can happen when you run the program (or even while compiling). Also check nasal demons in the Jargon File.
The ++ increment and -- decrement operators can be placed before or after a value, to different effect. If placed before the operand (prefix), its value is immediately changed; if placed after the operand (postfix), its value is noted first, then the value is changed.
McGrath, Mike. (2006). C programming in easy steps, 2nd Edition. United Kingdom : Computer Step.
First, I know, I know. This kind of question has been asked before, but most of the answers drifted to other topics and only partly answer my question.
I'm doing something which can parse C like expressions.
That includes expressions for example like (some examples)
1) struct1.struct2.structarray[283].shd->_var
2) *((*array_dptr)[2][1] + 5)
3) struct1.struct2.struct3.var + b * c / 3 % 5
Problem is... I need to be fast on this. The fastest possible, even if it makes the code ugly - well, obviously, the speed improvement must be tangible. The reason is that it is interpreted. It needs to be fast...
I have many questions, and I will probably ask some more depending on your answers. But anyways...
First, I'm aware of "operator priorities". For example algorithms implemented in C compilers will assign to operators a priority number and evaluate the expression based on that.
I've consulted this table : http://en.wikipedia.org/wiki/Operators_in_C_and_C++#Operator_precedence
Now, this is cool but... I wonder a few things.
My principal question is... how would you implement this to be the fastest possible?
I have thought about for example... (please note the program I'm speaking about actually parses a file containing these expressions, and not all C operators will be supported)
1) Storing the expression string in an array, storing each operator position in another array, and then parsing all this crap starting from the highest priority operator. For example, if I had str = "2*1+3", then after finding all the operators present, I would look at position str[1], check its right and left, do the operation (here multiply), substitute the result into the expression, and evaluate again.
The problem I see there is... say two operators in the expr are the same priority
for example : var1 * var2 / var3 / var4
Since * and / both have the same precedence, how do I know at which position to start parsing? Of course this example is rather intuitive, but I can see the problem growing with enormous expressions.
2) Is it even possible to do this non-recursively? Usually recursive means slower, due to multiple function calls setting up their own stack frames, re-initializing stuff, etc.
3) How to distinguish unary operators from non unaries?
For example : 2 + *a + b * c
There is the dereferencing op and the multiplication one. Intuitively I have an idea of how to do it, but I ain't sure, so I'd rather have your advice on this (I think: check whether the member to the left or right is an operator; if so, then it's unary?)
4) I don't get expressions being evaluated right-to-left. It seems so unnatural to me. More to the point, I don't understand what it means. Would you show an example? Why do it that way?!?
5) Do you have better algorithms in mind? Better ideas for achieving this?
For now, that sums pretty much what I'm thinking about.
This ain't homework, by the way. It's practical stuff.
Thanks!
Was there a very early C "standard" where the following was legal for the definition of a two-dimensional array?
int array[const_x, const_y];
int array2[2, 10];
I've stumbled upon some old legacy code which uses this (and only this) notation for multi-dimensional arrays. The code is, except for this oddity, perfectly valid C (and surprisingly well-designed for the time).
As I didn't find any macros which convert between [,] and [][], and I assume it's not a form of practical joke, it seems that once upon a time there hath been thy olde C compiler which accepted this notation. Or did I miss something?
Edit: If it helps, it's for embedded microcontrollers (atmel). From experience I can tell, that embedded compilers are not that well-known for standard-compliance.
The code on current compilers works as intended (as far as it can be guessed from the function names, descriptions and variables) if I change all [,] to [][].
The first formal standard was ANSI X3.159-1989, and the first informal standard would generally be agreed to be the first edition of Kernighan & Ritchie. Neither of these allowed the comma to be used to declare a two-dimensional array.
It appears to be an idiosyncrasy of your particular compiler (one that renders it non-standard-conforming, since it would change the semantics of some conforming programs).
Take a look at this forum post.
The comma operator evaluates the left hand side, discards the result, then evaluates the right hand side. Thus "2,5" is the same as "5", and "5,2" is the same as "2".
This could be what is happening, although the why of it is beyond me.
Note that comma cannot be used in indexing multidimensional array: the code A[i, j] evaluates to A[j] with the i discarded, instead of the correct A[i][j]. This differs from the syntax in Pascal, where A[i, j] is correct, and can be a source of errors.
From Wikipedia
In K&R Section 5.10, in their sample implementation of a grep-like function, there are these lines:
while (--argc > 0 && (*++argv)[0] == '-')
while (c = *++argv[0])
Understanding the syntax there was one of the most challenging things for me, and even now a couple weeks after viewing it for the first time, I still have to think very slowly through the syntax to make sense of it. I compiled the program with this alternate syntax, but I'm not sure that the second line is allowable. I've just never seen *'s and ++'s interleaved like this, but it makes sense to me, it compiles, and it runs. It also requires no parentheses or brackets, which is maybe part of why it seems more clear to me. I just read the operators in one direction only (right to left) rather than bouncing back and forth to either side of the variable name.
while (--argc > 0 && **++argv == '-')
while (c = *++*argv)
Well, for one, that's one way to make anyone reading your code go huh?!?!?!
So, from a readability standpoint, no, you probably shouldn't write code like that.
Nevertheless, it's valid code and breaks down as this:
*(++(*p))
First, p is dereferenced. Then the result of that (the pointer that p points to, not p itself) is incremented. Then that incremented pointer is dereferenced.
To make things worse, this line:
while (c = *++*argv)
has an assignment in the loop-condition. So now you have two side-effects to make your reader's head spin. YAY!!!
Seems valid to me. Of course, you should not read it left to right; that's not how a C compiler parses the source, and that's not how the C grammar works. As a rule of thumb, you should first locate the object being operated upon (in this case, argv), and then analyze the operators, often, as in this case, from the inside (the object) to the outside. The actual parsing (and reading) rules are of course more complicated.
P. S. And personally, I think this line of code is really not hard to understand (and I'm not a C programming guru), so I don't think you should surround it with parentheses as Mysticial suggests. That would only make the code look big, if you know what I mean...
There's no ambiguity, even without knowledge of the precedence rules.
Both ++ and * are prefix unary operators; they can only apply to an operand that follows them. The second * can only apply to argv, the ++ to *argv, and the first * to ++*argv. So it's equivalent to *(++(*argv)). There's no possible relationship between the precedences of ++ and * that could make it mean anything else.
This is unlike something like *argv++, which could conceivably be either (*argv)++ or *(argv++), and you have to apply precedence rules to determine which (it's *(argv++) because postfix operators bind more tightly than prefix unary operators).
There's a constraint that ++ can only be applied to an lvalue; since *argv is an lvalue, that's not a problem.
Is this code valid? Yes, but that's not what you asked.
Is this code acceptable? That depends (acceptable to who?).
I wouldn't consider it acceptable - I'd consider it "harder to read than necessary" for a few different reasons.
First; lots of programmers have to work with several different languages, potentially with different operator precedence rules. If your code looks like it relies on a specific language's operator precedence rules (even if it doesn't) then people have to stop and try to remember which rules apply to which language.
Second; different programmers have different skill levels. If you're ever working in a large team of developers you'll find that the best programmers write code that everyone can understand, and the worst programmers write code that contains subtle bugs that half of the team can't spot. Most C programmers should understand "*++*argv", but a good programmer knows that a small number of "not-so-good" programmers either won't understand it or will take a while to figure it out.
Third; out of all the different ways of writing something, you should choose the variation that expresses your intent the best. For this code you're working with an array, and therefore it should look like you intend to be working with an array (and not a pointer). Note: For the same reason, "uint32_t foo = 0x00000002;" is better than "uint32_t foo = 0x02;".