Is *++*p acceptable syntax?

In K&R Section 5.10, in their sample implementation of a grep-like function, there are these lines:
while (--argc > 0 && (*++argv)[0] == '-')
    while (c = *++argv[0])
Understanding the syntax there was one of the most challenging things for me, and even now, a couple of weeks after seeing it for the first time, I still have to think very slowly through it to make sense of it. I compiled the program with the alternate syntax below, but I'm not sure the second line is allowable. I've just never seen *'s and ++'s interleaved like this, but it makes sense to me, it compiles, and it runs. It also requires no parentheses or brackets, which is maybe part of why it seems clearer to me: I read the operators in one direction only (right to left) rather than bouncing back and forth to either side of the variable name.
while (--argc > 0 && **++argv == '-')
    while (c = *++*argv)

Well, for one, that's one way to make anyone reading your code go huh?!?!?!
So, from a readability standpoint, no, you probably shouldn't write code like that.
Nevertheless, it's valid code and breaks down as this:
*(++(*p))
First, p is dereferenced, giving *p. Then *p is incremented. Then the incremented pointer is dereferenced again.
To make things worse, this line:
while (c = *++*argv)
has an assignment in the loop-condition. So now you have two side-effects to make your reader's head spin. YAY!!!
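For concreteness, here is a minimal, compilable sketch of the kind of option-scanning loop the question is taken from, using the alternate **++argv / *++*argv spellings. It is not K&R's full program; the flag letters 'x' and 'n' and the printed messages are invented purely for illustration.
#include <stdio.h>

int main(int argc, char *argv[])
{
    int c;

    /* skip the program name, then walk each argument that starts with '-' */
    while (--argc > 0 && **++argv == '-')
        /* step through the remaining characters of that argument */
        while ((c = *++*argv) != '\0')
            switch (c) {
            case 'x':
                printf("found flag x\n");
                break;
            case 'n':
                printf("found flag n\n");
                break;
            default:
                printf("unknown flag %c\n", c);
                break;
            }
    return 0;
}
Run as ./a.out -xn -z it prints the two known flags and then complains about z, which may make the pointer walking easier to follow.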

Seems valid to me. Of course, you should not read it left to right; that's not how a C compiler parses the source, and that's not how the C grammar works. As a rule of thumb, you should first locate the object being operated upon (in this case, argv), and then analyze the operators, often, as in this case, from the inside (the object) outwards. The actual parsing (and reading) rules are of course more complicated.
P.S. And personally, I think this line of code really isn't hard to understand (and I'm not a C programming guru), so I don't think you should surround it with parentheses as Mysticial suggests. That would only make the code look big, if you know what I mean...

There's no ambiguity, even without knowledge of the precedence rules.
Both ++ and * are prefix unary operators; they can only apply to an operand that follows them. The second * can only apply to argv, the ++ to *argv, and the first * to ++*argv. So it's equivalent to *(++(*argv)). There's no possible relationship between the precedences of ++ and * that could make it mean anything else.
This is unlike something like *argv++, which could conceivably be either (*argv)++ or *(argv++), and you have to apply precedence rules to determine which (it's *(argv++), because postfix operators bind more tightly than prefix unary operators).
There's a constraint that ++ can only be applied to an lvalue; since *argv is an lvalue, that's not a problem.
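As a tiny demo of the postfix grouping mentioned above, here is a sketch using an invented buffer and pointer (buf and p) rather than argv, so both groupings are easy to observe on a plain char:
#include <stdio.h>

int main(void)
{
    char buf[] = "abc";
    char *p = buf;

    char c1 = *p++;      /* parses as *(p++): yields 'a', then p points at 'b' */
    char c2 = (*p)++;    /* yields 'b', then bumps that character to 'c'       */

    printf("%c %c %s\n", c1, c2, buf);   /* prints: a b acc */
    return 0;
}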

Is this code valid? Yes, but that's not what you asked.
Is this code acceptable? That depends (acceptable to who?).
I wouldn't consider it acceptable - I'd consider it "harder to read than necessary" for a few different reasons.
First, lots of programmers have to work with several different languages, potentially with different operator precedence rules. If your code looks like it relies on a specific language's operator precedence rules (even if it doesn't), then people have to stop and try to remember which rules apply to which language.
Second, different programmers have different skill levels. If you're ever working in a large team of developers, you'll find that the best programmers write code that everyone can understand, and the worst programmers write code that contains subtle bugs that half of the team can't spot. Most C programmers should understand "*++*argv", but a good programmer knows that a small number of "not-so-good" programmers either won't understand it or will take a while to figure it out.
Third, out of all the different ways of writing something, you should choose the variation that expresses your intent best. For this code you're working with an array, and therefore it should look like you intend to be working with an array (and not a pointer). Note: for the same reason, "uint32_t foo = 0x00000002;" is better than "uint32_t foo = 0x02;".
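For what it's worth, here is a rough sketch of what the same flag-scanning loop might look like written in the array style this answer argues for; the indices i and j and the function name scan_flags are invented for illustration:
void scan_flags(int argc, char *argv[])
{
    int i, j, c;

    /* every leading argument that starts with '-' is an option */
    for (i = 1; i < argc && argv[i][0] == '-'; i++)
        for (j = 1; (c = argv[i][j]) != '\0'; j++) {
            /* handle the flag character in c, e.g. with a switch */
            (void)c;
        }
}
Whether that reads better is a matter of taste, but the indexing does make the "this is an array of strings" intent explicit.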

Related

MISRA 13.5 question about non-compliant example

Trying to understand a non-compliant example of Rule 13.5.
MISRA-2012 Rule 13.5 states "The right hand operand of a logical && or || operator shall not contain persistent side effects", with the rationale being "... the side effects may or may not occur which may be contrary to programmer expectations."
I understand and totally agree with this. However their final example of non-compliant code is:
/* Non-compliant if fp points to a function with persistent side effects */
( fp != NULL ) && ( *fp ) ( 0 );
This construct seems perfectly safe in that the condition and the decision to call the function are directly tied, where the intent is to not dereference a NULL pointer. I understand an if statement would be clearer but would be interested if anyone has further insight.
MISRA attempts to define rules that can be interpreted without having to guess the programmer's intent. So yes, the construct you present is fine if it is intentional to avoid the function call in the event that the pointer is NULL, but a machine performing MISRA analysis of that code does not necessarily recognize that likelihood. The rule is primarily aimed at conditional statements where the two operands of && or || are not directly related. Rejecting the case you describe is collateral damage.
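To illustrate the kind of expression the rule is really aimed at, here is an invented example (the names poll, count, limit and x are all hypothetical) where the right-hand side's side effect is unrelated to the left operand, so it silently does not happen whenever the left side is false:
static int count = 0;

void poll(int x, int limit)
{
    /* Non-compliant in the way the rule worries about: count is only
     * incremented when x > 0. If the programmer expected count to
     * advance on every call, short-circuit evaluation silently breaks that. */
    if ((x > 0) && (count++ < limit)) {
        /* ... act on the event ... */
    }
}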
Of course, you can replace your case with:
if (fp != NULL) {
(*fp)(0);
}
Personally, I find the if statement clearer than the original expression statement. That's not such a clear call when an expression such as your original one appears in the condition of an if, while, or for statement, but all of those can be restructured to comply with MISRA, too.
Rule 13.5 is a Required Rule, and seeks to prevent situations where a user might assume the right-hand side is executed when, because of short-circuit evaluation, it is not.
In situations like the example cited, it is (probably) OK - in fact, this is quite a common idiom.
There are two options...
Restructure the code, as suggested by John Bollinger
Deviate the Rule - this requires you to justify why the Rule may be safely violated, which in the case cited should be straightforward
See profile for affiliation

How to evaluate this expression, keeping in mind precedence and associativity?

int p=4, s=5;
int m;
m = (p=s) * (p==s);
printf("%d",m);
How is this expression evaluated?
What will be the value of m?
How do parentheses change the precedence of operators in this example?
I am getting m=4 in Turbo C.
Is this a code fragment that actually came up in your work, or is it an assignment someone gave you?
The expression contains two parts:
p=s /* in this part p's value is assigned */
p==s /* in this part p's value is used */
So before we can figure out what the value of the expression is, we have to figure out: does p's value get set before or after it gets used?
And the answer is -- I'm going to shout a little here -- WE DO NOT KNOW.
Let me say that again. We simply do not know whether p's value gets set before or after it gets used. So we have no way of predicting what value this expression will evaluate to.
Now, you might think that precedence will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case precedence does not tell you whether p gets set before or after it gets used.
You might think that associativity will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case associativity does not tell you whether p gets set before or after it gets used, either.
Finally, you might think that I'm wrong, that precedence and/or associativity have to be able to tell you what you need to know, that there has to be a way of figuring out what this expression does. But it turns out that I'm right: there is no way of figuring out what this expression does.
You could compile it and try it, but that will only tell you how the compiler you're using today chooses to evaluate it. I guarantee you that there's another compiler out there that will evaluate it differently.
You might ask, which compiler is right? And the answer is, they're both right (or at least, neither of them is wrong). As far as the C language is concerned, this expression is undefined. So there's no right answer, and your compiler isn't wrong no matter what it does.
If this is code that came up in your work, please delete it right away, and figure out what you were really trying to do, and figure out some cleaner, well-defined way of expressing it. And if this code is an assignment, your answer is simply: it's undefined. (If your instructor believes this expression has a well-defined result, you're in the unfortunate position of having an instructor who doesn't know what he's talking about.)
You're reading and writing the variable p multiple times in the same expression without an intervening sequence point, so the write in one sub-expression and the read in the other are unsequenced (and the compiler is free to evaluate the two sub-expressions in either order).
This results in undefined behavior, meaning the program's behavior is unpredictable.
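As a purely hypothetical illustration of why no single answer can be given, here are two well-defined rewrites showing results a compiler might happen to produce for m = (p = s) * (p == s). Since the original is undefined, neither outcome (nor any other) is guaranteed:
#include <stdio.h>

int main(void)
{
    int p, s, m;

    p = 4; s = 5;
    p = s;                 /* assignment performed first ...              */
    m = p * (p == s);      /* ... then the comparison: m ends up as 5     */
    printf("%d\n", m);

    p = 4; s = 5;
    m = (p == s);          /* comparison performed first: 0 ...           */
    p = s;
    m = p * m;             /* ... then the assignment: m ends up as 0     */
    printf("%d\n", m);

    return 0;
}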

Evaluating expressions with operators

First, I know, I know. This question has kind of been asked a few times before, but most of the answers go off on other topics and only partly answer my question.
I'm writing something that can parse C-like expressions.
That includes expressions like, for example:
1) struct1.struct2.structarray[283].shd->_var
2) *((*array_dptr)[2][1] + 5)
3) struct1.struct2.struct3.var + b * c / 3 % 5
Problem is... I need to be fast on this. The fastest possible, even if it makes the code ugly - well, obviously, the speed improvement must be tangible. The reason is that it is interpreted. It needs to be fast...
I have many questions, and I will probably ask some more depending on your answers. But anyways...
First, I'm aware of "operator priorities". For example, algorithms implemented in C compilers will assign a priority number to each operator and evaluate the expression based on that.
I've consulted this table : http://en.wikipedia.org/wiki/Operators_in_C_and_C++#Operator_precedence
Now, this is cool but... I wonder a few things.
My principal question is... how would you implement this to be the fastest possible?
I have thought about for example... (please note the program I'm speaking about actually parses a file containing these expressions, and not all C operators will be supported)
1) Storing the expression string into an array, storing each operator position inside an array, and then starting to parse all this crap, starting from the highest-priority operator. For example, if I had str = "2*1+3", then after checking all the operators present, I would check the position at str[1], then check to the right and left, do the operation (here multiply), then substitute the result back into the expression and evaluate again.
The problem I see there is... say two operators in the expression have the same priority,
for example: var1 * var2 / var3 / var4
Since * and / both have the same precedence, how do I know at which position to start the parsing? Of course this example is rather intuitive, but I can see the problem growing on enormous expressions.
2) Is it even possible to do this non-recursively? Usually recursion means slower code, due to the multiple function calls setting up their own stack frames, re-initializing stuff, etc.
3) How to distinguish unary operators from non-unary ones?
For example : 2 + *a + b * c
There is the dereference operator and the multiplication one. Intuitively I have an idea of how to do it, but I'm not sure. I'd rather have your advice on this (I think: check whether the neighbor to the right or left is an operator; if so, then it's unary?).
4) I don't get expressions being evaluated right-to-left. It seems so unnatural to me. More than that, I don't understand what it means. Would you show an example? Why do it that way?!
5) Do you have better algorithms in head? Better ideas of achieving it?
For now, that sums pretty much what I'm thinking about.
This ain't homework, by the way. It's practical stuff.
Thanks!
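Since several of the points above (operator priorities, same-precedence operators, unary detection) come up together, here is one rough, hypothetical sketch of a common technique, precedence climbing, cut down to integer literals, + - * /, unary minus and parentheses. It is only meant to show the shape of the approach, not a tuned interpreter, and none of it is taken from the question's actual code:
#include <ctype.h>
#include <stdio.h>

static const char *s;                     /* cursor into the input string */

static long parse_expr(int min_prec);

static long parse_primary(void)
{
    while (isspace((unsigned char)*s)) s++;
    if (*s == '(') {                      /* parenthesised sub-expression */
        s++;
        long v = parse_expr(0);
        if (*s == ')') s++;               /* no error handling in this sketch */
        return v;
    }
    if (*s == '-') {                      /* a '-' in operand position is unary */
        s++;
        return -parse_primary();
    }
    long v = 0;
    while (isdigit((unsigned char)*s))    /* integer literal */
        v = v * 10 + (*s++ - '0');
    return v;
}

static int prec_of(char op)
{
    return (op == '+' || op == '-') ? 1 :
           (op == '*' || op == '/') ? 2 : 0;
}

static long parse_expr(int min_prec)
{
    long lhs = parse_primary();
    for (;;) {
        while (isspace((unsigned char)*s)) s++;
        char op = *s;
        int p = prec_of(op);
        if (p == 0 || p < min_prec)       /* not an operator we should take here */
            return lhs;
        s++;
        long rhs = parse_expr(p + 1);     /* p + 1 makes + - * / group left to right */
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;      /* division by zero not handled */
        }
    }
}

int main(void)
{
    s = "2 + -3 * (4 - 1)";
    printf("%ld\n", parse_expr(0));       /* prints -7 */
    return 0;
}
Same-precedence operators fall out naturally: after consuming *, the recursive call only grabs operands of strictly higher precedence, so var1 * var2 / var3 / var4 groups left to right without special casing. Unary minus is recognized exactly the way point 3 guesses: a - that appears where an operand is expected is unary.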

What is the difference between initialization at declaration and initialization in a later step?

I need to initialize a variable and test its value.
What is the most efficient way to do this?
char *key = get_key(item);
if (key != NULL) { // do something }
OR
char *key;
if (key = get_key(item)) { // do something }
Do side effects produce some advantage or not?
Thanks!
Ah, good ol' "most efficient way"... No. NOPE. Forget efficiency.
Even if there were a difference, this would very likely be premature optimization. But in this particular case, the only difference is in terminology (your second example is not, technically, initialization but an assignment expression), and your compiler will almost certainly generate the very same assembly for the two pieces of code.
In your simple program, the question to ask is: will you reuse the assigned value? The cost of the assignment will be one memory store. If it isn't used at all, then I would just not add the cost of the store. The compiler is probably doing that anyway, but it doesn't hurt to help it :-)
If you do intend to reuse key, then for readability I personally prefer doing it neither inside the if() nor in the declaration. So I would go with the second way, but keep the assignment out of the if().
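A sketch of what that third option looks like (get_key and item are taken from the question; the body is a placeholder):
char *key;

key = get_key(item);          /* the side effect gets its own line  */
if (key != NULL) {            /* the condition stays a pure test    */
    /* do something with key, possibly reusing it afterwards */
}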
For your specific example, there is likely no difference. Build both and compare the generated code to check.
A standalone declaration (char *key;) produces no code at all, so most likely both code fragments will result in the same generated code, but arguably the first fragment is slightly easier to read.
Ask yourself if efficiency is important. (answer: highly unlikely)
Go for readability. With that in mind doing an assignment inside an if condition is questionable, so I would strongly prefer the first version.
Both code snippets are equivalent in behavior.
I personally don't like the second one, as side effects in the controlling expression of an if statement tend to make the code less readable, particularly when the side effect is an assignment. This is, by the way, one of the reasons (the tendency to cause errors) why a language like Python forbids the use of the assignment operator in the controlling expression of an if statement.
This being said, I also don't like side-effects at initialization time (the function call in the declaration in the first code snippet).

C Programming: difference between ++i and i=i+1 from an assembler point of view?

This was an interview question. I said they were the same, but this was adjudged an incorrect response. From the assembler point of view, is there any imaginable difference? I have compiled two short C programs using default gcc optimization and -S to see the assembler output, and they are the same.
The interviewer may have wanted an answer something like this:
i=i+1 will have to load the value of i, add one to it, and then store the result back to i. In contrast, ++i may simply increment the value using a single assembly instruction, so in theory it could be more efficient. However, most compilers will optimize away the difference, and the generated code will be exactly the same.
FWIW, the fact that you know how to look at assembly makes you a better programmer than 90% of the people I've had to interview over the years. Take solace in the fact that you won't have to work with the clueless loser who interviewed you.
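If you want to repeat the experiment the asker describes, a minimal pair such as the following (function names invented) can be fed to something like gcc -O2 -S and compared. On every mainstream compiler I'm aware of the two bodies come out identical, but check your own toolchain rather than taking that on faith:
int f(int i) { ++i;        return i; }
int g(int i) { i = i + 1;  return i; }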
Looks like you were right and they were wrong. I had a similar issue in a job interview, where I gave the correct answer that was deemed incorrect.
I confidently argued the point with my interviewer, who obviously took offence to my impudence. I didn't get the job, but then again, working under someone who "knows everything" wouldn't be that desirable either.
You are probably right. A naive compiler might do:
++i to inc [ax]
and
i = i + 1 to add [ax], 1
but any half-sensible compiler will optimize the add-1 form into the first version anyway.
This all assumes the relevant architecture has inc and add instructions (like x86 does).
To defend the interviewer, context is everything. What is the type of i? Are we talking C or C++ (or some other C like language)? Were you given:
++i;
i = i + 1;
or was there more context?
If I had been asked this, my first response would've been "is i volatile?" If the answer is yes, then the difference is huge. If not, the difference is small and semantic, but pragmatically none. The proof of that is the difference in parse tree, and the ultimate meaning of the subtrees generated.
So it sounds like you got the pragmatic side right, but the semantic/critical thought side wrong.
To attack the interviewer (without context), I'd have to wonder what the purpose of the question was. If I asked the question, I'd want to use it to find out if the candidate knew subtle semantic differences, how to generate a parse tree, how to think critically and so on and so forth. I typically ask a C question of my interviewees that nearly every candidate gets wrong - and that's by design. I actually don't care about the answer to the question, I care about the journey that I will be taking with the candidate to reach understanding, which tells me far more about them than right/wrong on a trivia question.
In C++, it depends on whether i is an int or an object. If it's an object, i = i + 1 would probably generate a temporary instance.
The context is the main thing here, because on an optimized release build the compiler will optimize a standalone i++ down to a simple inc eax if that instruction is available, whereas something like int some_int = i++; would need to store the value of i in some_int first and only then increment i.

Resources