C: Transforming 'while' loops into 'for' loops

Our codebase contains a few loops of the form while((*assignment*) *condition*), for example:
while((c = *s++) != '\0') {...}
while((i = getNext()) != NULL) {...}
Unfortunately these are causing the compiler to emit an "assignment in condition" warning, which I would like to get rid of. My plan is to transform the while loops into for loops of the form for(assignment; condition; assignment), for example:
for(c = *s++; c != '\0'; c = *s++) {...}
for(i = getNext(); i != 0; i = getNext()) {...}
Is this transformation valid? Is there a better alternative?

The transformations are valid, yes, but they result in code that's harder to maintain, simply because there are two places you have to change the assignment.
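To see that the two forms visit the same characters, here is a minimal sketch. The function names count_while and count_for are made up for illustration; both count the characters before the terminating '\0'.

```c
#include <stddef.h>

/* Original form: assignment inside the condition. */
size_t count_while(const char *s)
{
    size_t n = 0;
    char c;
    while ((c = *s++) != '\0') {
        (void)c;   /* a real body would use c here */
        n++;
    }
    return n;
}

/* Transformed form: the assignment now appears twice. */
size_t count_for(const char *s)
{
    size_t n = 0;
    char c;
    for (c = *s++; c != '\0'; c = *s++) {
        (void)c;   /* a real body would use c here */
        n++;
    }
    return n;
}
```

Both functions return 5 for "hello". The duplicated c = *s++ in the second one is the maintenance cost mentioned above.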
I would be more inclined to figure out how to turn off that particular warning (even if only locally, with something like GCC's #pragma GCC diagnostic) since it is, after all, perfectly valid C code, both syntactically and (in this case) semantically.
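With GCC (and Clang, which accepts the same pragmas), a localised suppression might look like the sketch below. It assumes the warning in question is the one controlled by -Wparentheses; other compilers use different mechanisms, and count_chars is a made-up name.

```c
#include <stddef.h>

size_t count_chars(const char *s)
{
    size_t n = 0;
    char c;

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
    /* Bare assignment as the condition: this is what -Wparentheses flags. */
    while (c = *s++) {
        n++;
    }
#pragma GCC diagnostic pop

    return n;
}
```

The push/pop pair restores the previous warning state, so the rest of the file is still checked.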

Personally, I'd write the first loop like this:
for (char c; (c = *s) != '\0'; ++s)
{
// ...
}
This makes it clear that s is the thing that's being incremented. You can also omit the != '\0', which is implicit.
I'd keep the second loop as a while loop, or at least leave the assignment-inside-conditional. Maybe like this, to minimize scope pollution:
for (iterator i; i = getNext(); )
{
// ...
}
I personally find it quite acceptable to have part of the per-loop activity happening inside the conditional; that's natural for things like std::cin >> n and std::getline(file, line), too.

You haven't stated what your compiler is, but any compiler of quality allows such a warning to be turned off. But if you can't manage that:
for(;;)
{
c = *s++;
if (c == '\0') break;
...
}
is equivalent and is more general. Likewise:
for(;;)
{
i = getNext();
if (!i) break;
...
}
In many cases, better than the first one (but not equivalent) is:
for(;; s++)
{
c = *s;
if (c == '\0') break;
...
}
These are more verbose and ugly, but they are much better than your transformation that repeats the code, which is fragile and error-prone.

If my understanding of loops is correct your transformation is completely valid, but this transformation seems to be a lot harder to read than the while loop. Just do the initialization before the while loop and then increment at the end of the loop to get rid of the warning.
c = *s++;
while(c != '\0')
{
...
c = *s++;
}

break vs goto: Which is more appropriate in the following bool function in C? [closed]

I wrote the following function, which is meant to check whether there is any non-digit char in a string. The function must stop as soon as it finds a non-digit char, exit the loop, and return true. Otherwise, it returns false. Here's the code I am using (ctype.h is included):
bool isnotdigit(string argv)
{
bool y = false;
for (int i = 0, n = strlen(argv); i < n; i++)
{
char c = argv[i];
if (! isdigit(c))
{
y = true;
break;
}
}
return y;
}
It can also be done this way:
bool isnotdigit(string argv)
{
bool y = false;
for (int i = 0, n = strlen(argv); i < n; i++)
{
char c = argv[i];
if (! isdigit(c))
{
y = true;
goto next;
}
}
next:
return y;
}
If I am not wrong, both versions work the same way. Right? Then, what are the pros and cons of each? Especially in the case of the bool function above.

This is a programming style question, and as such, you are probably not going to get a definitive answer.
Some say that break is just a goto in disguise, so that one is as bad as the other. Some people say you should never use break.
But the whole point of break is that it's a guaranteed-to-be-unconfusing goto. It always goes from the inside of a loop, to the outside of a loop. (Well, except when it's part of a switch statement, of course.)
In this case, the goto buys you little or nothing. So I don't think anyone would say that your second example is much better than your first. (When there are nested loops, things can get more complicated. More on that in a bit.)
And, finally, yet another way of writing your function is like this:
bool isnotdigit(string argv)
{
for (int i = 0, n = strlen(argv); i < n; i++)
{
char c = argv[i];
if (! isdigit(c))
{
return true;
}
}
return false;
}
But some people say that this is bad style, because (they say) a function should have exactly one return statement. Other people, however, say that this is good style, because it gets rid of the extra boolean variable y, and getting rid of extra variables (especially extra little Boolean variables that just keep track of things) is a good rule, too.
Here are my opinions:
goto can be confusing, and you almost never need it. I almost never use goto.
break is fine. It's almost never confusing. I use it all the time. So I would very much prefer your first fragment over your second.
Extra boolean variables, that just keep track of little things, can be confusing, especially when the only reason they're there is to get around some other "rule".
I have no problem with multiple return statements, so I would always use something more like the third alternative, as I've presented it in this answer.
See also this older question.
Now, in fairness, the argument against multiple return statements can be a good one. If you've got cleanup to do before you exit, or if you ever have to add some cleanup code in the future, it can be very easy to forget to add it in both places (or in three or more places) if there are multiple return statements. For me, when the function is small and simple (as it is in this case), I think the cleanliness (and the loss of the extra Boolean variable) outweighs the risks. But if your metastyle is that you don't like judgment calls, if you like rigid rules that you can apply everywhere to avoid the superficial risks no matter what, then the "no multiple return statements" rule makes sense.
Finally, there's the question of nested loops. The only justification I can imagine for the goto next; usage in your second example is that if a later programmer comes along and adds a nested loop, the code with break would probably not work any more, and would have to be reworked somehow. By using goto, the rationalization might go, the code is more robust against that possibility. (Personally, I don't think that's a good rationalization for the goto, but as I say, it's the only one I can think of.)
Once you have nested loops, the pros and cons (that is, goto versus break versus multiple return statements) definitely shift around, as discussed in the answers to that other question.
P.S. Some languages "solve" the break-out-of-nested-loops problem by having something like break(2) that can break out of multiple loops at once. C does not have this, and the reason is that it was felt to be too potentially confusing, and too likely to, um, break if a later maintenance programmer added or removed a level of nesting.
Or in other words, although single-level-break is a guaranteed-to-be-unconfusing goto, multi-level-break is potentially just as confusing as goto. And, of course, you can argue with my characterizations here: after all, who says that break is guaranteed to be unconfusing? It's not strictly guaranteed, of course, but that's because any language feature can be confusing if you use it badly. (Case in point: extra little boolean variables that just keep track of various things.)

They are functionally equivalent, but I would prefer to only use goto to break out of a nested loop, not a single loop.
goto is often frowned upon because it can lead to "spaghetti code", but it has its place in C for:
Breaking cleanly out of a nested loop
Resource deallocation, like so:
char *buffer1 = NULL, *buffer2 = NULL, *buffer3 = NULL;
buffer1 = malloc(1000);
if(NULL == buffer1)
{
goto cleanup;
}
buffer2 = malloc(1000);
if(NULL == buffer2)
{
goto cleanup;
}
buffer3 = malloc(1000);
if(NULL == buffer3)
{
goto cleanup;
}
use(buffer1, buffer2, buffer3);
cleanup:
if(buffer1 != NULL)
{
free(buffer1);
buffer1 = NULL;
}
if(buffer2 != NULL)
{
free(buffer2);
buffer2 = NULL;
}
if(buffer3 != NULL)
{
free(buffer3);
buffer3 = NULL;
}
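For the first bullet, here is a minimal sketch of leaving two loops at once. find_in_grid and its parameters are hypothetical names chosen for the example; it reports the position of the first cell equal to target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Searches a rows x cols grid stored row by row.
   Returns true and sets *row / *col if target is found. */
bool find_in_grid(const int *grid, size_t rows, size_t cols,
                  int target, size_t *row, size_t *col)
{
    bool found = false;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *row = r;
                *col = c;
                found = true;
                goto done;   /* a plain break would leave only the inner loop */
            }
        }
    }

done:
    return found;
}
```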

Never use goto if there are alternatives.
But in this function you need neither, just return when the result is clear:
bool isnotdigit(string argv)
{
for (int i = 0, n = strlen(argv); i < n; i++)
{
char c = argv[i];
if (! isdigit(c))
{
return true;
}
}
return false;
}

At least in a simple case like your example, I suggest avoiding both goto and break and adding the condition to the for statement instead. (This may be a matter of personal preference or coding guidelines that might apply to your project.)
bool isnotdigit(string argv)
{
bool y = false;
for (int i = 0, n = strlen(argv); (i < n) && !y; i++)
{
char c = argv[i];
/* or alternatively: y = ! isdigit(c); */
if (! isdigit(c))
{
y = true;
}
}
return y;
}
BTW: I suggest renaming the function argument, because argv can be confused with the standard arguments of the main function, which are often called argc and argv. Instead of y I would suggest a meaningful name, e.g. wrong_char_found.
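Putting the naming suggestions together might look like this sketch. The names has_non_digit and text are only suggestions, and plain const char * is used in place of the string typedef from the question. The cast to unsigned char is an addition: isdigit has undefined behaviour for negative values other than EOF, which a plain char can hold.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

bool has_non_digit(const char *text)
{
    bool wrong_char_found = false;

    /* The loop ends at the terminator or as soon as a non-digit is seen. */
    for (size_t i = 0; text[i] != '\0' && !wrong_char_found; i++) {
        wrong_char_found = !isdigit((unsigned char)text[i]);
    }
    return wrong_char_found;
}
```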

Why not combine the "for" and "while" loop to a single loop?

I'm currently making my own programming language for fun using Ruby and wondering why having both a "while" and a "for" loop is so universal. Why would it be bad to just have a single "loop" keyword?
ex)
loop (True) # acts as a while loop with a condition
loop var in 1..20 # acts as a for loop
loop var in list # acts as a for each loop
Would this somehow be disadvantageous, or are the "while" and "for" loops simply there for semantics and readability?

In fact, there are even more loop-variants. For example do-while.
To answer your question:
Yes, all of them can be used to do the same thing. However, depending on the loop-variant you use, you implicitly say what this loop does.
For example, you only use a for-loop if you know how often you want to loop (e.g. for each item in a list (I know, there is also a foreach in many languages)). It is considered a bad habit to modify the loop variable inside a for-loop.
So if you are not sure how often you will loop, you should most likely use something else than a for-loop... for example a while-loop.
Here are some simple code examples:
for(i = 0; i < someList.Length; i++) {
// do some stuff x times
}
while(someValue < someOtherValue) {
someValue = someValue * someValue;
}
For me it is clear that I should not use a for-loop for the second loop.
To sum up: Yes, you only need one loop construct. However, I personally (!) like being able to say something just by my choice of loop.
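As an illustration that the constructs are interchangeable, here is a sketch in C (chosen only for concreteness; sum_for and sum_while are made-up names) computing the same sum both ways.

```c
/* Sum of 1..n using a for loop: the iteration count is visible in the header. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* The same computation as a while loop: init and step move out of the header. */
int sum_while(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}
```

Both return 55 for n = 10. The only difference is where the reader has to look to find the start value, the limit, and the step.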

Would this somehow be disadvantageous, or are the "while" and "for" loops simply there for semantics and readability?
I think that your idea would not be disadvantageous and, yes, while and for are chosen for semantics/readability (or even historical reasons). Then, there are slightly different meanings in every language. Often, the for loop implies that you have a control variable. From the classic BASIC
for i=1 to 10 ... (next i)
to Pascal, and even in more advanced constructs (iterators) from Python, this idea is respected. Moreover, in some languages (Pascal, for example) a for implies that the low-high limits are pre-calculated. This Pascal code for example:
B := 5;
for i := 1 to B do begin
B := 25
end;
can give warnings or errors, or can lead to 5 iterations even if in the body of the loop the upper limit is modified. The C language, which also has for, is totally different, even if the general idea of "control variable" is respected in 99% of the cases. The C for does not pre-compute limits (by design), can have more than one control variable, or even none at all:
for ( ; ; ) ...
is a valid for loop which has no control variable (and no condition that would end the loop!).
On the other hand, while, repeat, do ... while and so on are semantically clear (you can "talk" the code with your voice, and everything is self explaining), and do not imply variables (they imply nothing).
The above loops all have a common problem: they have a test which is performed at the beginning (or the end) of every iteration. Suppose you want to read characters from standard input, do something with them, and stop when this character is EOF. The C way is:
while ( ( ch=getchar() ) != EOF) ... ; // do something with ch
The C language can do this because an assignment is also an expression. If it were not so, one would have to write:
ch=getchar();
while (ch != EOF) {
.... ; // do something with ch
ch = getchar();
}
// I must write the statement "ch = getchar();" twice
// or...
do {
ch = getchar();
if (ch == EOF) break;
... ; // do something with ch
} while (true);
// I use getchar() only once, but I end up with an awful "while (true)"
In your new language, you could invent a different cycle which goes like this:
cycle
// statements always executed at least once
when (condition)
// statement executed if the condition is true
// and, if true, the cycle restarts
end
With this syntax, the example about getchar() would become:
cycle ch=getchar(); when (ch != EOF) ... ; // do something bla bla bla...
The normal while, while (true) and do-while loops would become:
cycle when (condition); BODY // while
cycle BODY; // while (true)
cycle BODY; when (condition) // do-while
Think about it... :-)

Most languages have for and while for historical reasons and familiarity. Familiarity is important: if you reuse the same concepts and keywords as other languages, your language is easier to learn. Many languages use "!" for the negation for this reason (many other languages use "not"). Look at this page comparing syntax across languages, you'll see lots of similarities: http://rigaux.org/language-study/syntax-across-languages.html
Now, to answer your question. Although most languages have the while keyword, you really don't need it.
Best example I can think of is Go:
// C-like for
for i := 0; i < 10; i++ {
sum += i
}
// while
for sum < 1000 {
sum += sum
}
// infinite loop
for {
}
// do .. while
for ok := true; ok; ok = condition {
...
}
// range-for
for i, v := range pow {
fmt.Printf("2**%d = %d\n", i, v)
}
So, as you can see, it's perfectly possible to use a single keyword for all loops.
Main reason for having both for and while is familiarity and readability. But you can easily do without.

Proper way to write 'one but not both' situation?

This question is really arbitrary so I'll try to explain it as best as I can. I'm looping through two strings of unknown size.
bool check(char *str1, char *str2)
{
char special = 'k';
for (size_t i = 0; ; i++)
{
}
}
I want the terminating condition of the for loop to be the following:
Leave the loop only if either str1[i] == special OR str2[i] == special, but not both.
For this question, ignore the fact that I might segment fault since I know neither the size nor am I checking for 0x00.
I know how to write this but it's always really messy and involves using ternary conditional operators. What is a better way to write it?

You could use (str1[i] == special) != (str2[i] == special), as suggested here.
This works because in C, == always yields one of the int values 0 or 1 (reference).
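A small sketch of that comparison wrapped in a helper. exactly_one_is is a made-up name.

```c
#include <stdbool.h>

/* True when exactly one of a and b equals special. */
bool exactly_one_is(char a, char b, char special)
{
    /* Each == yields 0 or 1, so != acts as a logical XOR here. */
    return (a == special) != (b == special);
}
```

In the loop from the question, the exit test would then read exactly_one_is(str1[i], str2[i], special).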

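You want the XOR operator, written as ^. Use it like you would && or ||. It is true only if one but not both operands are true. Note that ^ is a bitwise operator, so this is only safe when both operands are 0 or 1, as the results of == are.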

Oops: now see OP said "For this question, ignore the fact that I might segment fault since I know neither the size nor am I checking for 0x00."
So my concern below is moot. Leaving as a reference.
Since code is working with strings, the loop must terminate on 3 conditions:
Leave the loop if (str1[i] == special) != (str2[i] == special), i.e. exactly one of the two characters is special.
str1[i] == 0.
str2[i] == 0.
The code could be
for (size_t i = 0;
((str1[i] == special) == (str2[i] == special)) && str1[i] && str2[i];
i++) {
...
}
Perhaps a simplification could be had.
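One possible simplification, as a sketch: test for the end of either string in the loop header, then leave the loop explicitly when the one-but-not-both condition holds. first_mismatch_index is a hypothetical name, and returning -1 for "not found" is an assumption of this example.

```c
#include <stddef.h>

/* Returns the first index where exactly one string has `special`,
   or -1 if either string ends before that happens. */
long first_mismatch_index(const char *str1, const char *str2, char special)
{
    for (size_t i = 0; str1[i] != '\0' && str2[i] != '\0'; i++) {
        if ((str1[i] == special) != (str2[i] == special)) {
            return (long)i;
        }
    }
    return -1;
}
```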

Is it possible for an IF statement to be in the for loop's third expression (C)?

While learning from K&R, I tried to modify the while loop in exercise 1.5.3 into a for loop:
#include <stdio.h>
main()
{
int c, nl;
for (nl = 0; (c = getchar()) != EOF; if (c == '\n'){++nl}){}
printf("%d\n", nl);
}
SIDE NOTE: I apologize if this code looks abominable. For now, I am just trying to write valid C code, and when I feel that I have adapted to the language's general concepts, I will start writing code for "humans."
When I try to build and run the code, this message appears,
error: expected expression before 'if'
I tried to search this error, but I could only find answers that either referred to other languages or ternary operations, which I would like to avoid if possible. Thanks in advance for any insight into resolving this issue, if it even is one.

No, but if you really want to squeeze it in the loop (although I would not recommend it), you could do
nl += (c == '\n')? 1: 0
Or, as @LogicG8 suggested, just
nl += (c == '\n')
although it is a bit less clear and may violate some coding guidelines.
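For illustration, here is that idiom in a complete function. To keep the sketch easy to check, it scans a string instead of calling getchar(); count_newlines is a made-up name.

```c
/* Counts '\n' characters in s. */
int count_newlines(const char *s)
{
    int c, nl;
    /* The third clause must be an expression, so the conditional increment
       is written as one; the comma operator chains it with s++. */
    for (nl = 0; (c = *s) != '\0'; nl += (c == '\n'), s++) {
    }
    return nl;
}
```

The comparison c == '\n' yields 1 or 0, so adding it to nl increments the counter only on newlines.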

No, not possible. The third clause has to be an expression and if statements are statements, not expressions.

if is a statement; the header of a for loop can only contain expressions, not statements.
You can use the ternary operator there; it's the way to perform a conditional in an expression.
for (nl = 0; (c = getchar()) != EOF; (c == '\n') ? ++nl : 0) {}

In short: this is not possible, though you could use the conditional (ternary) operator as a kind of inline if.
But even if it worked, you shouldn't do it, because...
...the simpler the for-loop, the better the optimization. Don't confuse the compiler.
...you might also confuse other developers if they have to read and understand that code.
Keep it simple whenever possible.

assignment works as a condition

Consider the following Code,
int i;
while(i=0)
printf("Hello");
Now Generally speaking i=0 is an assignment and not a condition for while to check.
But the GCC compiler lets it go with a warning and even evaluates it correctly (does not execute the print statement).
Why? I usually would do with parenthesis for the truth value but my juniors feel that I am wrong and there is no real reason for the parenthesis in this!
EDIT: Zeroing down on the 'actual' doubt, Please consider the following test case
int callme(){
return 0;
}
int main(int argc,char*argv[]){
int c;
while(c = callme()){
printf("Calling...\n");
}
return 0;
}

The expression i = 0 does 2 things:
Has the side effect of storing 0 in i
Yields the value 0
I usually would do with parenthesis for the truth value but my juniors
feel that i am wrong and there is no real reason for the parenthesis
in this
It's usually a hint to the compiler meaning "I actually want this, I didn't forget a =, shut up".
For your specific case there's no reason to write if (i = 0): you already know what if (0) does. But it's pretty useful when used as:
if ((i = some_function()))
...
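A fuller sketch of that idiom, using a while loop and a hypothetical next_item() standing in for some_function(): it hands out the values 3, 2, 1 and then 0, and the loop stops on the 0.

```c
static int remaining = 3;

/* Hypothetical stand-in for some_function(): returns 3, 2, 1, then 0. */
int next_item(void)
{
    return remaining > 0 ? remaining-- : 0;
}

/* Sums items until next_item() yields 0. */
int sum_items(void)
{
    int i, total = 0;
    /* Double parentheses: the assignment is intentional, not a mistyped ==. */
    while ((i = next_item())) {
        total += i;
    }
    return total;
}
```

The value of the assignment expression is the value stored in i, so the loop body runs for every non-zero item.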

i=0 is always an assignment (unless you have it as part of int i = 0; where it is an initialization). But any non-void expression may appear inside the condition of a while loop and if it evaluates to non-zero, the body of the loop will be executed, and if it is zero, the body of the loop will not be executed.
The notation:
while (i = 0)
printf("Hello\n");
is always equivalent to:
i = 0;
There is very little justification for writing the loop at all.
People do write other expressions:
while (c = getchar())
...process EOF or a non-null character...
But that's usually a bug. It is more likely that you should be writing:
while ((c = getchar()) != EOF)
...process a character - possibly null...
or even:
while ((c = getchar()) != EOF && c != '\0')
...process a non-null character...
The first getchar() loop gets a warning from GCC; the latter two do not because of the explicit test of the value from the assignment.
The people who write a condition like this:
while ((c = getchar()))
really annoy me. It avoids the warning from GCC, but it is not (IMNSHO) a good way of coding.
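To make the difference between those loops concrete, here is a sketch that replaces getchar() with a tiny string-backed reader (struct reader and reader_next are inventions for this example) so the behaviour can be checked without real input. The reader returns EOF at the end of its buffer, like getchar() does at end of input.

```c
#include <stdio.h>   /* for EOF */
#include <stddef.h>

struct reader {
    const char *data;
    size_t len;
    size_t pos;
};

/* Returns the next byte as an unsigned char value, or EOF when exhausted. */
int reader_next(struct reader *r)
{
    if (r->pos >= r->len) {
        return EOF;
    }
    return (unsigned char)r->data[r->pos++];
}

/* Counts every byte up to EOF, embedded null bytes included. */
size_t count_until_eof(struct reader *r)
{
    size_t n = 0;
    int c;
    while ((c = reader_next(r)) != EOF) {
        n++;
    }
    return n;
}

/* Counts bytes up to EOF or the first null byte, whichever comes first. */
size_t count_until_eof_or_nul(struct reader *r)
{
    size_t n = 0;
    int c;
    while ((c = reader_next(r)) != EOF && c != '\0') {
        n++;
    }
    return n;
}
```

A loop written as while (c = reader_next(r)) would not stop at EOF at all, since EOF is a negative value and therefore counts as true.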

When you use an assignment operator such as
a=0;
You assign the value to 'a', and still return the number 0.
To test your question, I tried these lines of codes:
int a;
printf("%d", a=0);
and these lines displayed 0.
Then, I tested another set of codes:
int b;
printf("%d", b=15);
Here, the lines displayed 15.
So, if you do:
while(a=0)
{
printf("zero");
}
The expression (a=0) evaluates to 0, which is false, so nothing is displayed.
But if you do:
while(a=15)
{
printf("fifteen");
}
The "fifteen" will be displayed endlessly, because the statement (a=15) will return a non zero value, or 15, which is not false, not zero, thus it is true. :)

As cnicutar has told above the assignment also yields the value zero.
Some additional info:
It is a common coding mistake for people to omit an extra '=' whereby the comparison becomes an assignment.
An easy way to avoid this is to write the comparison as below, in which case the compiler will give an error even if a '=' is missed:
while(0 == i)
{
printf("Hello");
}