Style question: !condition against condition == NULL

If you call some function, and that function returns NULL in case of an error (think of malloc() or fopen() for example), which of the two is better:
FILE *fp = fopen(argv[0], "r");
if (fp == NULL) {
    // handle error
}
or
if (!fp) {
    // handle error
}
Is it just a matter of style? I think the first one is clearer, being more explicit, but then I rarely code in C :-).

I prefer comparing with NULL, because it makes it clear that both operands of the comparison are supposed to be pointers. This
(!p)
or this
(p == 0)
don't let you tell at a glance what p's type is (an integer? a boolean?). I am of the opinion that all coding should be done on the assumption that you are going to have to debug the thing at 4am (that's 4 in the morning, for the sleepless out there) 9 months later. In that case every little bit helps.
Oh, and it's good practice to place constants as the first operand when testing for equality, so that the compiler will abort with an error if you accidentally turn it into an assignment.
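For illustration, here is a minimal sketch with three hypothetical helper functions, one per spelling discussed in this thread. They return the same result for every pointer, so the choice really is one of readability; the comment on the middle one shows what the constant-first ordering buys you.

```c
#include <stddef.h>  /* NULL */

/* Explicit comparison: reads as "is p a null pointer?" */
int is_null_explicit(const void *p)
{
    return p == NULL;
}

/* Constant first: if one '=' is accidentally dropped, "NULL = p" tries to
   assign to something that is not an lvalue, so it fails to compile. */
int is_null_yoda(const void *p)
{
    return NULL == p;
}

/* Logical negation: a null pointer is false in a boolean context,
   so !p yields 1 exactly when p is null. */
int is_null_logical(const void *p)
{
    return !p;
}
```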

I believe this is a matter of style. Personally, I like the second option better. Others like the first one because it is clearer and more "proper". Some people even write if (NULL == fp) so they can never accidentally forget one = and turn it into an assignment. All in all though, I think it's a matter of taste, and it's probably more important to be somewhat consistent.

I prefer the first one in this case, as you are explicitly comparing the value to see if it's null (which happens to be 0).
The second one reads as if fp is a boolean, which it isn't.
It's like saying "Is this pointer invalid?" vs "Is this pointer false?"
Which one is more readable to you is, of course, a matter of opinion.

I prefer "=="; I think the reader has to think less. This is also why I detest typedefs.

Is it just a matter of style?
In the case of C, it is just a matter of style as both are correct, but in general, I believe more people (including me) prefer an explicit comparison (ptr == NULL) as evidenced by the following:
C++0x introduces a nullptr keyword to emphasize that it is more than just a mere number or boolean value.
Java forces explicit comparisons (obj == null) and does not allow !obj.

Yes, this is a matter of style. fp == NULL (oops, I first wrote fp = NULL...) is very clear and explicit in what it expresses, and it is good for those not familiar with all of C's twists and turns. On the other hand, !fp is very much an idiom and a pun: "there's not(!)hing at fp". And it is short. For this, I like !fp. I think the C designers also like it; otherwise they would not have defined ! for pointers. :)

What is an efficient way to test for a zero-length string: `strlen(str) > 0` or `*str`

Assuming char *str;, either of the following approaches tests whether the string is non-empty:
/* Approach 1 */
if (str && strlen(str) > 0) {
    ...
}
/* Approach 2 */
if (str && *str) {
    ...
}
What is the most preferable to use? I assume the second will be faster, as it does not have to iterate over the buffer to get the length. Also, are there any downsides to using the second?

I'll give a third option that I find would be better:
if (str && str[0]) {
    // ...
}
The strlen method isn't ideal since it can iterate over a non-zero-length string. The compiler may optimize out that call entirely (as has been pointed out), but it won't on every compiler (and I assume the -ffreestanding option would disable this optimization), and it at least makes it look like more work needs to happen.
However, I consider the [0] to have much clearer intent than a *. I generally recommend using * when dereferencing a pointer to a single object, and using [0] when that pointer is to the first element of an array.
To be extra clear, you can do:
if (str && str[0] != '\0') {
    // ...
}
but that starts tipping the special-to-alphanum-characters ratio towards hard-to-read.
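A small sketch of the two tests side by side; the function names are made up for this example. Both assume that a non-null str points to a NUL-terminated string, and both give the same answer. The only difference is how much work they may do.

```c
#include <stdbool.h>
#include <string.h>

/* Approach 1: strlen() walks the string until it finds the terminator,
   so on a long string it may do more work than needed (unless the
   compiler optimizes the call away). */
bool nonempty_strlen(const char *str)
{
    return str != NULL && strlen(str) > 0;
}

/* Approaches 2 and 3: look only at the first character. A string is
   empty exactly when its first char is the terminating '\0'. The &&
   short-circuits, so a NULL str is never dereferenced. */
bool nonempty_first_char(const char *str)
{
    return str != NULL && str[0] != '\0';
}
```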

Approach 2, because the first approach will iterate over the whole string if it's not empty. And no, there are no downsides to approach 2.

It's unlikely that the compiler would generate different code if you have optimization enabled. But if performance REALLY is an issue it's probably better to use the second approach. I say probably, because it's not 100% certain. Who knows what the optimizer will do? But there is a risk that the first approach will iterate over the whole string if it's not empty.
When it comes to readability, I'd say that the first is slightly more readable. However, using *str to test for an empty string is very idiomatic in C. Any seasoned C coder would instantly understand what it means. So TBH, the readability issue is mostly in case someone who is not a C programmer will read the code. If someone does not understand what if (str && *str) does, then you don't want them to modify the code either. ;)
If there is a coding standard for the code base you're working on, stick to that. If there's not, pick the one you like most. It does not really matter.

What is the most preferable to use?
Obviously number one, as that is most readable.

I would ignore performance issues. Both Clang and GCC generate the same code for both approaches at -O3 (see godbolt).
The first approach expresses programmer's intentions a bit more explicitly.

Solve ÿ end of file in C

When I copy the content of one file to another in C, I get this character at the end of the output file: ÿ. I understand, thanks to this forum, that it is the EOF indicator, but I don't understand what to do to get rid of it in the output file.
This is my code:
second_file = fopen(argv[2], "w+");
while (curr_char != EOF)
{
    curr_char = fgetc(original_file);
    fputc(curr_char, second_file);
}
printf("Your file has been successfully copied\n");
fclose(second_file);
fclose(original_file);

For each character you read, you have two things to do:
Check to see if it's EOF.
If not, write it to the output.
Your problem is you're doing these two things in the wrong order.
There are potentially several different ways of solving this. Which one you pick depends on how much you care about your program looking good, as opposed to merely working.
One. Starting with the code you wrote, we could change it to:
while (curr_char != EOF)
{
    curr_char = getc(original_file);
    if(curr_char == EOF) break;
    putc(curr_char, second_file);
}
Here, we explicitly test to see if the character is EOF, immediately after reading it, before writing it. If it's EOF, we break out of the loop early. This will work, but it's ugly: there are two different places where we test for EOF, and one of them never "fires". (Also, as a commenter reminded me, there's the problem that the first time through the loop, we're testing curr_char before we've ever set it.)
Two. You could rearrange it like this:
curr_char = getc(original_file);
while (curr_char != EOF)
{
    putc(curr_char, second_file);
    curr_char = getc(original_file);
}
Here, we read an initial character, and as long as it's not EOF, we write it and read another. This will work just fine, but it's still a little bit ugly, because this time there are two different places where we read the character.
Three. You could rearrange it like this:
while ((curr_char = getc(original_file)) != EOF)
{
    putc(curr_char, second_file);
}
This is the conventional way of writing a character-copying loop in C. The call to getc and the assignment to curr_char are buried inside of the controlling expression of the while loop. It depends on the fact that in C, an assignment expression has a value just like any other expression. That is, the value of the expression a = b is whatever value we just assigned to a (that is, b's value). So the value of the expression curr_char = getc(original_file) is the character we just read. So when we say while ((curr_char = getc(original_file)) != EOF), what we're actually saying is, "Call getc, assign the result to curr_char, and if it's not equal to EOF, take another trip around the loop."
(If you're still having trouble seeing this, I've written other explanations in these notes and this writeup.)
This code is both good and bad. It's good because we've got exactly one place we read characters, one place we test characters, and one place we write characters. But it's a little bit bad because, let's admit it, it's somewhat cryptic at first. It's hard to think about that assignment-buried-inside-the-while-condition. It's code like this that gives C a reputation as being full of obscure gobbledegook.
But, at least in this case, it really is worth learning the idiom, and becoming comfortable with it, because the reductions to just one read and one test and one write really are virtues. It doesn't matter so much in a trivial case like this, but in real programs which are complicated for other reasons, if there's some key piece of functionality that happens in two different places, it's extremely easy to overlook this fact, and to make a change to one of them but forget to make it to the other.
(In fact, this happened to me just last week at work. I was trying to fix a bug in somebody else's code. I finally figured out that when the code did X, it was inadvertently clearing Y. I found the place where it did X, and I added some new code to properly recreate Y. But when I tested my fix, it didn't work! It turned out there were two separate places where the code did X, and I had found and fixed the wrong one.)
Finally, here's an equivalently minimal but unconventional way of writing the loop:
while (1)
{
    curr_char = getc(original_file);
    if(curr_char == EOF) break;
    putc(curr_char, second_file);
}
This is kind of like number 1, but it gets rid of the redundant condition in the while loop, and replaces it with the constant 1, which is "true" in C. This will work just fine, too, and it shares the virtue of having one read, one test, and one write. It actually ends up doing exactly the same operations and in exactly the same order as number 3, but by being laid out linearly it may be easier to follow.
The only problem with number 4 is that it's an unconventional, "break in the middle" loop. Personally, I don't have a problem with break-in-the-middle loops, and I find they come up from time to time, but if I wrote one and someone said "Steve, that's ugly, it's not an idiom anyone recognizes, it will confuse people", I'd have to agree.
P.S. I have replaced your calls to fgetc and fputc with the more conventional getc and putc. I'm not sure who told you to use fgetc and fputc, and there are obscure circumstances where you need them, but they're so rare that in my opinion one might as well forget that the "f" variants exist, and always use getc and putc.
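Putting the pieces together, here is a sketch of the conventional loop (number 3) as a complete function; the name copy_stream and its 0/-1 return convention are hypothetical. Two details explain the ÿ: curr_char must be an int rather than a char, so that it can hold EOF as well as every byte value, and writing EOF (typically -1) with putc converts it to the byte 0xFF, which is displayed as ÿ in Latin-1. The sketch also checks ferror afterwards, because getc returns EOF on a read error as well as at end of file.

```c
#include <stdio.h>

/* Copy every byte from in to out.
   Returns 0 on success, -1 if a read or write error occurred. */
int copy_stream(FILE *in, FILE *out)
{
    int curr_char;  /* int, not char: it must be able to hold EOF too */

    /* One read, one test, one write per trip around the loop. */
    while ((curr_char = getc(in)) != EOF) {
        if (putc(curr_char, out) == EOF)
            return -1;  /* write error */
    }

    /* EOF from getc means either end of file or a read error. */
    if (ferror(in))
        return -1;
    return 0;
}
```

In the question's program it would be called as copy_stream(original_file, second_file), after checking that both fopen calls succeeded.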

return from 1 point in function [duplicate]

Possible Duplicate:
Should a function have only one return statement?
Hello,
gcc 4.4.4 c89
Is it good programming practice to return from only one point in a function?
I have written a function below. However, I am returning from 2 possible points.
Is this good style?
static int init_data(struct timeout_data_t *timeout_data)
{
    if(timeout_data == NULL) {
        fprintf(stderr, " [ %s ] [ %d ]\n",
                __func__, __LINE__);
        return FALSE;
    }
    /* Assign data */
    timeout_data->seconds = 3;
    timeout_data->func_ptr = timeout_cb;
    return TRUE;
}

If it aids readability, then there is nothing wrong with it.
Personally, I write this kind of code all of the time.

This is an ongoing religious-style debate without an accepted answer. There are many people on both sides of the argument, who feel strongly about it.
I don't think there's anything wrong with it personally, but the best approach is to go with the style guidelines of your team, if they have some (and if not, just ask about it. If anyone recoils in horror, it would be kinder to stick to single-return-point).

I've had managers that lived and died by the 1 return policy for the sake of "readability", even though it's much more readable in some cases without it.
The bottom line is... if the man that signs your paycheck says you're only going to use 1 return, use 1 return. The best way to do this is
type myfunc(params) {
    type result = defaultValue;
    // actual function here, word for word
    // replace "return $1" with "result = $1"
    return result;
}
This is a valid way to do things in their book, and they will smile at your adherence to the 1-return policy. Of course, you know using this adds ZERO readability, because all you've done is replace "return" (which is syntax highlighted) with "result =" (which is not). But you've made your boss happy, which, when you break it all down, is what development is about anyway, right? :-)
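As a sketch of that mechanical rewrite applied to the question's init_data: the struct layout, TRUE/FALSE and timeout_cb below are hypothetical stand-ins for definitions the question does not show. A result variable alone does not stop execution, so the assignments had to move into an else branch; that extra nesting is part of the readability cost described above. (Note that __func__ was standardized in C99, although the question targets C89.)

```c
#include <stdio.h>

/* Hypothetical definitions standing in for the ones in the question. */
#define TRUE  1
#define FALSE 0

struct timeout_data_t {
    int seconds;
    void (*func_ptr)(void);
};

static void timeout_cb(void) { /* placeholder callback */ }

/* init_data with a single return: each "return X" became "result = X",
   and the normal path moved into an else branch so it is skipped
   after an error. */
static int init_data(struct timeout_data_t *timeout_data)
{
    int result = FALSE;  /* the "defaultValue" of the template */

    if (timeout_data == NULL) {
        fprintf(stderr, " [ %s ] [ %d ]\n", __func__, __LINE__);
        result = FALSE;  /* was: return FALSE; */
    } else {
        /* Assign data */
        timeout_data->seconds = 3;
        timeout_data->func_ptr = timeout_cb;
        result = TRUE;   /* was: return TRUE; */
    }

    return result;
}
```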

In straight C, I think that error checking/parameter verification at the top of the function with a return (or possibly even multiple return points in the parameter verification) results in reasonably clean code. After that point, though, my opinion is that it is a good idea to have one single return at the bottom of the function. That helps avoid problems with cleanup (e.g., freeing of memory) that might be allocated in the workings of the function.
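A sketch of that structure, using a hypothetical count_lines function: the checks before the file is opened return immediately, because nothing needs cleaning up yet, while everything after a successful fopen falls through to a single fclose and return at the bottom.

```c
#include <stdio.h>

/* Count the newline characters in the file at path.
   Returns the count, or -1 on error. */
long count_lines(const char *path)
{
    FILE *fp;
    long lines = 0;
    int c;

    /* Checks at the top: nothing has been acquired yet,
       so returning early cannot leak anything. */
    if (path == NULL)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* From here on fp is open: no early returns, so the
       fclose() at the single exit below cannot be skipped. */
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    if (ferror(fp))
        lines = -1;

    fclose(fp);
    return lines;
}
```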

There's nothing inherently wrong about having more than one exit point, especially when you're returning on errors. Returning immediately usually makes for clearer code than having the whole thing wrapped in an if/else statement and setting some result flag to be returned at the end. (When you see "return result;", you have to look through all of the earlier code to see how and when result gets set. More moving parts == less clarity.)

You've tagged your question as "C", which makes a difference.
In C you might write code such as
open file
process data
close file
If you put a return in the middle of the process data section then you're likely to skip the essential cleanup so it might be considered bad practice to have multiple return points because it's very easy to mess up.
If it were C++, it's best practice to let destructors handle cleanup, so it's not nearly such a potential problem, and this advice is somewhat obsolete in C++.

As Oded and Andrzej Doyle pointed out, there is nothing wrong with it.
There is no such thing as a golden rule when it comes to this.
The first and most important thing to keep in mind when writing code is that someone else will have to read it and make sense of it. Maybe you will have to come back to it in a couple of months, and if you have made a mess you will regret it.
Personally, I always:
If the code is new, use the coding style everybody else in the project is using.
If editing others' code, use the coding style already used there.
Above all, avoid code optimizations (the compiler is best at that).
Keep it clean and lean.

If your function is small enough (10-15 lines), as it should be :), then it really doesn't matter whether you use a single return point or multiple ones. Both are equally readable.
Problems start cropping up with badly designed large functions. In such cases both styles, returning from a single point and returning from multiple points, further complicate the function, although even in such cases I prefer returning early and from multiple points.

It's often the case that you have to check several conditions before you start on the real work, and then you are tempted to do an early return, as in your code. I think this is fine for short methods, but when things get more complicated I'd suggest breaking your code into a "setup and check" method and a "real work" method, each having only one exit. Of course, as long as it's readable, it's fine to have multiple returns (e.g. in a long switch statement).

Failing (and thus returning) early is a very very very good practice. All the code after the checks is free of a lot of potential errors.

Rule of precedence == over =

I am just wondering would it be better to do this:
if((fd = open(filename, O_RDWR)) == -1)
{
    fprintf(stderr, "open [ %s ]\n", strerror(errno));
    return 1;
}
or this
fd = open(filename, O_RDWR);
if(fd == -1)
{
    fprintf(stderr, "open [ %s ]\n", strerror(errno));
    return 1;
}
Many thanks for any suggestions,

Yuck, split it up. What do you gain by mashing it all on one line? Let's compare and contrast:
Single-line:
Advantages: (none)
Disadvantages: Hard to read, prone to error. (Consider your first revision.)
Multi-line:
Advantages: Easy to read, less error-prone.
Disadvantages: (none)
I think it's clear. :)
"Sometimes putting it on one line makes more sense, for example: while ((c=getchar())!=EOF)"
That's fine, but that isn't the case here. There are times when not splitting it up makes more sense, but in general, don't.
"It saves more vertical space"
If one line is killing your ability to see the function, you need to 1) buy a monitor with a resolution higher than 640x480, and 2) Write smaller functions.
Really, I've never understood that argument for anything, functions should easily fit on any screen, regardless of a one-line difference.
"Multiple lines make it look complex"
Not really, shoving it on one line is arguably harder to read and more complex looking. Splitting things up makes it simpler to process one bit at a time, one shouldn't assume two lines makes it twice as complex.

Several people have argued in favor of the second. I disagree with them. While there was (apparently) initially a minor issue with = vs. == in the first one, I'd argue that it IS a minor issue.
A much bigger issue is that it's all too common for people (especially if they're in a hurry) to skip over the error checking -- leaving out the if (whatever == -1) completely, usually on the theory that what they're working on is quick, throwaway code and checking for the error isn't really needed. This is a really bad habit; I can practically guarantee that every person reading this has seen real code that skipped error checking like this, even though it really, really should have had it.
In code like this, attempting to open the file and checking for an error in having done so should be inextricably bound together. Putting the two into the same statement reflects the proper intent. Separating the two is just plain wrong -- they should not be separated in any way at any time for any reason. This should be coded as a single operation because it should be a single operation. It should always be thought of and coded as a single operation.
The excuses for doing otherwise are, in my opinion, quite weak. The reality is that anybody who uses C needs to be able to read code that combines an assignment with a conditional test. Just for an obvious example, a loop like while ((ch=getchar()) != EOF) pretty much needs to be written as a combined assignment and test -- attempting to test for EOF separately usually leads to code that just doesn't work correctly, and if you do make it work correctly, the code is substantially more complex.
Likewise with the problem of = vs. ==. Since I didn't see the defect to start with, I'm not sure how much separating the two would have done to avoid problems, but my immediate guess is that it probably made almost no difference at all. Compilers that will warn you when what was supposed to be a condition contains only an assignment have been around for years (e.g. gcc). In most cases, the symptoms are almost immediately obvious anyway. In short, the fact that you made a particular typo in one part of this posting but not the other doesn't prove (or honestly even indicate) much of anything about the relative difficulty of the two.
Based on that kind of evidence, I'd apparently believe that "not" is harder to type than "immediately", since I just typed "immediately" without a problem, but had to correct "not" (twice, no less) before it came out right in the previous sentence. I'm pretty sure if we went by how often I mistype it, "the" is the single most difficult word in the English language.

Maybe something where the parentheses make the ordering obvious?
if((fd = open(filename, O_RDWR)) == -1)
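To show why those inner parentheses matter (the precedence question in the title), here is a sketch with two hypothetical wrappers around POSIX open(). Because == binds more tightly than =, dropping the parentheses assigns the result of the comparison, not the file descriptor.

```c
#include <errno.h>
#include <fcntl.h>   /* open, O_RDWR (POSIX) */
#include <stdio.h>
#include <string.h>

/* Correct: the inner parentheses make the assignment happen first, so fd
   receives open()'s return value, which is then compared with -1.
   On success the caller owns fd and must close() it. */
int open_checked(const char *filename)
{
    int fd;
    if ((fd = open(filename, O_RDWR)) == -1) {
        fprintf(stderr, "open [ %s ]\n", strerror(errno));
        return -1;
    }
    return fd;
}

/* Wrong: this parses as fd = (open(filename, O_RDWR) == -1), so fd is
   1 if open failed and 0 if it succeeded (and the real descriptor is
   lost). Shown only to demonstrate the precedence rule. */
int open_without_parens(const char *filename)
{
    int fd = open(filename, O_RDWR) == -1;
    return fd;
}
```

With a path that does not exist, the first returns -1, while the second returns 1, a value that looks like a valid descriptor.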

In this example, I'll join the chorus saying the second method is better.
The tougher case is when it's in a loop, like:
while ((c=getchar())!=EOF)
{
    ... do something ...
}
versus
while (true)
{
    c=getchar();
    if (c==EOF)
        break;
    ... do something ...
}
In cases like that I prefer to do it on one line because then it makes clear what is controlling the loop, which I think overrides the drawbacks of the complex combination of assignment and testing.

It's a style thing; you're not asking about precedence (not "presidence").
Many people will argue the latter example is clearer.

The second is better for readability's sake, but I know I do the first too often. The assignment is evaluated first, because you have it in parentheses, and the assigned value is then compared by the == operator.

Except for standard idioms -- ones that are so common that everyone immediately gets what you are trying to do -- I would avoid doing the assignment in the conditional. First, it's more difficult to read. Second, you leave yourself open (at least in weakly typed languages that interpret zero as false and non-zero as true) to creating bugs by using an errant assignment operator in the conditional check.

This is a matter of style and is subjective. They do the same thing. I tend to prefer the latter because I find it easier to read, and easier to set breakpoints/examine variables in the debugger.

(-1 == whatever)
to minimize the damage from typos (writing = instead of == then becomes a compile error).

The first case is very normal when you're writing an input loop, because the alternative is to have to write the input command twice -- once right before the loop, and once right at the end of the loop.
while ( (ch=getchar()) != EOF){
    //do something with it
}
I think that the second way is more normal for an if statement, where you don't have the same concern.

What is a good general approach for deciding return values in C?

My program is written in C for Linux, and has many functions with different patterns for return values:
1) one or two return n on success and -1 on failure.
2) some return 0 on success and -1 on failure.
3) some return 1 on success and 0 on failure (I generally spurn using a boolean type).
4) pointers return 0 on failure (I generally spurn using NULL).
My confusion arises over the first three -- functions that return pointers always return 0 on failure, that's easy.
The first option usually involves functions which return a length which may only be positive.
The second option is usually involved with command-line processing functions, but I'm unsure of its correctness; perhaps better values would be EXIT_SUCCESS and EXIT_FAILURE?
The third option is intended for functions which are convenient and natural to be called within conditions, and I usually emulate a boolean type here, by using int values 1 and 0.
Despite this all seeming reasonably sensible, I still find areas where this is not so clear or obvious as to which style to use when I create the function, or which style is in use when I wish to use it.
So how can I add clarity to my approach when deciding upon return types here?

So how can I add clarity to my approach when deciding upon return types here?
Pick one pattern per return type and stick with it, or you'll drive yourself crazy. Model your pattern on the conventions that have long been established for the platform:
If you are making lots of system calls, then any integer-returning function should return -1 on failure.
If you are not making system calls, you are free to follow the convention of the C control structures that nonzero means success and zero means failure. (I don't know why you dislike bool.)
If a function returns a pointer, failure should be indicated by returning NULL.
If a function returns a floating-point number, failure should be indicated by returning a NaN.
If a function returns a full range of signed and unsigned integers, you probably should not be coding success or failure in the return value.
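A sketch of the first conventions in the list, with hypothetical function names: -1 for an integer result that is never negative, NULL for a pointer, and NaN for a floating-point result. copy_string behaves like POSIX strdup.

```c
#include <math.h>    /* NAN, isnan */
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy */

/* Integer result that is never negative: -1 signals failure
   (here, a NULL argument). */
int count_char(const char *s, char ch)
{
    int n = 0;
    if (s == NULL)
        return -1;
    for (; *s != '\0'; s++)
        if (*s == ch)
            n++;
    return n;
}

/* Pointer result: NULL signals failure (here, out of memory).
   s must not be NULL. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;  /* include the terminator */
    char *p = malloc(len);
    if (p == NULL)
        return NULL;
    memcpy(p, s, len);
    return p;
}

/* Floating-point result: NaN signals failure (here, division by zero). */
double checked_divide(double num, double den)
{
    if (den == 0.0)
        return NAN;
    return num / den;
}
```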
Testing of return values is a bane to C programmers. If failure is rare and you can write a central handler, consider using an exception macro package that can indicate failures using longjmp.
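The mechanism such packages build on can be sketched in a few lines. This is not any particular exception package; current_handler, fail and sum_digits are hypothetical names. fail() records a code and longjmps back to the innermost setjmp, so helper code does not have to pass errors back through return values. The C standard restricts where setjmp may appear (here it is compared against 0 as the whole if condition), and non-volatile locals modified after setjmp have indeterminate values after the jump, so the sketch does not read any of them there.

```c
#include <setjmp.h>

static jmp_buf *current_handler;  /* innermost active handler */
static int failure_code;          /* code passed from fail() to the handler */

/* Report a failure: record the code and jump back to the active handler. */
static void fail(int code)
{
    failure_code = code;
    longjmp(*current_handler, 1);
}

/* Helper that signals errors through fail() instead of its return value. */
static int digit_value(char c)
{
    if (c < '0' || c > '9')
        fail(1);
    return c - '0';
}

/* Sum the decimal digits in s. Returns 0 and stores the sum in *sum_out,
   or returns the failure code if s contains a non-digit. */
int sum_digits(const char *s, int *sum_out)
{
    jmp_buf handler;
    jmp_buf *saved = current_handler;  /* not modified after setjmp */
    int result;

    current_handler = &handler;
    if (setjmp(handler) == 0) {
        /* Normal path: no per-call error checks. */
        int sum = 0;
        for (; *s != '\0'; s++)
            sum += digit_value(*s);
        *sum_out = sum;
        result = 0;
    } else {
        result = failure_code;  /* arrived here via fail() */
    }
    current_handler = saved;  /* restore the outer handler in both cases */
    return result;
}
```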

Why don't you use the method used by the C standard library? Oh, wait...

Not an actual answer to your question, but some random comments you might find interesting:
it's normally obvious when to use case (1), but it gets ugly when unsigned types are involved - return (size_t)-1 still works, but it ain't pretty
if you're using C99, there's nothing wrong with using _Bool; imo, it's a lot cleaner than just using an int
I use return NULL instead of return 0 in pointer contexts (personal preference), but I rarely check for it, as I find it more natural to just treat the pointer as a boolean; a common case would look like this:
struct foo *foo = create_foo();
if(!foo) /* handle error */;
I try to avoid case (2); using EXIT_SUCCESS and EXIT_FAILURE might be feasible, but imo this approach only makes sense if there are more than two possible outcomes and you'll have to use an enum anyway
for more complicated programs, it might make sense to implement your own error handling scheme; there are some fairly advanced implementations using setjmp()/longjmp() around, but I prefer something errno-like with different variables for different types of errors
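A sketch of the first two points, with hypothetical names: (size_t)-1, which equals SIZE_MAX, as a "not found" index that cannot collide with a real one, and a C99 bool for a yes/no question that cannot fail.

```c
#include <stdbool.h>  /* bool (C99) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* (size_t)-1 converts to the largest size_t value, SIZE_MAX. A real
   index i satisfies i < n <= SIZE_MAX, so it can never equal this. */
#define NOT_FOUND ((size_t)-1)

/* Index of the first occurrence of key in arr[0..n-1], or NOT_FOUND. */
size_t find_int(const int *arr, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (arr[i] == key)
            return i;
    return NOT_FOUND;
}

/* A yes/no question that cannot fail: a bool return says exactly that. */
bool contains_int(const int *arr, size_t n, int key)
{
    return find_int(arr, n, key) != NOT_FOUND;
}
```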

One condition I can think of where your above methodology can fail is a function that can return any value including -1, say a function to add two signed numbers.
In that case testing for -1 will surely be a bad idea.
In case something fails, I would rather set the global error indicator that the C standard provides in the form of errno, and use that to handle the error.
C++, on the other hand, provides exceptions, which take much of the hard work out of error handling.

For deterministic yes/no responses that cannot fail, using a more specific (bool) return type can help maintain consistency. Going further, for higher-level interfaces one may want to think about returning or updating a system-specific messaging/result detail structure.
My preference for 0 to always be a success is based on the following ideas:
Zero enables some basic classing for organizing failures by negative vs positive values, such as total failure vs conditioned success. I don't recommend this generally, as it tends to be a bit too shallow to be useful and might lead to dangerous behavioral assumptions.
When success is zero, one can make a bunch of orthogonal calls and check for group success in a single condition later, simply by checking the accumulated return code of the group:
rc = 0;
rc += func1();
rc += func2();
rc += func3();
if (rc == 0) {
    /* success! */
}
Most importantly zero from my experience seems to be a consistent indication of success when working with standard libraries and third-party systems.
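A caveat on the summing trick, sketched with hypothetical step functions: summing only works while all failure codes have the same sign, because a -1 and a +1 cancel to 0. Combining the codes with bitwise OR (a substitute technique, not the one in the answer) keeps the result nonzero whenever any call returned nonzero.

```c
/* Hypothetical steps following the "0 means success" convention. */
int step_ok(void)   { return 0; }
int step_fail(void) { return -1; }
int step_warn(void) { return 1; }  /* a positive, nonzero code */

/* Summing: 0 only if the codes add up to 0, which mixed signs can fake. */
int run_summed(int (*a)(void), int (*b)(void), int (*c)(void))
{
    int rc = 0;
    rc += a();
    rc += b();
    rc += c();
    return rc;
}

/* OR-ing: nonzero if any single call returned nonzero. */
int run_ored(int (*a)(void), int (*b)(void), int (*c)(void))
{
    int rc = 0;
    rc |= a();
    rc |= b();
    rc |= c();
    return rc;
}
```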

So how can I add clarity to my approach when deciding upon return types here?
Just the fact that you're thinking about this goes a long way. If you come up with one or two rules - or even more if they make sense (you might need more than one rule - like you mention, you might want to handle returned pointers differently than other things) I think you'll be better off than many shops.
I personally like to have 0 returned to signal failure and non-zero to indicate success, but I don't have a strong need to hold to this. I can understand the philosophy that might want to reverse that sense so that you can return different reasons for the failure.
The most important thing is to have guidelines that get followed. Even nicer is to have guidelines that have a documented rationale (I believe that with rationales people are more likely to follow the guidelines). Like I said, just the fact that you're thinking about these things puts you ahead of many others.

That is a matter of preference, but what I have noticed is the inconsistency. Consider this, using a pre-C99 compiler:
#define SUCCESS 1
#define ERROR 0
Then any function that returns an int returns either one or the other, to minimize confusion; stick to it religiously. Again, depending on and taking into account the development team, stick to their standard.
In C, an int of zero is false and any nonzero value (negative values included) is true; this does not depend on which standard your compiler follows. If your compiler supports C99, you can use the _Bool type (or bool from stdbool.h).
The big advantage of C is you can use your personal style, but where team effort is required, stick to the team's standard that is laid out and follow it religiously, even after you leave that job, another programmer will be thankful of you.
And keep consistent.
Hope this helps,
Best regards,
Tom.

Much of the C standard library uses the strategy of returning only true (or 1) on success and false (or 0) on failure, and storing the result in a passed-in location. More specific error codes than "it failed" are stored in the special variable errno.
Something like int add(int* result, int a, int b), which stores a+b in *result and returns 1 (or returns 0 and sets errno to a suitable value if, e.g., a+b happens to be larger than INT_MAX).
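A sketch of that add function; the overflow test and the choice of ERANGE as the "suitable value" are assumptions. The check has to happen before the addition, because signed integer overflow is undefined behavior in C.

```c
#include <errno.h>   /* errno, ERANGE */
#include <limits.h>  /* INT_MAX, INT_MIN */

/* Store a + b in *result and return 1, or leave *result untouched,
   set errno to ERANGE and return 0 if the sum would not fit in an int. */
int add(int *result, int a, int b)
{
    /* INT_MAX - b (for b > 0) and INT_MIN - b (for b < 0) cannot
       overflow, so these comparisons are safe. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        errno = ERANGE;
        return 0;
    }
    *result = a + b;
    return 1;
}
```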