I am learning C currently and wanted to know whether the following two pieces of code perform differently, or if it's just a style thing.
Looking at some sources, they have the following code:
...
FILE * pFile;
pFile = fopen ("myfile.txt","r");
if (pFile == NULL)
{ some code }
...
While my professor has the following code in his notes:
...
FILE * pFile;
if ((pFile = fopen("myfile.txt","r")) == NULL)
{ some code }
...
Just wanted to know if this is merely a style preference by different programmers or if there is an advantage to putting the return/set line inside the if statement.
There is no difference. More experienced programmers sometimes go with the second form, just to save a line, but they are essentially identical. The second tends to be a little more "UNIX-y", where most function calls are checked for error (as opposed to success) before continuing.
They're identical, since the expression (pFile = fopen("myfile.txt", "r")) evaluates to the value just assigned to pFile, but I would personally prefer the first since it's more explicit.
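To make the equivalence concrete, here is a minimal sketch of both forms side by side. The function names are made up for illustration, and both functions simply report whether the file could be opened.

```c
#include <stdio.h>

/* Form 1: assign first, then test on a separate line. */
int can_open_separate(const char *path)
{
    FILE *pFile;
    pFile = fopen(path, "r");
    if (pFile == NULL)
    {
        return 0;   /* could not open */
    }
    fclose(pFile);
    return 1;
}

/* Form 2: assign inside the condition; the parenthesized assignment
   has the value that was just stored in pFile. */
int can_open_combined(const char *path)
{
    FILE *pFile;
    if ((pFile = fopen(path, "r")) == NULL)
    {
        return 0;   /* could not open */
    }
    fclose(pFile);
    return 1;
}
```

For any given path, both functions should return the same result; the only difference is where the assignment is written.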
These two variants are equal. It doesn't affect performance. However, I think the first variant is better because it makes things more clear.
Both programs are equivalent.
Some people favor the first style saying it is more readable and some people favor the second style saying it is more compact.
For information, note that in some coding guidelines (MISRA being one) the second style is forbidden. MISRA forbids the use of the assignment operator in the controlling expression of the if statement.
There is no difference in performance, but the second is clearly preferable.
The first separates attempting to open the file from testing for success in opening the file.
The second makes opening the file and testing for success a single operation, which is exactly how you should not only code it, but also how you should think about it. You shouldn't think of them as two separate operations at all. The operation of opening a file is not complete until/unless you've checked whether it opened correctly or not.
Treating opening and testing as two separate operations is lazy coding that leads to sloppy thinking. Don't do it.
As said here the two segments are clearly identical. I would however prefer the first as it tends to avoid the confusion between the assignment operator = and the equality operator ==. Consider a situation in which there's a function foo(arg) returning an int. You would write something like:
int y;
if ((y = foo(x)) == 0) {
... some code ...
}
Now, let's say you confuse the assignment operator with equality (very common in if expressions BTW):
int y;
if ((y == foo(x)) == 0) {
... some code ...
}
Since the type of the expression (y == foo(x)) is int, the above is considered legitimate C code by the compiler. This would clearly produce a bug in your code.
Now, let's consider the first option:
int y;
y = foo(x);
if (y == 0) {
... some code ...
}
Clearly, now you are less likely to confuse the assignment with the equality. Furthermore, even if you do write y == foo(x); as a statement the compiler will issue a warning.
While the existing answers are very good, I've got a couple things to add about how to approach the question of whether something in C is a performance issue.
First, a quick way to check is to compile both versions of the code with gcc -O3 and compare the generated .o files. If they're identical, then of course there cannot be any performance difference (at least not with the current compiler/version you're using).
With that said, a more conceptual approach to the question is to ask yourself 2 questions:
Do the two pieces of code define exactly the same behavior for all possible valid input variable values, or only the same behavior (or even just similar behavior) for the inputs you expect?
If they define exactly the same behavior, do you think it's easy for a compiler to see this?
If so, there "shouldn't" be a performance difference, because the compiler "should" compile them both to whatever it thinks is the most efficient way to achieve the behavior described. Of course sometimes compilers can be pretty stupid, so if it really matters, you may want to check.
In your case, both versions of the code define the exact same behavior, and I think you'd have a hard time finding a compiler that compiled them differently, except perhaps with optimization completely disabled.
I'm trying to make my code compliant with MISRA C. During the static analysis, I got this violation:
Rule 12.1: Extra parentheses recommended. A conditional operation is
the operand of another conditional operator.
The code is:
if (CHANNEL_STATE_GET(hPer, channel) != CHANNEL_STATE_READY)
{
retCode = ERROR;
}
where CHANNEL_STATE_GET is a macro as follows:
#define CHANNEL_STATE_GET(__HANDLE__, __CHANNEL__)\
(((__CHANNEL__) == CHANNEL_1) ? (__HANDLE__)->ChannelState[0] :\
((__CHANNEL__) == CHANNEL_2) ? (__HANDLE__)->ChannelState[1] :\
((__CHANNEL__) == CHANNEL_3) ? (__HANDLE__)->ChannelState[2] :\
((__CHANNEL__) == CHANNEL_4) ? (__HANDLE__)->ChannelState[3] :\
((__CHANNEL__) == CHANNEL_5) ? (__HANDLE__)->ChannelState[4] :\
(__HANDLE__)->ChannelState[5])
Do you have any idea to solve this violation?
BR,
Vincenzo
There are several concerns here, as far as MISRA C is concerned:
There are various rules saying that macros and complex expressions should be surrounded by parentheses, and that code shouldn't rely on the C programmer knowing every single operator precedence rule. You can solve that by throwing more parentheses at the expression, but that's just the tip of the iceberg.
The ?: operator is considered a "composite operator" and so expressions containing it are considered "composite expressions" and come with a bunch of extra rules: 10.6, 10.7 and 10.8. This means that there are a lot of rules regarding when and how this macro may be mixed with other expressions; the main concerns are implicit, accidental type conversions.
The use of function-like macros should be avoided in the first place.
Identifiers beginning with two underscores (or with an underscore followed by an uppercase letter) are reserved for the implementation by the C language, so user code shouldn't define them (C17 7.1.3).
The easier and recommended fix is just to forget about that macro, since it will just cause massive MISRA compliance headache. Also at a glance, it looks like very inefficient code with nested branches. My suggested fix:
In case hPer happens to be a pointer to pointer (seems like it), then dereference it and store the result in a plain, temporary pointer variable. Don't drag the nasty pointer to pointer syntax around across the whole function/macro.
Replace this whole macro with a (inline) function or a plain array table look-up, depending on how well you've sanitized the channel index.
Ensure that CHANNEL_1 to CHANNEL_5 are adjacent integers from 0 to 4. If they aren't, use some other constant or look-up in between.
A MISRA compliant re-design might look like this:
typedef enum
{
CHANNEL_1,
CHANNEL_2,
CHANNEL_3,
CHANNEL_4,
CHANNEL_5
} channel_t;
// state_t is assumed to be an enum too
state_t CHANNEL_STATE_GET (const HANDLE* handle, channel_t channel)
{
if((uint32_t)channel > (uint32_t)CHANNEL_5)
{
/* error handling here */
}
uint32_t index = (uint32_t)channel;
return handle->ChannelState[index];
}
...
if (CHANNEL_STATE_GET(*hPer, channel) != CHANNEL_STATE_READY)
If you can trust the value of channel then you don't even need the function, just do a table look-up. Also note that MISRA C encourages "handle" in this case to be an opaque type, but that's a chapter of its own.
Note that this code is also assuming that HANDLE isn't a pointer hidden behind a typedef as in Windows API etc - if so then that needs to be fixed as well.
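As a rough sketch of the plain table look-up mentioned above: every type and name here (handle_t, state_t, the state constants, channel_state_get) is an assumption made up for illustration, not the real driver's definitions, and it is only valid if channel has already been checked to be in range.

```c
#include <stddef.h>

/* All names below are assumptions for illustration, not the real driver types. */
typedef enum { CHANNEL_1, CHANNEL_2, CHANNEL_3, CHANNEL_4, CHANNEL_5 } channel_t;
typedef enum { CHANNEL_STATE_RESET, CHANNEL_STATE_READY, CHANNEL_STATE_BUSY } state_t;

#define CHANNEL_COUNT 6U   /* the original macro indexes ChannelState[0..5] */

typedef struct
{
    state_t ChannelState[CHANNEL_COUNT];
} handle_t;

/* Plain array look-up; valid only when channel is known to be in range. */
state_t channel_state_get(const handle_t *handle, channel_t channel)
{
    return handle->ChannelState[(size_t)channel];
}
```

Because the enum constants are consecutive integers starting at 0, the channel value can be used as the index directly and no chain of ?: operators is needed.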
Note (as more or less implied by Lundin's comment), this answer is more about how to approach MISRA findings (and those of a few other analysis tools I have suffered from).
I would first try to get a better angle on what the finding is actually describing. And with a nested structure like shown, that takes some re-looking. So ...
I would apply indentation, just to make life easier while editing and then, well, add some more () in inviting places, e.g. in this case so as to enclose each x?y:z into one pair.
#define CHANNEL_STATE_GET(__HANDLE__, __CHANNEL__)\
( ((__CHANNEL__) == CHANNEL_1) ? (__HANDLE__)->ChannelState[0] :\
( ((__CHANNEL__) == CHANNEL_2) ? (__HANDLE__)->ChannelState[1] :\
( ((__CHANNEL__) == CHANNEL_3) ? (__HANDLE__)->ChannelState[2] :\
( ((__CHANNEL__) == CHANNEL_4) ? (__HANDLE__)->ChannelState[3] :\
(((__CHANNEL__) == CHANNEL_5) ? (__HANDLE__)->ChannelState[4] :\
(__HANDLE__)->ChannelState[5] \
) \
) \
) \
) \
)
This is to address what the quoted finding is about.
I would not feel bad about sprinkling a few more around e.g. each CHANNEL_N.
(I admit that I did not test my code against a MISRA checker. I try to provide an approach. I hope this fixes the mentioned finding, possibly replacing it with another one.... MISRA in my experience is good at that.... I do not even expect this to solve all findings.)
When trying to fix some seriously odd code like this, it's often a good idea to take one or two big steps backwards.
We know that hPer refers to an array. We have some troublesome code that is indexing into that array and pulling out one of the channel states. But this code is, frankly, pretty awful. Even if the MISRA checker weren't complaining about it, any time you've got five nested ?: operators, performing a cumbersome by-hand emulation of what ought to be a simple array lookup, it's a sure sign that something isn't right, and that there's probably a better way to do it. So what might that better way be?
One way to approach that question is to ask, How is the ChannelState array filled in? And is there any other code that also fetches out of it?
You've only asked us about this one line that your MISRA checker is complaining about. That suggests that the code that fills in the ChannelState array, and any other code that fetches out of it, is not drawing complaints. Perhaps that other code accesses the ChannelState array in some different, hopefully better way. Perhaps the underlying problem is that the programmer who wrote this CHANNEL_STATE_GET macro was unaware of that other code, had not been properly educated on this program's coding conventions and available utility routines. Perhaps it's perfectly acceptable to directly index a ChannelState array using a channel value. Or perhaps there's already something like the map_channel_index function which I suggested in my other answer.
So, do yourself a favor: spend a few minutes seeking out some other code that accesses the ChannelState array. You might learn something very interesting.
Other comments and answers are suggesting replacing the cumbersome CHANNEL_STATE_GET macro with a much simpler array lookup, and I strongly agree with that recommendation.
It's possible, though, that the definitions of CHANNEL_1 through CHANNEL_5 are not under your control, such that you can't guarantee that they're consecutive small integers as would be required. In that case, I recommend writing a small function whose sole job is to map a channel_t to an array index. The most obvious way to do this is with a switch statement:
unsigned int map_channel_index(channel_t channel)
{
switch(channel) {
case CHANNEL_1: return 0;
case CHANNEL_2: return 1;
case CHANNEL_3: return 2;
case CHANNEL_4: return 3;
case CHANNEL_5: return 4;
default: return 5;
}
}
Then you can define the much simpler
#define CHANNEL_STATE_GET(handle, channel) \
((handle)->ChannelState[map_channel_index(channel)])
Or, you can get rid of CHANNEL_STATE_GET entirely by replacing
if(CHANNEL_STATE_GET(hPer, channel) != CHANNEL_STATE_READY)
with
if(hPer->ChannelState[map_channel_index(channel)] != CHANNEL_STATE_READY)
Hello everybody: I've an expression like this:
if (a == 1) {
printf("hello\n");
}
Is there a way to do something like that?
a== 1 && printf("hello\n");
It's called short-circuit expression, but I don't know anything about it. Does it exist in C? How to do it.
As already pointed out in comments, doing a == 1 && printf("hello\n"); will indeed work as I believe you intended, i.e. "hello" will only be printed if the condition is true, that is, if a is 1 in this case. The short answer is yes, short-circuit expressions do exist in C.
This can be easily determined by compiling and running the code, which is the recommended way if you're just exploring how the language works. However, if the question is, "is it good practice to use it to decide when to print?", many people would say no. It's best to stick to more readable, and therefore more maintainable code, with the if statement in your example.
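If you want to see the short-circuit behaviour for yourself, here is a small sketch. The helper names (say_hello, demo_short_circuit) and the call counter are made up for illustration; the counter only exists so the effect can be observed.

```c
#include <stdio.h>

static int calls = 0;   /* counts how often say_hello() actually ran */

int say_hello(void)
{
    calls++;
    return printf("hello\n");   /* printf returns the number of characters written */
}

/* Returns how many times say_hello() ran for a given value of a. */
int demo_short_circuit(int a)
{
    calls = 0;
    (void)(a == 1 && say_hello());   /* right-hand side runs only if a == 1 */
    return calls;
}
```

The cast to void is only there to tell the compiler that the result of the && expression is discarded on purpose.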
A word of warning here:
it works as long as the expression on the right-hand side of && returns something convertible to boolean, e.g. printf in this case returns an int. It is valid C code, true, but it seems like a code smell to me and many people would complain during the review.
Note also that something like this:
void foo(char *s)
{
//whatever
}
int main(void)
{
int a = 1;
a == 1 && foo("abc");
}
is not going to work and you'll have to use some tricks, e.g. with the comma operator:
a == 1 && (foo("abc"),1);
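A self-contained sketch of that comma trick, assuming a made-up void function foo and a counter that exists only to make the effect visible:

```c
#include <stdio.h>

static int foo_calls = 0;   /* counts calls to foo(), for demonstration only */

/* A void function cannot be used as an operand of && directly. */
void foo(const char *s)
{
    foo_calls++;
    printf("%s\n", s);
}

/* Returns how many times foo() ran for a given value of a. */
int demo_comma(int a)
{
    foo_calls = 0;
    /* (foo("abc"), 1) calls foo, then yields the int value 1 for && */
    (void)(a == 1 && (foo("abc"), 1));
    return foo_calls;
}
```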
Thus, for the sake of maintainability, you might want to use some other construct, e.g. the ternary operator:
printf(a==1?"Hello\n":"");
which is not exactly equivalent, but might (or might not) work better in your particular case.
EDIT:
as per comment below:
It is true that passing conditional input to printf's format string can be considered bad practice, especially in more complicated cases, as it deprives one of the compiler diagnostics related to printf's input parameters.
@chqrlie suggested just using a one-liner if: if(a==1)printf("hello\n"); which is fine as long as coding conventions allow it. Sometimes they don't.
If so, the somewhat cleaner ternary version is this: printf("%s",a==1? "Hello\n":"");. Please note however this is all matter of coding conventions/programmer's and reviewer's taste/linter settings/insert-your-source-of-good-practices-here. Thus, one can most likely skin this cat in way more ways, and the list is definitely not exhaustive.
Is it bad or good practice or maybe undefined behavior to re-assign function parameter inside function?
Let me explain what I'm trying to do with an example, here the function:
void
gkUpdateTransforms(GkNode *node /* other params */) {
GkNode *nodei;
if (!(nodei = node->chld))
return;
do {
/* do job */
nodei = nodei->next;
} while (nodei);
}
Alternative:
void
gkUpdateTransforms2(GkNode *node /* other params */) {
/* node parameter is only used here to get chld, not anywhere else */
if (!(node = node->chld))
return;
do {
/* do job */
node = node->next;
} while (node);
}
I checked the assembly output and it seems to be the same, and the second version doesn't need to declare an extra variable. You may ask what happens if the parameter type changes, but the same applies to the first version, because it would need to be updated too.
EDIT: Parameters are passed by value, and my intention is not to edit the pointer itself.
EDIT2: What about recursive functions? What would happen if gkUpdateTransforms2 were recursive? I'm confused because the function will call itself, but I think the parameters live in a different stack frame on every call.
I have no idea why you think this would be undefined behavior - it is not. Mostly it is a matter of coding style, there's no obvious right or wrong.
Generally, it is good practice to regard parameters as immutable objects. It is useful to preserve an untouched copy of the input to the function. For that reason, it may be a good idea to use a local variable which is just a copy of the parameter. As you can see, this does not affect performance in the slightest - the compiler will optimize the code.
However, it is not a big deal if you write to the parameters either. This is common practice too. Calling it bad practice to do so would be very pedantic.
Some pedantic coding styles make all function parameters const if they shouldn't be modified, but I personally think that's just obfuscation, which makes the code harder to read. In your case such pedantic style would be void gkUpdateTransforms(GkNode*const node). Not to be confused with const correctness, which is a universally good thing and not just a style matter.
However, there is something in your code which is definitely considered bad practice, and that is assignment inside conditions. Avoid this whenever possible, it is dangerous and makes the code harder to read. Most often there is no benefit.
The danger of mixing up = and == was noted early on in the history of C. To counter this, in the 1980s people came up with brain-damaged things like the "yoda conditions". Then around 1989 came Borland Turbo C which had a fancy warning feature "possible incorrect assignment". That was the death of the Yoda conditions, and compilers since then have warned against assignment in conditions.
Make sure that your current compiler gives a warning for this. That is, make sure not to use a worse compiler than Borland Turbo from 1989. Yes, there are worse compilers on the market.
(gcc gives "warning: suggest parentheses around assignment used as truth value")
I would write the code as
void gkUpdateTransforms(GkNode* node /* other params */)
{
if(node == NULL)
{
return ;
}
for(GkNode* i=node->chld; i!=NULL; i=i->next)
{
/* do job */
}
}
These are mostly stylistic changes to make the code more readable. They do not improve performance much.
IMHO it is not exactly "bad" practice but it is worthwhile to question oneself if there isn't a better way. About your analyzing the assembler output: it may serve as an interesting and educational look behind the curtain but you are ill advised to use this as a justification for optimization or worse, laziness in the source code. The next compiler or the next architecture may just render your musings completely invalid - my recommendation is to stay with Knuth here: "Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.".
In your code I think the decision is 50:50 with no clear winner. I would deem the node-iterator a concept of its own, justifying a separate programming construct (which in our case is just a variable) but then again the function is so simple that we don't win much in terms of clarity for the next programmer looking at your code, so we can very well live with the second version. If your function starts to mutate and grow over time, this premise may become invalid and we would be better off with the first version.
That said, I would code the first version like this:
void
gkUpdateTransforms(GkNode *node /* other params */) {
for (GkNode *nodei = node->chld; nodei != NULL; nodei = nodei->next) {
/* do job */
}
}
This is well defined and a perfectly good way to implement this behaviour.
The reason you might see it as an issue is the common mistake of doing the following:
int func(object a) {
/* modify a: only the local copy changes, but the user expects a to be modified */
}
But in your case, you expect to make a copy of the pointer.
As long as it's passed by value, it can be safely treated as any other local variable. Not a bad practice in this scenario, and not undefined behaviour either.
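Regarding the recursion question, here is a minimal sketch showing that each call works on its own copy of the parameter. Node is a made-up stand-in for GkNode that keeps only the two fields the loop needs, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for GkNode: only the fields the loop needs. */
typedef struct Node
{
    struct Node *chld;
    struct Node *next;
} Node;

/* Iterative: reuses the parameter as the loop cursor. */
size_t count_children(Node *node)
{
    size_t n = 0;
    for (node = node->chld; node != NULL; node = node->next)
    {
        n++;
    }
    return n;   /* the caller's pointer is untouched; only our copy moved */
}

/* Recursive: every call gets its own copy of node, so the inner call
   reassigning its parameter does not disturb the outer loop. */
size_t count_descendants(Node *node)
{
    size_t n = 0;
    for (node = node->chld; node != NULL; node = node->next)
    {
        n += 1 + count_descendants(node);
    }
    return n;
}
```

After either call returns, the pointer variable the caller passed in still points at the original node, because only the callee's copy was advanced.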
Having been writing Java code for many years, I was amazed when I saw this C++ statement:
int a,b;
int c = (a=1, b=a+2, b*3);
My question is: Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case)
I think the compiler will see it the same as the following:
int a=1, b=a+2;
int c = b*3;
(What's the official name for this? I assume it's a standard C/C++ syntax.)
It's the comma operator, used twice. You are correct about the result, and I don't see much point in using it that way.
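A quick sketch to check this, written in plain C (the statement behaves the same way there); the two function names are made up for illustration:

```c
/* Both functions compute the same value; only the spelling differs. */
int with_comma(void)
{
    int a, b;
    int c = (a = 1, b = a + 2, b * 3);   /* operands are evaluated left to right */
    return c;
}

int without_comma(void)
{
    int a = 1;
    int b = a + 2;
    int c = b * 3;
    return c;
}
```

Both return 9: a becomes 1, b becomes 3, and the value of the whole parenthesized expression is that of its last operand, b * 3.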
Looks like an obscure use of a , (comma) operator.
It's not a representative way of doing things in C++.
The only "good-style" use for the comma operator might be in a for statement that has multiple loop variables, used something like this:
// Copy from source buffer to destination buffer until we see a zero
for (char *src = source, *dst = dest; *src != 0; ++src, ++dst) {
*dst = *src;
}
I put "good-style" in scare quotes because there is almost always a better way than to use the comma operator.
Another context where I've seen this used is with the ternary operator, when you want to have multiple side effects, e.g.,
bool didStuff = DoWeNeedToDoStuff() ? (Foo(), Bar(), Baz(), true) : false;
Again, there are better ways to express this kind of thing. These idioms are holdovers from the days when we could only see 24 lines of text on our monitors, and squeezing a lot of stuff into each line had some practical importance.
Dunno its name, but it seems to be missing from the Job Security Coding Guidelines!
Seriously: C++ allows you to a do a lot of things in many contexts, even when they are not necessarily sound. With great power comes great responsibility...
This is called 'obfuscated C'. It is legal, but intended to confuse the reader. And it seems to have worked. Unless you're trying to be obscure it's best avoided.
Hotei
Your sample code uses two features of C expressions that are not well known to beginners (but not really hidden either):
the comma operator: a normal binary operator that evaluates its left operand, discards the result, and then yields the value of its right operand. The operands are always evaluated from left to right.
assignment as an operator that returns a value. C assignment is not a statement as in other languages, and returns the value that has been assigned.
Most use cases of both these features involve some form of obfuscation. But there are some legitimate ones. The point is that you can use them anywhere you can provide an expression: inside an if or a while conditional, in a for loop iteration block, in function call parameters (if using a comma you must use parentheses to avoid confusion with actual function parameters), in macro parameters, etc.
The most usual use of comma is probably in loop control, when you want to change two variables at once, or store some value before performing loop test, or loop iteration.
For example a reverse function can be written as below, thanks to comma operator:
void reverse(int * d, int len){
int i, j;
for (i = 0, j = len - 1 ; i < j ; i++, j--){
SWAP(d[i], d[j]);
}
}
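The original does not show how SWAP is defined, so here is a self-contained sketch with one possible definition. This SWAP is an assumption and only works for int lvalues.

```c
/* Hypothetical SWAP for int lvalues; the original does not show its definition. */
#define SWAP(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

void reverse(int *d, int len)
{
    int i, j;
    /* the comma operator lets one for loop drive two indices */
    for (i = 0, j = len - 1; i < j; i++, j--)
    {
        SWAP(d[i], d[j]);
    }
}
```

Called on an array holding 1, 2, 3, 4, 5 with len 5, this leaves 5, 4, 3, 2, 1 in place.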
Another legitimate (not obfuscated, really) use of the comma operator I have in mind is a DEBUG macro I found in some project, defined as:
#if defined(DEBUGMODE)
#define DEBUG(x) printf x
#else
#define DEBUG(x) x
#endif
You use it like:
DEBUG(("my debug message with some value=%d\n", d));
If DEBUGMODE is on then you'll get a printf; if not, printf will not be called but the expression between parentheses is still valid C. The point is that any side effect of the printing code will apply both in release code and debug code, like those introduced by:
DEBUG(("my debug message with some value=%d\n", d++));
With the above macro d will always be incremented regardless of debug or release mode.
There are probably some other rare cases where comma and assignment values are useful and code is easier to write when you use them.
I agree that assignment operator is a great source of errors because it can easily be confused with == in a conditional.
I agree that, as the comma is also used with a different meaning in other contexts (function calls, initialisation lists, declaration lists), it was not a very good choice for an operator. But basically it's no worse than using < and > for template parameters in C++, and it has existed in C since much earlier days.
It's strictly coding style and won't make any difference in your program. Especially since any decent C++ compiler will optimize it to
int a=1;
int b=3;
int c=9;
The math won't even be performed during assignment at runtime. (and some of the variables may even be eliminated entirely).
As to choice of coding style, I prefer the second example. Most of the time, less nesting is better, and you won't need the extra parentheses. Since the use of commas exhibited will be known to virtually all C++ programmers, you have some choice of style. Otherwise, I would say put each assignment on its own line.
Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case)
It's both a choice of coding style and it has a real benefit.
It's clearly a different coding style as compared to your equivalent example.
The benefit is that I already know I would never want to employ the person who wrote it, not as a programmer anyway.
A use case: Bob comes to me with a piece of code containing that line. I have him transferred to marketing.
You have found a hideous abuse of the comma operator written by a programmer who probably wishes that C++ had multiple assignment. It doesn't. I'm reminded of the old saw that you can write FORTRAN in any language. Evidently you can try to write Dijkstra's language of guarded commands in C++.
To answer your question, it is purely a matter of (bad) style, and the compiler doesn't care; the compiler will generate exactly the same code as from something a C++ programmer would consider sane and sensible.
You can see this for yourself if you make two little example functions and compile both with the -S option.
Sometimes I have to write code that alternates between doing things and checking for error conditions (e.g., call a library function, check its return value, keep going). This often leads to long runs where the actual work is happening in the conditions of if statements, like
if(! (data = (big_struct *) malloc(sizeof(*data)))){
//report allocation error
} else if(init_big_struct(data)){
//handle initialization error
} else ...
How do you guys write this kind of code? I've checked a few style guides, but they seem more concerned with variable naming and whitespace.
Links to style guides welcome.
Edit: in case it's not clear, I'm dissatisfied with the legibility of this style and looking for something better.
Though it pains me to say it, this might be a case for the never-popular goto. Here's one link I found on the subject: http://eli.thegreenplace.net/2009/04/27/using-goto-for-error-handling-in-c/
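A minimal sketch of that goto pattern, assuming a made-up task (make_buffers) that needs two allocations. It is not taken from the linked article; it just shows the general shape, with every failure jumping to a single cleanup label.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical task: build two buffers; any failure jumps to one cleanup path. */
int make_buffers(size_t n)
{
    int rc = -1;            /* assume failure until everything succeeded */
    char *first = NULL;
    char *second = NULL;

    first = malloc(n);
    if (first == NULL)
    {
        goto cleanup;
    }

    second = malloc(n);
    if (second == NULL)
    {
        goto cleanup;
    }

    memset(first, 0, n);    /* the "real work" would go here */
    memset(second, 0, n);
    rc = 0;                 /* success */

cleanup:
    free(second);           /* free(NULL) is a no-op, so this is always safe */
    free(first);
    return rc;
}
```

The main line of the function stays flat, and there is exactly one place where resources are released.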
I usually write that code in this way:
data = (big_struct *) malloc(sizeof(*data));
if(!data){
//report allocation error
return ...;
}
err = init_big_struct(data);
if(err){
//handle initialization error
return ...;
}
...
In this way I avoid calling functions inside if and the debug is easier because you can check the return values.
Don't use assert in production code.
In debug mode, assert should never be used for something that can actually happen (like malloc returning NULL), rather it should be used in impossible cases (like array index is out of bounds in C)
Read this post for more.
One method which I used to great effect is the one used by W. Richard Stevens in Unix Network Programming (code is downloadable here). For common functions which he expects to succeed all the time, and has no recourse for a failure, he wraps them, using a capital letter (code compressed vertically):
void * Malloc(size_t size) {
void *ptr;
if ( (ptr = malloc(size)) == NULL)
err_sys("malloc error");
return(ptr);
}
err_sys here displays the error and then performs an exit(1). This way you can just call Malloc and know that it will error out if there is a problem.
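If you don't have the book's helper library at hand, a self-contained approximation might look like the sketch below. The names die and xmalloc are my own stand-ins, not the book's code, and die is a much simplified replacement for err_sys.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for err_sys: print a message and exit. */
void die(const char *msg)
{
    perror(msg);
    exit(1);
}

/* Wrapper in the same spirit: either returns usable memory or never returns. */
void *xmalloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr == NULL)
    {
        die("malloc error");
    }
    return ptr;
}
```

Callers can then write p = xmalloc(n); and use p directly, without repeating the NULL check at every call site.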
UNP continues to be the only book I've read where I think the author has code which checks the return values of all the functions which can possibly fail. Every other book says "you should check the return values, but we'll leave that for you to do later".
I tend to
Delegate error checking to wrapper functions (like Stevens)
On error, simulate exceptions using longjmp. (I actually use Dave Hanson's C Interfaces and Implementations to simulate exceptions.)
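To give an idea of what simulating exceptions with longjmp means, here is a bare-bones sketch using only the standard setjmp.h. It does not use Hanson's library, and step and run_steps are made-up names.

```c
#include <setjmp.h>

static jmp_buf on_error;   /* where to resume when a step fails */

/* Hypothetical step that "throws" by jumping back to the setjmp point. */
void step(int should_fail)
{
    if (should_fail)
    {
        longjmp(on_error, 1);
    }
}

/* Returns 0 if all steps succeed, -1 if any step jumped back. */
int run_steps(int fail_second)
{
    if (setjmp(on_error) != 0)
    {
        return -1;          /* "catch" block: a step jumped back here */
    }
    step(0);                /* "try" block: no error checks between the steps */
    step(fail_second);
    return 0;
}
```

The steps themselves contain no error-propagation code; the price is that any resources acquired between setjmp and longjmp have to be cleaned up by hand in the catch branch.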
Another option is to use Don Knuth's literate programming to manage the error-handling code, or some other kind of preprocessor. This option is available only if you get to set the rules for your shop :-)
The only grouping property of code like this is that there simply is an externally imposed sequence that it has to follow. This is why you put these allocations into one function, but this is a very weak commonality. Why some people recommend abandoning the scope advantages of nested if's is beyond my understanding. You are effectively trying to put lipstick on a pig (no insult intended) - the code's nature will never yield anything clean, the best you can do is to use the compiler's help to catch (maintenance) errors. Stick with the if's IMHO.
PS: if I haven't convinced you yet: what will the goto-solution look like if you have to take ternary decisions? The if's will get uglier for sure, but the goto's???