I have several pointer/integer variables and I want to check whether any of them is 0. Right now I compare each one to 0 in a large if statement that short-circuits once it hits one that is 0. I was wondering if there is any more clever or faster way of accomplishing this.
It doesn't really matter. Even if you stacked all the pointers up in an array and looped over it, or OR-ed all the values together, you would still have to look at them one after another. And if you write something like if (a != 0 && b != 0 && ... && z != NULL), the compiler will emit roughly as many instructions as it would in any of the other cases.
The only thing you might save by using an array that you loop over is perhaps some memory at some point, but I don't think that is what you were looking for.
No, there is not. Think about it: to really be sure that not a single one of your values is zero, you absolutely have to look at each and every one of them. As you correctly noted, it is possible to short-circuit once a zero value has been found. I would recommend something similar to this:
int has_null = 0;
for (int i = 0; i < null_list_len; ++i) {
    if (null_list[i] == 0) {
        has_null = 1;   // short-circuit: stop at the first zero
        break;
    }
}
if (has_null)
    // do stuff
You can improve the run time if you have more assumptions about the values you are testing. If, for example, you knew that the null_list array was sorted in ascending order and contained no negative values, you would only have to check whether the very first entry is zero, since a non-zero first entry would imply that all other values are also greater than zero.
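As a minimal sketch of that special case, assuming null_list is sorted in ascending order and contains no negative values:
int has_null = (null_list_len > 0 && null_list[0] == 0);   // the first entry decides everything
if (has_null)
    // do stuff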
Well, you could arrange for whatever code sets the variables to zero to also set a common boolean flag to true.
Checking would then be a matter of testing one boolean, no matter how many variables there are. If the flag is true, you can then do a sequential check, much as you are doing now.
That may or may not be possible, faster, or more efficient overall.
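A minimal sketch of that idea, where set_value() is a hypothetical wrapper around whatever currently assigns to your variables:
#include <stdbool.h>
#include <stdio.h>

static bool any_zero = false;           /* the single flag that gets tested later */

static void set_value(int *var, int value)
{
    *var = value;
    if (value == 0)
        any_zero = true;                /* remember that at least one variable became zero */
}

int main(void)
{
    int a, b, c;
    set_value(&a, 1);
    set_value(&b, 0);
    set_value(&c, 7);

    if (any_zero) {
        /* only now do the sequential check to find out which one it is */
        printf("at least one variable is zero\n");
    }
    return 0;
}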
Related
Assuming char *str;, the following approaches can be used to test for an empty string:
/* Approach 1 */
if (str && strlen(str) > 0) {
...
}
/* Approach 2 */
if (str && *str) {
...
}
Which of the two is preferable? I assume the second will be faster, since it does not have to iterate over the buffer to get the length. Also, are there any downsides to using the second?
I'll give a third option that I find better:
if (str && str[0]) {
// ...
}
The strlen method isn't ideal since it can iterate over a non-zero-length string. The compiler may optimize out that call entirely (as has been pointed out), but it won't on every compiler (and I assume the -ffreestanding option would disable this optimization), and it at least makes it look like more work needs to happen.
However, I consider the [0] to have much clearer intent than a *. I generally recommend using * when dereferencing a pointer to a single object, and using [0] when that pointer is to the first element of an array.
To be extra clear, you can do:
if (str && str[0] != '\0') {
// ...
}
but that starts tipping the special-to-alphanum-characters ratio towards hard-to-read.
Approach 2, because the first approach will iterate over the whole string if it's not empty. And no, there are no downsides to approach 2.
It's unlikely that the compiler would generate different code if you have optimization enabled. But if performance REALLY is an issue it's probably better to use the second approach. I say probably, because it's not 100% certain. Who knows what the optimizer will do? But there is a risk that the first approach will iterate over the whole string if it's not empty.
When it comes to readability, I'd say that the first is slightly more readable. However, using *str to test for an empty string is very idiomatic in C. Any seasoned C coder would instantly understand what it means. So TBH, the readability issue is mostly in case someone who is not a C programmer will read the code. If someone does not understand what if (str && *str) does, then you don't want them to modify the code either. ;)
If there is a coding standard for the code base you're working on, stick to that. If there's not, pick the one you like most. It does not really matter.
Which of the two is preferable?
Obviously number one, as that is the most readable.
I would ignore performance issues. Both Clang and GCC generate the same code with -O3.
See godbolt
The first approach expresses the programmer's intentions a bit more explicitly.
Edit: If you fundamentally disagree with the Fedora guide here, please explain objectively why this approach would be worse than classic loops. As far as I know, even the CERT standard doesn't make any statement on using index variables over pointers.
I'm currently reading the Fedora Defensive Coding Guide and it suggests the following:
Always keep track of the size of the array you are working with.
Often, code is more obviously correct when you keep a pointer past the
last element of the array, and calculate the number of remaining
elements by subtracting the current position from that pointer. The
alternative, updating a separate variable every time when the position
is advanced, is usually less obviously correct.
This means for a given array
int numbers[] = {1, 2, 3, 4, 5};
I should not use the classic
size_t length = 5;
for (size_t i = 0; i < length; ++i) {
    printf("%d ", numbers[i]);
}
but instead this:
int *end = numbers + 5;
for (int *start = numbers; start < end; ++start) {
    printf("%d ", *start);
}
or this:
int *start = numbers;
int *end = numbers + 5;
while (start < end) {
    printf("%d ", *start++);
}
Is my understanding of the recommendation correct?
Is my implementation correct?
Which of the last 2 is safer?
Your understanding of what the text recommends is correct, as is your implementation. But regarding the basis of the recommendation, I think you are confusing safe with correct.
It's not that using a pointer is safer than using an index. The argument is that, in reasoning about the code, it is easier to decide that the logic is correct when using pointers. Safety is about failure modes: what happens if the code is incorrect (references a location outside the array). Correctness is more fundamental: that the algorithm provably does what it sets out to do. We might say that correct code doesn't need safety.
The recommendation might have been influenced by Andrew Koenig's series in Dr. Dobbs a couple of years ago. How C Makes It Hard To Check Array Bounds. Koenig says,
In addition to being faster in many cases, pointers have another big advantage over arrays: A pointer to an array element is a single value that is enough to identify that element uniquely. [...] Without pointers, we need three parameters to identify the range: the array and two indices. By using pointers, we can get by with only two parameters.
In C, referencing a location outside the array, whether via pointer or index, is equally unsafe. The compiler will not catch you out (absent use of extensions to the standard). Koenig is arguing that with fewer balls in the air, you have a better shot at getting the logic right.
The more complicated the construction, the more obvious it is that he's right. If you want a better illustration of the difference, write strcat(3) both ways. Using indexes, you have two names and two indexes inside the loop. It's possible to use the index for one with the name for the other. Using pointers, that's impossible. All you have are two pointers.
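To illustrate Koenig's point, here is a sketch of both versions; my_strcat_idx and my_strcat_ptr are hypothetical names standing in for the real strcat(3) from <string.h>:
#include <stddef.h>

/* Index version: two names (dst, src) and two indexes (i, j) are in play,
 * and nothing stops you from pairing the wrong index with the wrong name. */
char *my_strcat_idx(char *dst, const char *src)
{
    size_t i = 0;
    while (dst[i] != '\0')              /* find the end of dst */
        i++;
    size_t j = 0;
    while (src[j] != '\0')              /* copy src character by character */
        dst[i++] = src[j++];
    dst[i] = '\0';
    return dst;
}

/* Pointer version: only two pointers; each one identifies its element uniquely. */
char *my_strcat_ptr(char *dst, const char *src)
{
    char *d = dst;
    while (*d != '\0')                  /* find the end of dst */
        d++;
    while ((*d++ = *src++) != '\0')     /* copy, including the terminating '\0' */
        ;
    return dst;
}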
Is my understanding of the recommendation correct?
Is my implementation correct?
Yes, so it seems.
The idiom for (type_t *start = array; start != end; start++) is sometimes used when you have arrays of more complex items. It is mostly a matter of style.
This style is sometimes used when you already have the start and end pointers available for some reason. Or in cases where you aren't really interested in the size, but just want to repeatedly compare against the end of the array. For example, suppose you have a ring buffer ADT with a start pointer and an end pointer and want to iterate through all items.
This way of doing loops is actually the very reason why C explicitly allows pointers to point 1 item out-of-bounds of an array, you can set an end pointer to one item past the array without invoking undefined behavior (as long as that item isn't de-referenced).
(It is the very same method as used by STL iterators in C++, although there's more of a rationale for it in C++, since C++ has operator overloading. For example, iterator++ in C++ doesn't necessarily give you an item allocated in the adjacent memory cell: iterators can be used for iterating through a linked list ADT, where ++ translates to node->next behind the scenes.)
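A small sketch of that style, using a hypothetical struct point array, where end points one element past the array and is never dereferenced:
#include <stdio.h>

struct point { int x; int y; };

int main(void)
{
    struct point pts[] = { {1, 2}, {3, 4}, {5, 6} };
    const struct point *end = pts + sizeof pts / sizeof pts[0];    /* one past the last element */

    for (const struct point *p = pts; p != end; ++p)               /* compare against end, never dereference it */
        printf("(%d, %d)\n", p->x, p->y);

    return 0;
}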
However, to claim that this form is always the preferred one is just subjective nonsense. Particularly when you have an array of integers and know the size. Your first example is the most readable form of a loop in C and therefore always preferred whenever possible.
On some compilers/systems, the first form could also give faster code than the second form. Pointer arithmetic might give slower code on some systems. (And I suppose that the first form might give faster data cache access on some systems, though I'd have to verify that assumption with some compiler guru.)
Which of the last 2 is safer?
Neither form is safer than the other. To claim otherwise would be subjective opinions. The statement "...is usually less obviously correct" is nonsense.
Which style to pick vary on case-to-case basis.
Overall, those "Fedora" guidelines you link to seem to contain lots of questionable code, questionable rules and blatant opinions. It seems more like someone wanted to show off various C tricks than a serious attempt at writing a coding standard. Overall, it smells like the "Linux kernel guidelines", which I would not recommend reading either.
If you want a serious coding standard for/by professionals, use CERT-C or MISRA-C.
When I copy the content of one file to another in C, at the end of the output file I get this character: ÿ. I understand, thanks to this forum, that it comes from the EOF indicator, but I don't understand what to do in order to get rid of it in the output file.
This is my code:
second_file = fopen(argv[2], "w+");
while (curr_char != EOF)
{
    curr_char = fgetc(original_file);
    fputc(curr_char, second_file);
}
printf("Your file has been successfully copied\n");
fclose(second_file);
fclose(original_file);
For each character you read, you have two things to do:
Check to see if it's EOF.
If not, write it to the output.
Your problem is you're doing these two things in the wrong order.
There are potentially several different ways of solving this. Which one you pick depends on how much you care about your program looking good, as opposed to merely working.
One. Starting with the code you wrote, we could change it to:
while (curr_char != EOF)
{
    curr_char = getc(original_file);
    if (curr_char == EOF) break;
    putc(curr_char, second_file);
}
Here, we explicitly test to see if the character is EOF, immediately after reading it, before writing it. If it's EOF, we break out of the loop early. This will work, but it's ugly: there are two different places where we test for EOF, and one of them never "fires". (Also, as a commentator reminded me, there's the problem that the first time through the loop, we're testing curr_char before we've ever set it.)
Two. You could rearrange it like this:
curr_char = getc(original_file);
while (curr_char != EOF)
{
    putc(curr_char, second_file);
    curr_char = getc(original_file);
}
Here, we read an initial character, and as long as it's not EOF, we write it and read another. This will work just fine, but it's still a little bit ugly, because this time there are two different places where we read the character.
Three. You could rearrange it like this:
while ((curr_char = getc(original_file)) != EOF)
{
    putc(curr_char, second_file);
}
This is the conventional way of writing a character-copying loop in C. The call to getc and the assignment to curr_char are buried inside of the controlling expression of the while loop. It depends on the fact that in C, an assignment expression has a value just like any other expression. That is, the value of the expression a = b is whatever value we just assigned to a (that is, b's value). So the value of the expression curr_char = getc(original_file) is the character we just read. So when we say while ((curr_char = getc(original_file)) != EOF), what we're actually saying is, "Call getc, assign the result to curr_char, and if it's not equal to EOF, take another trip around the loop."
(If you're still having trouble seeing this, I've written other explanations in these notes and this writeup.)
This code is both good and bad. It's good because we've got exactly one place we read characters, one place we test characters, and one place we write characters. But it's a little bit bad because, let's admit it, it's somewhat cryptic at first. It's hard to think about that assignment-buried-inside-the-while-condition. It's code like this that gives C a reputation as being full of obscure gobbledegook.
But, at least in this case, it really is worth learning the idiom, and becoming comfortable with it, because the reductions to just one read and one test and one write really are virtues. It doesn't matter so much in a trivial case like this, but in real programs which are complicated for other reasons, if there's some key piece of functionality that happens in two different places, it's extremely easy to overlook this fact, and to make a change to one of them but forget to make it to the other.
(In fact, this happened to me just last week at work. I was trying to fix a bug in somebody else's code. I finally figured out that when the code did X, it was inadvertently clearing Y. I found the place where it did X, and I added some new code to properly recreate Y. But when I tested my fix, it didn't work! It turned out there were two separate places where the code did X, and I had found and fixed the wrong one.)
Finally, here's an equivalently minimal but unconventional way of writing the loop:
while (1)
{
    curr_char = getc(original_file);
    if (curr_char == EOF) break;
    putc(curr_char, second_file);
}
This is kind of like number 1, but it gets rid of the redundant condition in the while loop, and replaces it with the constant 1, which is "true" in C. This will work just fine, too, and it shares the virtue of having one read, one test, and one write. It actually ends up doing exactly the same operations and in exactly the same order as number 3, but by being laid out linearly it may be easier to follow.
The only problem with number 4 is that it's an unconventional, "break in the middle" loop. Personally, I don't have a problem with break-in-the-middle loops, and I find they come up from time to time, but if I wrote one and someone said "Steve, that's ugly, it's not an idiom anyone recognizes, it will confuse people", I'd have to agree.
P.S. I have replaced your calls to fgetc and fputc with the more conventional getc and putc. I'm not sure who told you to use fgetc and fputc, and there are obscure circumstances where you need them, but they're so rare that in my opinion one might as well forget that the "f" variants exist, and always use getc and putc.
I'm trying to look at this from an embedded software development point of view, and I'd like to ask which one is better to go with, and what the possible advantages and disadvantages are.
#include <stdbool.h>

bool funct() {
    bool retVal = false;
    // do something
    return retVal;
}

// First Choice
if (funct()) {
    // do something
}

// Second Choice
bool retVal = funct();
if (retVal)
{
    // do something
}
Either is probably OK in this example. However, the second has a slight advantage when debugging: when stepping through the code you will know whether the condition is true before the branch is taken, and you can coerce the variable to a different value if you want to test the alternate path. Being able to see the result of a call after the event is useful in any case during debugging.
In more complex expressions the approach may be more important, for example in:
if( x() || y() ) ...
if x() returns true, then y() will not be evaluated, which may or may not be desirable if y() has side effects, so the semantics of that are not the same as:
bool xx = x() ;
bool yy = y() ;
if( xx || yy ) ...
Using explicit assignment allows the required semantics to be clearly expressed.
// First Choice
if(funct()){
//do something
}
This is totally fine, as you check the return value of the function to make the decision, and your function returns either 0 or 1.
There is also a slight advantage here over the second choice: you avoid introducing the variable retVal just to hold the return value for the check.
If you need the return value not only for the check in the if condition but also somewhere else in the program, then I would suggest storing the return value (choice 2).
Both methods will work fine. If you define better as code that will execute (very slightly) faster and take up (very slightly) less room when compiled, then alternative 1) is better. Alternative 1) will read the return value of the function into a register and branch on that value, using two instructions and no memory. Alternative 2) will read the return value into a register, write it to memory, read it from memory back into a register and branch on it, for a total of four instructions and four bytes of storage (assuming a 32-bit processor).
The first choice is better, but for reasons entirely unrelated to what you might think.
The reason is that it is one line of code shorter, and therefore you have one less line of code to have to worry about, one less line of code to have to read when trying to understand how it works, one less line of code to have to maintain in the future.
Performance considerations are completely pointless under any real-life scenario, and as a matter of fact I would be willing to guess that any halfway decent compiler will produce the exact same machine code for both of these choices.
If you have questions of such a basic nature, I would strongly advise you to quit trying to "have an embedded software development point of view". Embedded is hard; try non-embedded first, which is a lot easier. Once you master non-embedded, then you can try embedded.
Which of the following is more efficient:
if (strcmp(str1,str2) != 0) {
...
}
OR
if (str1[0]!=str2[0] && strcmp(str1,str2) !=0 ) {
...
}
Assume that str2 is always unique and that there can be multiple values of str1.
There is no need for the second version, as strcmp is usually implemented very cleverly to compare multiple characters at once.
In the second version, because of the short-circuit property of &&, you may save a function call. You should benchmark both versions for your workload to get a proper answer.
But my suggestion is still that there is no need for version 2 (str1[0] != str2[0] && strcmp(str1,str2) != 0) unless strcmp shows up as a bottleneck in your profiling results and there is evidence that version 2 performs better.
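A rough micro-benchmark sketch, purely illustrative: the strings, the repetition count and the argc trick to defeat constant folding are assumptions; measure with your own data and compiler flags:
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(int argc, char **argv)
{
    char str1[32] = "hello world";
    char str2[32] = "hello there";
    if (argc > 1)                       /* keep the compiler from folding the comparisons away */
        str1[0] = argv[1][0];

    const long reps = 100000000L;
    volatile long hits = 0;             /* volatile so the loops are not optimized out */

    clock_t t0 = clock();
    for (long i = 0; i < reps; ++i)
        if (strcmp(str1, str2) != 0)    /* version 1 */
            hits++;
    clock_t t1 = clock();

    for (long i = 0; i < reps; ++i)
        if (str1[0] != str2[0] && strcmp(str1, str2) != 0)   /* version 2 */
            hits++;
    clock_t t2 = clock();

    printf("version 1: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("version 2: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}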
strcmp(str1,str2) != 0
checks the first characters and returns immediately if they differ, so you need not explicitly check for
str1[0] != str2[0].
Your str1[0] != str2[0] does the same thing that strcmp(str1,str2) does in its very first comparison.
strcmp starts by comparing the first character of each string. If they are equal, it continues with the following pairs until the characters differ or a terminating null character is reached.
So in the second case there is no point in the extra condition that checks the first character of the string, because strcmp already performs (str1[0] != str2[0]) as its first comparison.
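Conceptually, strcmp behaves like the naive sketch below (real library versions typically compare whole machine words at a time); my_strcmp is a hypothetical name:
int my_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {    /* stop at the first difference or at the end of a */
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;   /* <0, 0 or >0, like strcmp */
}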
As #Abhineet suggests "test and see for yourself".
if (strcmp(str1,str2) != 0) and if ((str1[0] != str2[0]) && strcmp(str1,str2) !=0 ) are functionally the same when each is passed a C string. This, of course, is a requirement, else, why compare performance?
C does not focus on specifying performance, so should this approach work faster with a given compiler on a given machine, it may be worse with the next version of the compiler or some compiler option change or a different string data set.
But in my experience, writing code with heavy string usage on multiple platforms, this trick did improve performance on select machines and did not significantly slow the others. Your results may vary.
As with any linear improvement in performance, slight code tweaks in heavily used code need a deep understanding of the target machine to know whether they are always faster.
Typically, using your programming time to think about other approaches can reap far larger performance improvements.
1) hash codes
2) unique (interned) strings, which need only a pointer compare (see the sketch after this list)
3) other "string" structures
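As a sketch of point 2, assuming a hypothetical intern() helper that hands out one canonical copy per distinct string, equality then becomes a plain pointer compare:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INTERNED 128

static const char *interned[MAX_INTERNED];
static size_t interned_count;

/* Return the canonical copy of s, creating it on first use. */
static const char *intern(const char *s)
{
    for (size_t i = 0; i < interned_count; ++i)
        if (strcmp(interned[i], s) == 0)
            return interned[i];                 /* already known: reuse the stored copy */
    if (interned_count == MAX_INTERNED)
        return NULL;                            /* table full; not handled in this sketch */
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, s);
    return interned[interned_count++] = copy;
}

int main(void)
{
    const char *a = intern("hello");
    const char *b = intern("hello");
    const char *c = intern("world");

    printf("%d %d\n", a == b, a == c);          /* prints "1 0": pointer compare suffices */
    return 0;
}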