Let's say I have a large array and I'm doing something like the following:
for (int i = 0; i < sizeof(arr)/sizeof(arr[0]); ++i)
printf("arr[%d]=%d\n", i, arr[i]);
Of course, the sizeof stuff shouldn't be calculated every time so it should be like this instead:
size_t len = sizeof(arr)/sizeof(arr[0]);
for (int i = 0; i < len; ++i)
printf("arr[%d]=%d\n", i, arr[i]);
My question is whether it can be assumed that any compiler will automatically perform this optimization, so that it doesn't matter which approach I use, or whether I should assume that's not the case and the second approach is the only correct one.
There is no reason to hoist the division out of the loop condition. It isn't even a question of how many times the division is evaluated (as is the usual concern in these cases, and may require the compiler to use "escape analysis" to determine whether the length value can change between iterations). The simple fact is that sizeof(arr)/sizeof(arr[0]) is a constant expression which will be evaluated at compile time, and will be no different from hard-coding a number there.
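To see that it really is a compile-time constant, note that the same expression is accepted where the language demands a constant expression, e.g. in C11's _Static_assert. A minimal sketch (the array arr here is just a made-up example):

#include <stdio.h>

int main(void)
{
    int arr[] = {10, 20, 30, 40};

    /* sizeof(arr)/sizeof(arr[0]) is an integer constant expression,
       so it can be used where a compile-time constant is required (C11). */
    _Static_assert(sizeof(arr) / sizeof(arr[0]) == 4, "unexpected length");

    for (size_t i = 0; i < sizeof(arr) / sizeof(arr[0]); ++i)
        printf("arr[%zu]=%d\n", i, arr[i]);
    return 0;
}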
This is a general issue I've run into, and I've yet to find a solution to it that doesn't feel very "hack-y". Suppose I have some array of elements xs = {a_1, a_2, ..., a_n}, where I know some x is in the array. I wish to loop through the array, and do something with each element, up to and including the element x. Here's a version that does almost that, except it leaves out the very last element. Note that in this example the array happens to be a sorted list of integers, but in the general case this might not necessarily be true.
int xs[] = {1,2,3,4,5};
for (int i = 0; xs[i] != 4; i++) {
foo(xs[i]);
}
The only solutions I've seen so far are:
Just add a final foo(xs[i]); statement after the for-loop. This is first of all ugly and repetitious, especially in the case where foo is not just a function call but a list of statements. Second, it requires i to be defined outside the scope of the for-loop.
Manually break the loop, with an if-statement inside an infinite loop. This again seems ugly to me, since we're not really using the for and while constructs to their full extent. The problem is almost archetypal of what you'd use a for-loop for, the only difference is that we just want it to go through the loop one more time.
Does anyone know of a good solution to this problem?
In C, the for loop is a "check before body" construct; you want the "check after body" variant, a do while loop, something like:
int xs[] = {1,2,3,4,5};
{
int i = 0;
do {
foo(xs[i]);
} while (xs[i++] != 4);
}
You'll notice I've enclosed the entire chunk in its own scope (the outermost {} braces). This is just to limit the existence of i to make it conform more with the for loop behaviour.
In terms of a complete program showing this, the following code:
#include <stdio.h>
void foo(int x) {
printf("%d\n", x);
}
int main(void) {
int xs[] = {1,2,3,4,5};
{
int i = 0;
do {
foo(xs[i]);
} while (xs[i++] != 4);
}
return 0;
}
outputs:
1
2
3
4
As an aside, like you, I'm also not that keen on the two other solutions you've seen.
For the first solution, that won't actually work in this case since the lifetime of i is limited to the for loop itself (the int in the for statement initialisation section makes this so).
That means i will not have the value you expect after the loop. Either there will be no i (a compile-time error) or there will be an i which was hidden within the for loop and therefore unlikely to have the value you expect, leading to insidious bugs.
For the second, I will sometimes break loops within the body but generally only at the start of the body so that the control logic is still visible in a single area. I tend to do that if the for condition would be otherwise very lengthy but there are other ways to do this.
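For what it's worth, a "break at the top of the body" version of the loop from the question might look like this (just a sketch; foo and the target value 4 are the ones from the question):

int xs[] = {1,2,3,4,5};
for (int i = 0; ; i++) {
    if (i > 0 && xs[i - 1] == 4)   /* control logic kept at the top of the body */
        break;
    foo(xs[i]);
}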
Try processing the loop as long as the previous element (if available) is not 4:
int xs[] = {1,2,3,4,5};
for (int i = 0; i == 0 || xs[i - 1] != 4; i++) {
foo(xs[i]);
}
This may not be a direct answer to the original question, but I would strongly suggest against making a habit of parsing arrays like that (it's like a ticking bomb waiting to explode at a random point in time).
I know you said you already know x is a member of xs, but when it is not (and this can accidentally happen for a variety of reasons) then your loop will crash your program if you are lucky, or it will corrupt unrelated data if you are not lucky.
In my opinion, it is neither ugly nor "hacky" to be defensive with an extra check.
If the hurdle is the seemingly unknown length of xs, it is not actually unknown. Static arrays have a known length, either by declaration or by initialization (like your example). In the latter case, the length can be calculated on demand within the scope of the declared array, by sizeof(arr) / sizeof(*arr) - you can even make it a reusable macro.
#define ARRLEN(a) (sizeof(a)/sizeof(*(a)))
...
int xs[] = {1,2,3,4,5};
/* way later... */
size_t xslen = ARRLEN(xs);
for (size_t i=0; i < xslen; i++) {
foo( xs[i] );
if (xs[i] == 4)
break;
}
This will not overrun xs, even when 4 is not present in the array.
EDIT:
"within the scope of the declared array" means that the macro (or its direct code) will not work on an array passed as a function parameter.
// This does NOT work, because sizeof(arr) returns the size of an int-pointer
size_t foo( int arr[] ) {
return( sizeof(arr)/sizeof(*arr) );
}
If you need the length of an array inside a function, you can pass it too as a parameter along with the array (which actually is just a pointer to the 1st element).
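For illustration, passing the length along with the (decayed) array could look like this; print_all is just a hypothetical helper, and ARRLEN is the macro from above:

#include <stdio.h>

/* Hypothetical helper: inside the function, arr is only a pointer,
   so the caller must supply the element count separately. */
void print_all(const int *arr, size_t len)
{
    for (size_t i = 0; i < len; i++)
        printf("%d\n", arr[i]);
}

/* At the call site, where xs is still a real array:
   print_all(xs, ARRLEN(xs));                        */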
Or if performance is not an issue, you may use the sentinel approach, explained below.
[end of EDIT]
An alternative could be to manually mark the end of your array with a sentinel value (a value you intentionally consider invalid). For example, for integers it could be INT_MAX:
#include <limits.h>
...
int xs[] = {1,2,3,4,5, INT_MAX};
for (size_t i=0; xs[i] != INT_MAX; i++) {
foo( xs[i] );
if (xs[i] == 4)
break;
}
Sentinels are quite common for parsing unknown-length dynamically-allocated arrays of pointers, with the sentinel being NULL.
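For example (a sketch, with made-up strings), a NULL-terminated array of pointers can be walked like this:

#include <stdio.h>

int main(void)
{
    const char *names[] = { "alpha", "beta", "gamma", NULL };  /* NULL is the sentinel */
    for (size_t i = 0; names[i] != NULL; i++)
        puts(names[i]);
    return 0;
}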
Anyway, my main point is that preventing accidental buffer overruns probably has a higher priority compared to code prettiness :)
(If this is a duplicate please point me to an answer)
I have two scenarios where a loop is checking a complex expression over and over again (a complex expression would consist of math operations and retrieving data):
for (int i = 0; i < expression; i++) {
// stuff
}
for (int i = 0; i < someNumber; i++) {
if (i == expression) break;
}
I'm wondering if it's more efficient to pre-calculate the expression and check against a known value like so
int known = expression;
for (int i = 0; i < known; i++) {
// stuff
}
for (int i = 0; i < someNumber; i++) {
if (i == known) break;
}
or if it's done by the compiler automatically.
For reference, I'm running the loop ~700 000 000 times and the expression is something like structure->arr[i] % n or sqrt(a * n + b).
Is it even worth it?
If the compiler is able to detect that calculating expression will give the same result every time, it will only do the calculation once.
The tricky part is: "If the compiler is able to ...."
Compilers are very smart and will probably be successful in most cases. But why take the chance?
Just write that extra line to do the calculation before the loop as you did in your second example.
By doing that you send a clear message to the compiler that expression is constant within the loops. It may also help your co-workers understand the code more easily.
That said... you yourself must be sure that expression is in fact the same every time. Let's look at your example:
the expression is something like structure->arr[i] % n or sqrt(a * n + b)
Now the first one, i.e. structure->arr[i] % n, depends on the loop variable i, so it would be a big mistake to move the code outside the loop.
The second (i.e. sqrt(a * n + b)) looks better, provided that a, n and b don't change inside the loop.
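A minimal sketch of the hoisted form for the second expression (the names a, b and n come from the question; wrapping it in a function is my addition):

#include <math.h>

void work(double a, double b, int n)
{
    /* a, b and n are assumed not to change inside the loop,
       so the square root is computed once, before the loop. */
    double limit = sqrt(a * n + b);
    for (int i = 0; i < (int)limit; i++) {
        /* stuff */
    }
}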
Suppose I have, in C99, for(int j=0; j < t->k; j++), where t->k does not change throughout the loop. Does the compiler optimize this line, or will there be one dereferencing operation per loop iteration?
In other words, would
tmpk = t->k;
for(int j = 0; j < tmpk; j++)
be better for a large number of iterations?
In the general case this depends on whether t is declared restrict; in the absence of an explicit aliasing restriction the compiler cannot assume that no other pointer provides a path to modify k.
Of course, if the compiler can prove that t->k is invariant by inspection of the loop body, it may choose to move the dereference out of the loop body, or do so incorrectly if the optimizer is buggy.
Explicitly caching the value of t->k in a local variable would rather reliably force the issue. :-)
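A sketch of that cached form, with a made-up struct so it compiles on its own (only the field name k comes from the question):

struct T { int k; };

void iterate(struct T *t, int *out)
{
    int tmpk = t->k;                  /* dereference t->k exactly once */
    for (int j = 0; j < tmpk; j++)
        out[j] = j;                   /* body may write through other pointers */
}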
Is this type of expression in C valid (on all compilers)?
If it is, is this good C?
char cloneString (char clone[], char string[], int Length)
{
if(!Length)
Length=64
;
int i = 0
;
while((clone[i++] = string[i]) != '\0', --Length)
;
clone[i] = '\0';
printf("cloneString = %s\n", clone);
};
Would this be better, worse, indifferent?
char *cloneString (char clone[], char string[], int Length)
{
if(!Length)
Length=STRING_LENGTH
;
char *r = clone
;
while
( //(clone[i++] = string[i]) != '\0'
*clone++ = *string++
, --Length
);
*clone = '\0';
return clone = r
;
printf("cloneString = %s\n", clone);
};
Stackoverflow wants me to add more text to this question!
Okay! I'm concerned about
a.) expressions such as c==(a=b)
b.) performance between indexing vs pointer
Any comments?
Thanks so much.
Yes, it's syntactically valid on all compilers (though its behavior is undefined, so no compiler is required to do anything sensible with it), and no, it isn't considered good C. Most developers will agree that the comma operator used like this is a bad thing, and that a single line of code should do only one specific thing. The while loop does four things at once and has undefined behavior:
it increments i;
it assigns string[i] to clone[i++]; (undefined behavior: the same expression both modifies i and reads it elsewhere with no sequence point in between)
it checks that string[i] isn't 0 (but discards the result of the comparison);
it decrements Length, and terminates the loop if Length == 0 after being decremented.
Not to mention that assuming that Length is 64 if it wasn't provided is a terrible idea and leaves plenty of room for more undefined behavior that can easily be exploited to crash or hack the program.
I see that you wrote it yourself and that you're concerned about performance, and this is apparently the reason you're sticking everything together. Don't. Code made short by squeezing statements together isn't any faster than longer code in which the statements haven't been squeezed together. It still does the same number of things. In your case, you're introducing bugs by squeezing things together.
The code has Undefined Behavior:
The expression
(clone[i++] = string[i])
both modifies and accesses the object i from two different subexpressions in an unsequenced way, which is not allowed. A compiler might use the old value of i in string[i], or might use the new value of i, or might do something entirely different and unexpected.
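A sequenced rewrite of that single step might look like this (just a sketch of the expression fix, not of the whole function):

clone[i] = string[i];   /* copy first...                              */
i++;                    /* ...then increment, as a separate statement */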
Simple answer: no.
Why does the function return char when it has no return statement?
Why 64?
I assume that the two arrays are of length Length - Add documentation to say this.
Why the ; on a new line and not after the statement?
...
Ok so I decided to evolve my comments into an actual answer. Although this doesn't address the specific piece of code in your question, it answers the underlying issue, and I think you will find it illuminating since you can use this (let's call it a guide) in your general programming.
What I advocate, especially if you are just learning programming, is to focus on readability instead of small gimmicks that you think, or were told, improve speed / performance.
Let’s take a simple example. The idiomatic way to iterate through a vector in C (not in C++) is using indexing:
int i;
for (i = 0; i < size; ++i) {
v[i] = /* code */;
}
I was told when I started programming that v[i] is actually computed as *(v + i) so in generated assembler this is broken down (please note that this discussion is simplified):
multiply i with sizeof(int)
add that result to the address of v
access the element at this computed address
So basically you have 3 operations.
Let’s compare this with accessing via pointers:
int *p;
for (p = v; p != v + size; ++p) {
*p = /*..*/;
}
This has the advantage that *p actually expands to just one instruction:
access the element at the address p.
2 extra instructions don't seem like much, but if your program spends most of its time in this loop (either an extremely large size or multiple calls to the function containing this loop) you realise that the second version makes your program almost 3 times faster. That is a lot. So if you are like me when I started, you will choose the second variant. Don't!
So the first version has readability (you explicitly describe that you access the i-th element of vector v), while the second uses a gimmick to the detriment of readability (you say that you access a memory location). Now this might not be the best example of unreadable code, but the principle is valid.
So why do I tell you to use the first version: until you have a firm grasp on concepts like cache, branching, induction variables (and a lot more) and how they apply to real-world compilers and program performance, you should steer clear of these gimmicks and rely on the compiler to do the optimizations. Compilers are very smart and will generate the same code for both variants (with optimization enabled, of course). So the second variant actually differs only in readability and is identical performance-wise to the first.
Another example:
const char * str = "Some string";
int i;
// variant 1:
for (i = 0; i < strlen(str); ++i) {
// code
}
// variant 2:
int l = strlen(str);
for (i = 0; i < l; ++i) {
// code
}
The natural way would be to write the first variant. You might think that the second improves performance because you call the function strlen on each iteration of the loop. And you know that getting the length of a string means iterating through all of the string until you reach the end. So basically a call to strlen means adding an inner loop. Ouch, that has to slow the program down. Not necessarily: the compiler can optimize the call out because it always produces the same result. Actually you can do harm, as you introduce a new variable which will have to be assigned a different register from a very limited register pool (a little extreme as an example, but nevertheless a point is to be made here).
Don’t spend your energy on things like this until much later.
Let me show you something else that will illustrate further that any assumptions you make about performance will most likely be false and misleading (I am not trying to tell you that you are a bad programmer, far from it, just that as you learn, you should invest your energy in something else than performance):
Let’s multiply two matrices:
for (k = 0; k < n; ++k) {
for (i = 0; i < n; ++i) {
for (j = 0; j < n; ++j) {
r[i][j] += a[i][k] * b[k][j];
}
}
}
versus
for (k = 0; k < n; ++k) {
for (j = 0; j < n; ++j) {
for (i = 0; i < n; ++i) {
r[i][j] += a[i][k] * b[k][j];
}
}
}
The only difference between the two is the order in which the operations get executed. They are the exact same operations (number, kind and operands), just in a different order. The result is equivalent (addition is commutative), so on paper they should take the EXACT same amount of time to execute. In practice, even with optimizations enabled (some very smart compilers can, however, reorder the loops), the second example can be up to 2-3 times slower than the first: its innermost loop steps through r[i][j] and a[i][k] a whole row at a time, so it keeps missing the cache, while the first variant walks memory contiguously. And even the first variant is still a long long way from being optimal (in regards to speed).
So basic point: worry about UB as the other answers show you, don’t worry about performance at this stage.
The second block of code is better.
The line
printf("cloneString = %s\n", clone);
will never get executed, since there is a return statement before it.
To make your code a bit more readable, change
while
(
*clone++ = *string++
, --Length
);
to
while ( Length > 0 )
{
*clone++ = *string++;
--Length;
}
This is probably a better approach to your problem:
#include <stdio.h>
#include <string.h>
void cloneString(char *clone, char *string)
{
size_t length = strlen(string);
for (size_t i = 0; i != length; i++)
clone[i] = string[i];
clone[length] = '\0'; /* terminate the copy before printing it */
printf("Clone string: %s\n", clone);
}
That being said, there's already a standard function to do that:
char *strncpy(char *dest, const char *source, size_t n)
dest is the destination string, and source is the string that must be copied. This function will copy a maximum of n characters.
So, your code will be:
#include <stdio.h>
#include <string.h>
void cloneString(char *clone, char *string)
{
strncpy(clone, string, strlen(string) + 1); /* +1 so the terminating '\0' is copied too */
printf("Clone string: %s\n", clone);
}
gprof is not working properly on my system (MinGW) so I'd like to know which one of the following snippets is more efficient, on average.
I'm aware that internally C compilers convert everything into pointer arithmetic, but nevertheless I'd like to know if any of the following snippets has any significant advantage over the others.
The array has been allocated dynamically in contiguous memory as a 1d array and may be re-allocated at run time (it's for a simple board game, in which the player is allowed to re-define the board's size as often as he wants to).
Please note that i & j must get calculated and passed into the function set_cell() in every loop iteration (gridType is a simple struct with a few ints and a pointer to another cell struct).
Thanks in advance!
Allocate memory
grid = calloc( (nrows * ncols), sizeof(gridType) );
Snippet #1 (parse sequentially as 1D)
gridType *gp = grid;
register int i=0 ,j=0; // we need to pass those in set_cell()
if ( !grid )
return;
for (gp=grid; gp < grid+(nrows*ncols); gp++)
{
set_cell( gp, i, j, !G_OPENED, !G_FOUND, value, NULL );
if (j == ncols-1) { // last col of current row has been reached
j=0;
i++;
}
else // last col of current row has NOT been reached
j++;
}
Snippet #2 (parse as 2D array, using pointers only)
gridType *gp1, *gp2;
if ( !grid )
return;
for (gp1=grid; gp1 < grid+nrows; gp1+=ncols)
for (gp2=gp1; gp2 < gp1+ncols; gp2++)
set_cell( gp2, (gp1-grid), (gp2-gp1), !G_OPENED, !G_FOUND, value, NULL );
Snippet #3 (parse as 2D, using counters only)
register int i,j; // we need to pass those in set_cell()
for (i=0; i<nrows; i++)
for (j=0; j<ncols; j++)
set_cell( &grid[i * ncols + j], i, j, !G_OPENED, !G_FOUND, value, NULL);
Free memory
free( grid );
EDIT:
I fixed #2 from gp1++) to gp1+=ncols) in the 1st loop, after Paul's correction (thx!)
For anything like this, the answer is going to depend on the compiler and the machine you're running it on. You could try each of your code snippets and measure how long each one takes.
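A rough way to do that timing with nothing but the standard library (a sketch: run_snippet is a placeholder you would fill with snippet #1, #2 or #3, and the repetition count is arbitrary):

#include <stdio.h>
#include <time.h>

static void run_snippet(void)
{
    /* paste snippet #1, #2 or #3 here */
}

int main(void)
{
    clock_t start = clock();
    for (int rep = 0; rep < 1000; rep++)   /* repeat so the elapsed time is measurable */
        run_snippet();
    clock_t end = clock();

    printf("elapsed: %.3f s\n", (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}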
However, this is a prime example of premature optimization. The best thing to do is to pick the snippet which looks the clearest and most maintainable. You'll get much more benefit from doing that in the long run than from any savings you'd make from choosing the one that's fastest on your machine (which might not be fastest on someone else's anyway!)
Well, snippet 2 doesn't exactly work. You need different incrementing behavior; the outer loop should read for (gp1 = grid; gp1 < grid + (nrows * ncols); gp1 += ncols).
Of the other two, any compiler that's paying attention will almost certainly convert snippet 3 into something equivalent to snippet 1. But really, there's no way to know without profiling them.
Also, remember the words of Knuth: "Premature optimization is the ROOT OF ALL EVIL. I have seen more damage done in the name of 'optimization' than for all other causes combined, including sheer, wrongheaded stupidity." People who write compilers are smarter than you (unless you're secretly Knuth or Hofstadter), so let the compiler do its job and you can get on with yours. Trying to write "clever" optimized code will usually just confuse the compiler, preventing it from writing even better, more optimized code.
This is the way I'd write it. IMHO it's shorter, clearer and simpler than any of your ways.
int i, j;
gridType *gp = grid;
for (i = 0; i < nrows; i++)
for (j = 0; j < ncols; j++)
set_cell( gp++, i, j, !G_OPENED, !G_FOUND, value, NULL );
gprof not working isn't a real excuse. You can still set up a benchmark and measure execution time.
You might not be able to measure any difference on modern CPUs until nrows*ncols gets very large or the reallocation happens very often, so you might optimize the wrong part of your code.
This certainly is micro-optimization, as most of the runtime will most probably be spent in set_cell, and everything else could be optimized to the same or very similar code by the compiler.
You don't know until you measure it.
Any decent compiler may produce the same code for all three; even if it doesn't, the effects of caching, pipelining, branch prediction and other clever hardware behaviour mean that simply guessing the number of instructions isn't enough.