Optimizing C code, Horner's polynomial evaluation - c

I'm trying to learn how to optimize code (I'm also learning C), and in one of my books there's a problem for optimizing Horner's method for evaluating polynomials. I'm a little lost on how to approach the problem. I'm not great at recognizing what needs optimizing.
Any advice on how to make this function run faster would be appreciated.
Thanks
double polyh(double a[], double x, int degree) {
    long int i;
    double result = a[degree];
    for (i = degree-1; i >= 0; i--)
        result = a[i] + x*result;
    return result;
}

You really need to profile your code to test whether proposed optimizations really help. For example, it may be the case that declaring i as long int rather than int slows the function on your machine, but on the other hand it may make no difference on your machine but might make a difference on others, etc. Anyway, there's no reason to declare i a long int when degree is an int, so changing it probably won't hurt. (But still profile!)
Horner's rule is supposedly optimal in terms of the number of multiplies and adds required to evaluate a polynomial, so I don't see much you can do with it. One thing that might help (profile!) is changing the test i>=0 to i!=0. Of course, then the loop doesn't run enough times, so you'll have to add a line below the loop to take care of the final case.
Alternatively you could use a do { ... } while (--i) construct. (Or is it do { ... } while (i--)? You figure it out.)
You might not even need i, but using degree instead will likely not save an observable amount of time and will make the code harder to debug, so it's not worth it.
Another thing that might help (I doubt it, but profile!) is breaking up the arithmetic expression inside the loop and playing around with order, like
for (...) {
    result *= x;
    result += a[i];
}
which may reduce the need for temporary variables/registers. Try it out.
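Putting the i != 0 and do/while ideas together, here is a minimal sketch (the function name is mine; it assumes degree >= 1, and for degree == 0 the initial assignment is already the answer). As always, profile it before believing it helps:
double polyh2(double a[], double x, int degree) {
    double result = a[degree];
    int i = degree;
    do {
        i--;                          /* decrement first so a[0] is the last term used */
        result = a[i] + x * result;
    } while (i != 0);
    return result;
}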

One suggestion:
Use int instead of long int for the loop index.

Almost certainly the problem is inviting you to conjecture on the values of a. If that vector is mostly zeros, then you'll go faster (by doing fewer double multiplications, which will be the clear bottleneck on most machines) by computing only the values of a[i] * x^i for a[i] != 0. In turn the x^i values can be computed by careful repeated squaring, preserving intermediate terms so that you never compute the same partial power more than once. See the Wikipedia article if you've never implemented repeated squaring.
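A rough sketch of that idea (the function name and the way the squarings are cached are my own illustration, not from the book or the post):
/* Evaluate a mostly-zero polynomial: skip zero coefficients and build
   x^i from cached repeated squarings x, x^2, x^4, ...  Illustrative only. */
double poly_sparse(const double a[], double x, int degree) {
    double sq[32];                    /* sq[k] = x^(2^k); 32 entries covers any int degree */
    int nsq = 0;
    sq[nsq++] = x;
    for (int d = degree >> 1; d > 0; d >>= 1) {
        sq[nsq] = sq[nsq - 1] * sq[nsq - 1];
        nsq++;
    }

    double result = 0.0;
    for (int i = 0; i <= degree; i++) {
        if (a[i] == 0.0)
            continue;
        double p = 1.0;               /* binary exponentiation: x^i from the cached squares */
        for (int n = i, bit = 0; n > 0; n >>= 1, bit++)
            if (n & 1)
                p *= sq[bit];
        result += a[i] * p;
    }
    return result;
}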

Related

Is there a way to optimize this C program in terms of run time

Its objective is to print all Fibonacci numbers up to the 93rd.
#include <stdio.h>
void main(){
    unsigned long long first = 0, second = 1, sum;
    printf("%llu,%llu", first, second);
    unsigned char hops = 1;
    while (hops < 93) {
        sum = first + second;
        printf(",%llu", sum);
        first = second;
        second = sum;
        hops++;
    }
    printf("\n");
}
Of course you can. Just print a precalculated string of numbers.
#include <stdio.h>

int main()
{
    puts("0,1,1,2,3,5,...");
    return 0;
}
If you need to optimize the whole operation, think in terms of space and time complexity, and from there ask how much better each can be made.
Printing the n numbers will of course require an iteration, which is O(n) time. Is there a better way to do that? If you print the sequence multiple times you can get rid of the additions by storing the results somewhere, but that won't give you any better asymptotic time complexity - rather, you pay O(n) space, which is pure overhead if printing is all you do. (And of course the pre-computation still has to be done.)
So yes, this solution is good enough to achieve what you are trying to do.
If you need to scan some input, I would suggest reading it with getchar and forming the number from the characters yourself (if needed). On SPOJ or other competitive-programming judge servers this tends to give better results - it is faster than scanf when the number of inputs is large (10^7 or so). A sketch of such a reader is below.
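A minimal sketch, assuming non-negative integers separated by whitespace (the function name is made up, not from the answer):
#include <stdio.h>

/* Read one non-negative integer, skipping any leading non-digit characters. */
static unsigned long long read_uint(void) {
    int c = getchar();
    while (c != EOF && (c < '0' || c > '9'))
        c = getchar();
    unsigned long long n = 0;
    while (c >= '0' && c <= '9') {
        n = n * 10 + (unsigned long long)(c - '0');
        c = getchar();
    }
    return n;
}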

Macros for 3D loops in C

I'm developing a C (C99) program that loops heavily over 3-D arrays in many places. So naturally, the following access pattern is ubiquitous in the code:
for (int i=0; i<i_size; i++) {
    for (int j=0; j<j_size; j++) {
        for (int k=0; k<k_size; k++) {
            ...
        }
    }
}
Naturally, this fills many lines of code with clutter and requires extensive copypasting. So I was wondering whether it would make sense to use macros to make it more compact, like this:
#define BEGIN_LOOP_3D(i,j,k,i_size,j_size,k_size) \
    for (int i=0; i<(i_size); i++) { \
        for (int j=0; j<(j_size); j++) { \
            for (int k=0; k<(k_size); k++) {
and
#define END_LOOP_3D }}}
On one hand, from a DRY principle standpoint, this seems great: it makes the code a lot more compact, and allows you to indent the contents of the loop by just one block instead of three. On the other hand, the practice of introducing new language constructs seems hideously ugly and, even though I can't think of any obvious problems with it right now, seems alarmingly prone to creating bugs that are a nightmare to debug.
So what do you think: do the compactness and reduced repetition justify this despite the ugliness and the potential drawbacks?
Never put open or close {} inside macros. C programmers are not used to this, so the code gets difficult to read.
In your case the braces are completely superfluous anyway; you just don't need them. If you do such a thing, do
#define FOR3D(I, J, K, ISIZE, JSIZE, KSIZE) \
    for (size_t I=0; I<ISIZE; I++) \
        for (size_t J=0; J<JSIZE; J++) \
            for (size_t K=0; K<KSIZE; K++)
no need for a terminating macro. The programmer can place the {} directly.
Also, above I have used size_t, the correct type in C for loop indices. 3D matrices easily get large, and int arithmetic overflows when you don't think of it.
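Usage would then look something like this (the array and size names are made up):
FOR3D(i, j, k, nx, ny, nz) {
    a[i][j][k] = 0.0;
}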
If these 3D arrays are “small”, you can ignore me. If your 3D arrays are large, but you don't much care about performance, you can ignore me. If you subscribe to the (common but false) doctrine that compilers are quasi-magical tools that can poop out optimal code almost irrespective of the input, you can ignore me.
You are probably aware of the general caveats regarding macros, how they can frustrate debugging, etc., but if your 3D arrays are “large” (whatever that means), and your algorithms are performance-oriented, there may be drawbacks of your strategy that you may not have considered.
First: if you are doing linear algebra, you almost certainly want to use dedicated linear algebra libraries, such as BLAS, LAPACK, etc., rather than “rolling your own”. OpenBLAS (from GotoBLAS) will totally smoke any equivalent you write, probably by at least an order of magnitude. This is doubly true if your matrices are sparse and triply true if your matrices are sparse and structured (such as tridiagonal).
Second: if your 3D arrays represent Cartesian grids for some kind of simulation (like a finite-difference method), and/or are intended to be fed to any numerical library, you absolutely do not want to represent them as C 3D arrays. You will want, instead, to use a 1D C array and use library functions where possible and perform index computations yourself (see this answer for details) where necessary.
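A minimal sketch of that layout, assuming row-major order (the macro and the allocation helper are illustrative, not from the linked answer):
#include <stdlib.h>

/* Flat index into a conceptually 3-D grid stored as one contiguous block. */
#define IDX3(i, j, k, ny, nz) (((i) * (ny) + (j)) * (nz) + (k))

double *alloc_grid(int nx, int ny, int nz) {
    double *g = malloc((size_t)nx * ny * nz * sizeof *g);
    if (g)
        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                for (int k = 0; k < nz; k++)
                    g[IDX3(i, j, k, ny, nz)] = 0.0;
    return g;
}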
Third: if you really do have to write your own triple-nested loops, the nesting order of the loops is a serious performance consideration. It might well be that the data-access pattern for ijk order (rather than ikj or kji) yields poor cache behavior for your algorithm, as is the case for dense matrix-matrix multiplication, for example. Your compiler might be able to do some limited loop exchange (last time I checked, icc would produce reasonably fast code for naive xGEMM, but gcc wouldn't). As you implement more and more triple-nested loops, and your proposed solution becomes more and more attractive, it becomes less and less likely that a “one loop-order fits all” strategy will give reasonable performance in all cases.
Fourth: any “one loop-order fits all” strategy that iterates over the full range of every dimension will not be tiled, and may exhibit poor performance.
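To make the tiling point concrete, here is a rough sketch of a blocked (tiled) dense matrix-matrix multiply over flat 1-D storage; the block size and the loop order are illustrative, and good choices depend on your cache and your algorithm:
#define BLK 32

/* c must be zero-initialized; all matrices are n x n in row-major order. */
void matmul_tiled(int n, const double *a, const double *b, double *c) {
    for (int ii = 0; ii < n; ii += BLK)
        for (int kk = 0; kk < n; kk += BLK)
            for (int jj = 0; jj < n; jj += BLK)
                for (int i = ii; i < ii + BLK && i < n; i++)
                    for (int k = kk; k < kk + BLK && k < n; k++)
                        for (int j = jj; j < jj + BLK && j < n; j++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}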
Fifth (and with reference to another answer with which I disagree): I believe, in general, that the “best” data type for any object is the set with the smallest size and the least algebraic structure, but if you decide to indulge your inner pedant and use size_t or another unsigned integer type for matrix indices, you will regret it. I wrote my first naive linear algebra library in C++ in 1994. I've written maybe a half dozen in C over the last 8 years and, every time, I've started off trying to use unsigned integers and, every time, I've regretted it. I've finally decided that size_t is for sizes of things and a matrix index is not the size of anything.
Sixth (and with reference to another answer with which I disagree): a cardinal rule of HPC for deeply nested loops is to avoid function calls and branches in the innermost loop. This is particularly important where the op-count in the innermost loop is small. If you're doing a handful of operations, as is the case more often than not, you don't want to add a function call overhead in there. If you're doing hundreds or thousands of operations in there, you probably don't care about a handful of instructions for a function call/return and, therefore, they're OK.
Finally, if none of the above are considerations that jibe with what you're trying to implement, then there's nothing wrong with what you're proposing, but I would carefully consider what Jens said about braces.
The best way is to use a function. Let the compiler worry about performance and optimization, though if you are concerned you can always declare functions as inline.
Here's a simple example:
#include <stdio.h>
#include <stdint.h>

typedef void (*func_t)(int* item_ptr);

void traverse_3D (size_t x,
                  size_t y,
                  size_t z,
                  int array[x][y][z],
                  func_t function)
{
    for (size_t ix=0; ix<x; ix++)
    {
        for (size_t iy=0; iy<y; iy++)
        {
            for (size_t iz=0; iz<z; iz++)
            {
                function(&array[ix][iy][iz]);
            }
        }
    }
}

void fill_up (int* item_ptr) // fill array with some numbers
{
    static uint8_t counter = 0;
    *item_ptr = counter;
    counter++;
}

void print (int* item_ptr)
{
    printf("%d ", *item_ptr);
}

int main()
{
    int arr [2][3][4];

    traverse_3D(2, 3, 4, arr, fill_up);
    traverse_3D(2, 3, 4, arr, print);
}
EDIT
To shut up all speculations, here are some benchmarking results from Windows.
Tests were done with a matrix of size [20][30][40]. The fill_up function was called either from traverse_3D or from a 3-level nested loop directly in main(). Benchmarking was done with QueryPerformanceCounter().
Case 1: gcc -std=c99 -pedantic-errors -Wall
With function, time in us: 255.371402
Without function, time in us: 254.465830
Case 2: gcc -std=c99 -pedantic-errors -Wall -O2
With function, time in us: 115.913261
Without function, time in us: 48.599049
Case 3: gcc -std=c99 -pedantic-errors -Wall -O2, traverse_3D function inlined
With function, time in us: 37.732181
Without function, time in us: 37.430324
Why the "without function" case performs somewhat better with the function inlined, I have no idea. I can comment out the call to it and still get the same benchmarking results for the "without function" case.
The conclusion however, is that with proper optimization, performance is most likely a non-issue.

2D Dynamic Array in C: Which of those 3 snippets gets executed faster?

gprof is not working properly on my system (MinGW) so I'd like to know which one of the following snippets is more efficient, on average.
I'm aware that internally C compilers convert everything into pointer arithmetic, but nevertheless I'd like to know if any of the following snippets has any significant advantage over the others.
The array has been allocated dynamically in contiguous memory as a 1D array and may be re-allocated at run time (it's for a simple board game, in which the player is allowed to re-define the board's size as often as he wants to).
Please note that i & j must get calculated and passed into the function set_cell() in every loop iteration (gridType is a simple struct with a few ints and a pointer to another cell struct).
Thanks in advance!
Allocate memory
grid = calloc( (nrows * ncols), sizeof(gridType) );
Snippet #1 (parse sequentially as 1D)
gridType *gp = grid;
register int i=0, j=0;   // we need to pass those in set_cell()

if ( !grid )
    return;

for (gp=grid; gp < grid+(nrows*ncols); gp++)
{
    set_cell( gp, i, j, !G_OPENED, !G_FOUND, value, NULL );
    if (j == ncols-1) {   // last col of current row has been reached
        j=0;
        i++;
    }
    else                  // last col of current row has NOT been reached
        j++;
}
Snippet #2 (parse as 2D array, using pointers only)
gridType *gp1, *gp2;

if ( !grid )
    return;

for (gp1=grid; gp1 < grid+nrows; gp1+=ncols)
    for (gp2=gp1; gp2 < gp1+ncols; gp2++)
        set_cell( gp2, (gp1-grid), (gp2-gp1), !G_OPENED, !G_FOUND, value, NULL );
Snippet #3 (parse as 2D, using counters only)
register int i, j;   // we need to pass those in set_cell()

for (i=0; i<nrows; i++)
    for (j=0; j<ncols; j++)
        set_cell( &grid[i * ncols + j], i, j, !G_OPENED, !G_FOUND, value, NULL);
Free memory
free( grid );
EDIT:
I fixed #2 from gp1++) to gp1+=ncols) in the 1st loop, after Paul's correction (thanks!)
For anything like this, the answer is going to depend on the compiler and the machine you're running it on. You could try each of your code snippets and time how long each one takes.
However, this is a prime example of premature optimization. The best thing to do is to pick the snippet which looks the clearest and most maintainable. You'll get much more benefit from doing that in the long run than from any savings you'd make from choosing the one that's fastest on your machine (which might not be fastest on someone else's anyway!)
Well, snippet 2 doesn't exactly work. You need different incrementing behavior; the outer loop should read for (gp1 = grid; gp1 < grid + (nrows * ncols); gp1 += ncols).
Of the other two, any compiler that's paying attention will almost certainly convert snippet 3 into something equivalent to snippet 1. But really, there's no way to know without profiling them.
Also, remember the words of Knuth: "Premature optimization is the ROOT OF ALL EVIL. I have seen more damage done in the name of 'optimization' than for all other causes combined, including sheer, wrongheaded stupidity." People who write compilers are smarter than you (unless you're secretly Knuth or Hofstadter), so let the compiler do its job and you can get on with yours. Trying to write "clever" optimized code will usually just confuse the compiler, preventing it from writing even better, more optimized code.
This is the way I'd write it. IMHO it's shorter, clearer and simpler than any of your ways.
int i, j;
gridType *gp = grid;

for (i = 0; i < nrows; i++)
    for (j = 0; j < ncols; j++)
        set_cell( gp++, i, j, !G_OPENED, !G_FOUND, value, NULL );
gprof not working isn't a real excuse. You can still set up a benchmark and measure execution time - a minimal clock()-based sketch is shown below.
You might not be able to measure any difference on modern CPUs until nrows*ncols gets very large or the reallocation happens very often, so you might end up optimizing the wrong part of your code.
This certainly is micro-optimization, as most of the runtime will most probably be spent in set_cell, and everything else could be optimized to the same or very similar code by the compiler.
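A minimal clock()-based sketch of such a benchmark (fill_grid_snippet1 is a hypothetical wrapper around whichever snippet you are timing; gridType is the struct from the question):
#include <stdio.h>
#include <time.h>

/* Repeat the snippet enough times to get a measurable interval,
   then report the elapsed CPU time. */
void benchmark(gridType *grid, int nrows, int ncols, int value)
{
    clock_t t0 = clock();
    for (int rep = 0; rep < 1000; rep++)
        fill_grid_snippet1(grid, nrows, ncols, value);   /* hypothetical wrapper */
    clock_t t1 = clock();
    printf("elapsed: %.6f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
}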
You don't know until you measure it.
Any decent compiler may produce the same code for all three, and even if it doesn't, the effects of caching, pipelining, branch prediction and other clever stuff mean that simply guessing the number of instructions isn't enough.

What is the most elegant way to loop TWICE in C

Many times I need to do things TWICE in a for loop. Simply I can set up a for loop with an iterator and go through it twice:
for (i = 0; i < 2; i++)
{
    // Do stuff
}
Now I am interested in doing this as SIMPLY as I can, perhaps without an initializer or iterator? Are there any other, really simple and elegant, ways of achieving this?
This is elegant because it looks like a triangle; and triangles are elegant.
i = 0;
here: dostuff();
i++; if ( i == 1 ) goto here;
Encapsulate it in a function and call it twice.
void do_stuff() {
    // Do Stuff
}

// .....

do_stuff();
do_stuff();
Note: if you use variables or parameters of the enclosing function in the stuff logic, you can pass them as arguments to the extracted do_stuff function.
If it's only twice, and you want to avoid a loop, just write the darn thing twice.
statement1;
statement1; // (again)
If the loop is too verbose for you, you can also define an alias for it:
#define TWICE for (int _index = 0; _index < 2; _index++)
This would result in code like this:
TWICE {
    // Do Stuff
}

// or

TWICE
    func();
I would only recommend using this macro if you have to do this very often; otherwise I think the plain for loop is more readable.
Unfortunately, this is not for C, but for C++ only, but does exactly what you want:
Just include the header, and you can write something like this:
10 times {
// Do stuff
}
I'll try to rewrite it for C as well.
So, after some time, here's an approach that enables you to write the following in pure C:
2 times {
do_something()
}
Example:
You'll have to include this little thing as a simple header file (I always called the file extension.h). Then, you'll be able to write programs in the style of:
#include <stdio.h>
#include "extension.h"

int main(int argc, char** argv){
    3 times printf("Hello.\n");
    3 times printf("Score: 0 : %d\n", _);
    2 times {
        printf("Counting: ");
        9 times printf("%d ", _);
        printf("\n");
    }
    5 times {
        printf("Counting up to %d: ", _);
        _ times printf("%d ", _);
        printf("\n");
    }
    return 0;
}
Features:
Simple notation of simple loops (in the style depicted above)
Counter is implicitly stored in a variable called _ (a simple underscore).
Nesting of loops allowed.
Restrictions (and how to (partially) circumvent them):
Works only for a certain number of loops (which is - "of course" - reasonable, since you only would want to use such a thing for "small" loops). Current implementation supports a maximum of 18 iterations (higher values result in undefined behaviour). Can be adjusted in header file by changing the size of array _A.
Only a certain nesting depth is allowed. Current implementation supports a nesting depth of 10. Can be adjusted by redefining the macro _Y.
Explanation:
You can see the full (=de-obfuscated) source-code here. Let's say we want to allow up to 18 loops.
Retrieving the upper iteration bound: The basic idea is to have an array of chars that are initially all set to 0 (this is the array counterarray). If we issue a call to e.g. 2 times {do_it;}, the macro times shall set the second element of counterarray to 1 (i.e. counterarray[2] = 1). In C, it is possible to swap index and array name in such an assignment, so we can write 2[counterarray] = 1 to achieve the same. This is exactly what the macro times does as a first step. Then, we can later scan the array counterarray until we find an element that is not 0, but 1. The corresponding index is then the upper iteration bound. It is stored in the variable searcher. Since we want to support nesting, we have to store the upper bound for each nesting depth separately; this is done by searchermax[depth]=searcher+1.
Adjusting current nesting depth: As said, we want to support nesting of loops, so we have to keep track of the current nesting depth (done in the variable depth). We increment it by one if we start such a loop.
The actual counter variable: We have a "variable" called _ that implicitly gets assigned the current counter. In fact, we store one counter for each nesting depth (all stored in the array counter). Then, _ is just another macro that retrieves the proper counter for the current nesting depth from this array.
The actual for loop: We break the for loop into parts:
We initialize the counter for the current nesting depth to 0 (done by counter[depth] = 0).
The iteration step is the most complicated part: We have to check if the loop at the current nesting depth has reached its end. If so, we have to update the nesting depth accordingly. If not, we have to increment the current nesting depth's counter by 1. The variable lastloop is 1 if this is the last iteration, otherwise 0, and we adjust the current nesting depth accordingly. The main problem here is that we have to write this as a sequence of expressions, all separated by commas, which requires us to write all these conditions in a very non-straightforward way.
The "increment step" of the for loop consists of only one assignment, that increments the appropriate counter (i.e. the element of counter of the proper nesting depth) and assigns this value to our "counter variable" _.
What about this??
void DostuffFunction(){}
for (unsigned i = 0; i < 2; ++i, DostuffFunction());
Regards,
Pablo.
What abelenky said.
And if your { // Do stuff } is multi-line, make it a function, and call that function -- twice.
Many people suggest writing out the code twice, which is fine if the code is short. There is, however, a size of code block which would be awkward to copy but is not large enough to merit its own function (especially if that function would need an excessive number of parameters). My own normal idiom to run a loop 'n' times is
i = number_of_reps;
do
{
    ... whatever
} while(--i);
In some measure because I'm frequently coding for an embedded system where the up-counting loop is often inefficient enough to matter, and in some measure because it's easy to see the number of repetitions. Running things twice is a bit awkward because the most efficient coding on my target system
bit rep_flag;

rep_flag = 0;
do
{
    ...
} while(rep_flag ^= 1); /* Note: if loop runs to completion, leaves rep_flag clear */
doesn't read terribly well. Using a numeric counter suggests the number of reps can be varied arbitrarily, which in many instances won't be the case. Still, a numeric counter is probably the best bet.
As Edsger W. Dijkstra himself put it: "two or more, use a for". No need to be any simpler.
Another attempt:
for(i=2;i--;) /* Do stuff */
This solution has many benefits:
Shortest form possible, I claim (13 chars)
Still, readable
Includes initialization
The amount of repeats ("2") is visible in the code
Can be used as a toggle (1 or 0) inside the body e.g. for alternation
Works with single instruction, instruction body or function call
Flexible (doesn't have to be used only for "doing twice")
Dijkstra compliant ;-)
From comment:
for (i=2; i--; "Do stuff");
Use function:
func();
func();
Or use macro (not recommended):
#define DO_IT_TWICE(A) A; A
DO_IT_TWICE({ x+=cos(123); func(x); })
If your compiler supports this just put the declaration inside the for statement:
for (unsigned i = 0; i < 2; ++i)
{
    // Do stuff
}
This is as elegant and efficient as it can be. Modern compilers can do loop unrolling and all that stuff, trust them. If you don't trust them, check the assembler.
And it has one little advantage to all other solutions, for everybody it just reads, "do it twice".
Assuming C++0x lambda support:
template <typename T> void twice(T t)
{
    t();
    t();
}

twice([](){ /*insert code here*/ });
Or:
twice([]()
{
    /*insert code here*/
});
Which doesn't help you since you wanted it for C.
Good rule: three or more, do a for.
I think I read that in Code Complete, but I could be wrong. So in your case you don't need a for loop.
This is the shortest possible without preprocessor/template/duplication tricks:
for(int i=2; i--; ) /*do stuff*/;
Note that the decrement happens once right at the beginning, which is why this will loop precisely twice with the indices 1 and 0 as requested.
Alternatively you can write
for(int i=2; i--; /*do stuff*/) ;
But that's purely a difference of taste.
If what you are doing is somewhat complicated wrap it in a function and call that function twice? (This depends on how many local variables your do stuff code relies on).
You could do something like
void do_stuff(int i){
    // do stuff
}

do_stuff(0);
do_stuff(1);
But this may get extremely ugly if you are working on a whole bunch of local variables.
//dostuff
stuff;
//dostuff (Attention I am doing the same stuff for the :**2nd** time)
stuff;
First, use a comment
/* Do the following stuff twice */
then,
1) use the for loop
2) write the statement twice, or
3) write a function and call the function twice
do not use macros, as earlier stated, macros are evil.
(My answer's almost a triangle)
What is elegance? How do you measure it? Is someone paying you to be elegant? If so how do they determine the dollar-to-elegance conversion?
When I ask myself, "how should this be written," I consider the priorities of the person paying me. If I'm being paid to write fast code, control-c, control-v, done. If I'm being paid to write code fast, well.. same thing. If I'm being paid to write code that occupies the smallest amount of space on the screen, I short the stock of my employer.
A jump instruction is pretty slow, so if you write the lines one after the other, it would run faster than writing a loop. But modern compilers are very, very smart and their optimizations are great (if they are enabled, of course). If you have turned on your compiler's optimizations, you don't need to care which way you write it - with a loop or not.
EDIT: http://en.wikipedia.org/wiki/compiler_optimizations - just take a look.
Close to your example, elegant and efficient:
for (i = 2; i; --i)
{
    /* Do stuff */
}
Here's why I'd recommend that approach:
It initializes the iterator to the number of iterations, which makes intuitive sense.
It uses decrement over increment so that the loop test expression is a comparison to zero (the "i;" can be interpreted as "is i true?" which in C means "is i non-zero"), which may optimize better on certain architectures.
It uses pre-decrement as opposed to post-decrement in the counting expression for the same reason (may optimize better).
It uses a for loop instead of do/while or goto or XOR or switch or macro or any other trick approach because readability and maintainability are more elegant and important than clever hacks.
It doesn't require you to duplicate the code for "Do stuff" so that you can avoid a loop. Duplicated code is an abomination and a maintenance nightmare.
If "Do stuff" is lengthy, move it into a function and give the compiler permission to inline it if beneficial. Then call the function from within the for loop.
I like Chris Case's solution (up here), but C language doesn't have default parameters.
My solution:
bool cy = false;

do {
    // Do stuff twice
} while (cy = !cy);
If you want, you could do different things in the two cycles by checking the boolean variable (maybe with the ternary operator), for example:
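(A tiny sketch of that; it needs <stdio.h> and <stdbool.h>, and the printf is just a placeholder.)
bool cy = false;
do {
    printf("%s pass\n", cy ? "second" : "first");
} while (cy = !cy);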
void loopTwice (bool first = true)
{
    // Recursion is your friend
    if (first) { loopTwice(false); }

    // Do Stuff
    ...
}
I'm sure there's a more elegant way, but this is simple to read, and pretty simple to write. There might even be a way to eliminate the bool parameter, but this is what I came up with in 20 seconds.

Efficiency of boolean comparisons? In C

I'm writing a loop in C, and I am just wondering on how to optimize it a bit. It's not crucial here as I'm just practicing, but for further knowledge, I'd like to know:
In a loop, for example the following snippet:
int i = 0;
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
If it checks both, wouldn't:
int i = 0;
while (i != 10) {
    printf("%d\n", i);
    i++;
}
be more efficient?
Thanks!
Both will be translated into a single assembly instruction. Most CPUs have comparison instructions for LESS THAN, for LESS THAN OR EQUAL, for EQUAL and for NOT EQUAL.
One of the interesting things about these optimization questions is that they often show why you should code for clarity/correctness before worrying about the performance impact of these operations (which oh-so often don't have any difference).
Your 2 example loops do not have the same behavior:
int i = 0;
/* this will print 11 lines (0..10) */
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
And,
int i = 0;
/* This will print 10 lines (0..9) */
while (i != 10) {
    printf("%d\n", i);
    i++;
}
To answer your question though, it's nearly certain that the performance of the two constructs would be identical (assuming that you fixed the problem so the loop counts were the same). For example, if your processor could only check for equality and whether one value were less than another in two separate steps (which would be a very unusual processor), then the compiler would likely transform the (i <= 10) to an (i < 11) test - or maybe an (i != 11) test.
This is a clear example of premature optimization.... IMHO, that is something that programmers new to their craft are way too prone to worry about. If you must worry about it, learn to benchmark and profile your code so that your worries are based on evidence rather than supposition.
Speaking to your specific questions. First, a <= is not implemented as two operations testing for < and == separately in any C compiler I've met in my career. And that includes some monumentally stupid compilers. Notice that for integers, a <= 5 is the same condition as a < 6 and if the target architecture required that only < be used, that is what the code generator would do.
Your second concern, that while (i != 10) might be more efficient raises an interesting issue of defensive programming. First, no it isn't any more efficient in any reasonable target architecture. However, it raises a potential for a small bug to cause a larger failure. Consider this: if some line of code within the body of the loop modified i, say by making it greater than 10, what might happen? How long would it take for the loop to end, and would there be any other consequences of the error?
Finally, when wondering about this kind of thing, it often is worthwhile to find out what code the compiler you are using actually generates. Most compilers provide a mechanism to do this. For GCC, learn about the -S option which will cause it to produce the assembly code directly instead of producing an object file.
The operators <= and < are a single instruction in assembly, there should be no performance difference.
Note that tests for 0 can be a bit faster on some processors than to test for any other constant, therefore it can be reasonable to make a loop run backward:
int i = 10;
while (i != 0)
{
    printf("%d\n", i);
    i--;
}
Note that micro-optimizations like these usually gain you only very little extra performance; your time is better spent on efficient algorithms.
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
Neither, it will most likely check (i < 11). The <= 10 is just there for you to give better meaning to your code since 11 is a magic number which actually means (10+1).
Depends on the architecture and compiler. On most architectures, there is a single instruction for <= or the opposite, which can be negated, so if it is translated into a loop, the comparison will most likely be only one instruction. (On x86 or x86_64 it is one instruction)
The compiler might unroll the loop into a sequence of ten i++ operations; when only constant expressions are involved it will even optimize the ++ away and leave only constants.
And Ira is right: the cost of the comparison does vanish when there is a printf involved, whose execution time might be millions of clock cycles.
I'm writing a loop in C, and I am just wondering on how to optimize it a bit.
If you compile with optimizations turned on, the biggest optimization will be from unrolling that loop.
It's going to be hard to profile that code with -O2, because for trivial functions the compiler will unroll the loop and you won't be able to benchmark actual differences in compares. You should be careful when profiling test cases that use constants that might make the code trivial when optimized by the compiler.
Disassemble. Depending on the processor, the optimization level and a number of other things, this simple example code may actually unroll or do things that do not reflect your real question. Compiling with gcc -O1, though, both example loops you provided resulted in the same assembler (for ARM).
A less-than in your C code often turns into a branch-if-greater-than-or-equal to the far side of the loop. If your processor doesn't have a branch-if-greater-than-or-equal, it may use a branch-if-greater-than and a branch-if-equal - two instructions.
Typically, though, there will be a register holding i, an instruction to increment i, and then an instruction to compare i with 10. Equal-to, greater-than-or-equal and less-than are generally each a single instruction, so you should not normally see a difference.
// Case I
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
    printf("%d\n", i);
    i++;
}

// Case II
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}
The Case I code takes more space but runs faster, while the Case II code takes less space but runs slower compared to Case I.
In programming, space and time are usually a trade-off: you can often buy speed with extra space, or save space at the cost of extra time.
So you can optimize for time or for space, but rarely both at once.
And both of your code snippets do the same thing.
