I was playing with the C language on my own and I was trying to write the fastest possible algorithm to find amicable numbers.
This is what I wrote (I've just started, so please do not suggest methods to improve the algorithm, since I want to write it on my own):
#include <stdio.h>
#include <time.h>

#define MAX (200000)

int dividersSum(int);
void amicable();

int main() {
    clock_t start = clock();
    amicable();
    double executionTime = ((double)clock() - start) / CLOCKS_PER_SEC;
    printf("\nEXECUTION TIME: %lf", executionTime);
    return 0;
}

int dividersSum(int n) {
    int i, sum;
    for (sum = 1, i = 2; i <= n / 2; i++) {
        if (!(n % i)) {
            sum += n / i;
        }
    }
    return sum;
}

void amicable() {
    int a, divSum, tot = 0;
    for (a = 1; a < MAX; a++) {
        divSum = dividersSum(a);
        if (divSum > a && dividersSum(divSum) == a) {
            printf("\n\t%d\t\t%d", a, dividersSum(a));
            tot++;
        }
    }
    printf("\n\nTOT: %d", tot);
}
Now, this works fine. Or, at least, probably not that fine since it took exactly 40 seconds to complete, but it works.
But if I change this line:
int i, sum;
Into this:
int i, sum, a = 4, b = 4, c = 4, d = 4, e = 4, f = 4;
It "drastically" improves. It takes 36 seconds to complete.
I get these execution times from the console timer. I know it isn't accurate at all (indeed, as soon as I get the chance to work on this algorithm again I'll use the time.h library), but I tried the two versions of the code over 50 times, and I always get 40 or more seconds for the "normal" version and 36 or less for the other one.
I've also tried changing the machine where I run the program, but the modified version always takes about 10% less time to execute.
Of course this makes no sense to me (I'm pretty new to programming, and I've Googled it but found nothing, partly because I don't really know what to look for...). The only thing I can think of is some compiler optimization (I use the hated Dev-C++), but which optimization? And if that is the case, why doesn't it apply the same optimization to the "normal" code, since it makes it faster?
Oh, if you're wondering why I tried to declare random variables, the reason is that I wanted to test if there was a measurable worsening in using more variables. I now know it is a very stupid way to test this, but as I said at the beginning of the post, I was "playing"...
Well, I asked my teacher at university. He ran the two versions on his machine, and he was fairly surprised at the beginning (43 seconds for the "normal" one and 36 for the faster one).
Then he told me that he didn't know exactly why this happens, but he speculated that it is due to the way the compiler organizes the code: probably those extra variables force the compiler to place the code in different pages of memory, and that is why the timing changes.
Of course he wasn't sure about his answer, but it seems fair to me.
It's interesting how things like this can happen sometimes.
Moreover, as Brendan said in the comments section:
If the compiler isn't ignoring (or warning about) unused variables (even though it's a relatively trivial optimisation), then the compiler is poo (either bad in general or crippled by command line options), and the answer to your question should be "the compiler is poo" (e.g. fails to optimise the second version into exactly the same output code as the first version).
Of course, if someone has a better explanation, I would be happy to hear it!
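To see Brendan's point for yourself, one quick check (a minimal sketch, not the original program) is to compile a stripped-down file that contains the extra variables with warnings and optimization enabled, and look at what the compiler reports and emits:

#include <stdio.h>

/* unused.c - a minimal sketch for checking how the compiler treats the
   extra variables.  Compiling with something like
       gcc -Wall -O1 -S unused.c
   should typically produce "unused variable" warnings for a..f, and the
   generated unused.s should contain no trace of them. */
int main(void) {
    int a = 4, b = 4, c = 4, d = 4, e = 4, f = 4;   /* never read */
    printf("hello\n");
    return 0;
}

If the two versions of the real program still produce different assembly under the same flags, comparing the two .s files would show exactly where they diverge.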
Related
Assume the following code:
static int array[10];

int main()
{
    for (int i = 0; i < (sizeof(array) / sizeof(array[0])); i++)
    {
        // ...
    }
}
The result of sizeof(array) / sizeof(array[0]) should in theory be known at compile time and set to some value depending on the size of an int. Even so, will the compiler perform the division at run time each time the for loop iterates?
To avoid that, does the code need to be adjusted as:
static int array[10];

int main()
{
    static const int size = sizeof(array) / sizeof(array[0]);
    for (int i = 0; i < size; i++)
    {
        // ...
    }
}
You should write the code in whatever way is most readable and maintainable for you. (I'm not making any claims about which one that is: it's up to you.) The two versions of the code you wrote are so similar that a good optimizing compiler should probably produce equally good code for each version.
You can click on this link to see what assembly your two different proposed codes generate in various compilers:
https://godbolt.org/z/v914qYY8E
With GCC 11.2 (targeting x86_64) and with minimal optimizations turned on (-O1), both versions of your main function have the exact same assembly code. With optimizations turned off (-O0), the assembly is slightly different, but the size calculation is still done at compile time for both.
Even if you doubt what I am saying, it is still better to use the more readable version as a starting point. Only change it to the less readable version if you find an actual example of a programming environment where doing that would provide a meaningful speed increase for your application. Avoid wasting time with premature optimization.
Even so, will the compiler perform the division at run time each time the for loop iterates?
No. It's an integer constant expression that is calculated at compile time, which is why you can even do this:
int some_other_array [sizeof(array) / sizeof(array[0])];
To avoid that, does the code need to be adjusted as
No.
See for yourself: https://godbolt.org/z/rqv15vW6a. Both versions produced 100% identical machine code, each one containing a mov ebx, 10 instruction with the pre-calculated value.
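If your compiler supports C11, another quick way to convince yourself (a small sketch) is a static assertion, which must be evaluated at compile time; if the file compiles, the division cannot be happening at run time:

#include <assert.h>

static int array[10];

/* A static assertion is checked entirely at compile time (C11). */
static_assert(sizeof(array) / sizeof(array[0]) == 10,
              "element count is a compile-time constant");

int main(void) { return 0; }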
I am just a beginner in Haskell, and I am writing code to display the first N numbers of the Fibonacci sequence. Here is my code in Haskell:
fib_seq 1 = 1:[]
fib_seq 2 = 1:1:[]
fib_seq n = sum(take 2 (fib_seq (n-1))):fib_seq (n-1)
When I run this code for higher numbers like fib_seq 40 in GHCi, it takes a long time to evaluate and my computer hangs, so I have to interrupt it. However, when I write the same exact logic in C (I just print instead of saving the values in a list):
#include <stdio.h>

int fib_seq(int n) {
    if (n == 1) return 1;
    else if (n == 2) return 1;
    else return fib_seq(n-1) + fib_seq(n-2);
}

void print_fib(int n) {
    if (n == 0) return;
    else printf("%i ", fib_seq(n));
    print_fib(n-1);
}

int main(int argc, char* argv[]) {
    print_fib(40);
    return 0;
}
The code is very fast; it takes about 1 second to run when compiled with GCC. Is Haskell supposed to be this much slower than C? I have looked up other answers on the internet and they say something about memoization. I am just beginning Haskell and I don't know what that means. What I am saying is that the C code and the Haskell code I wrote both do the same exact steps, yet Haskell is so much slower than C that it hangs my GHCi. A 1-2 second difference is something I would never worry about, and if C had taken the same time as Haskell I would not worry either. But Haskell hanging while C finishes in 1 second is unacceptable.
The following program, compiled with ghc -O2 test.hs, is +/-2% the speed of the C code you posted, compiled with gcc -O2 test.c.
fib_seq :: Int -> Int
fib_seq 1 = 1
fib_seq 2 = 1
fib_seq n = fib_seq (n-1) + fib_seq (n-2)
main = mapM_ (print . fib_seq) [40,39..1]
Some comments:
Unlike you, I implemented the exact same logic. I doubt this is the real difference, though; see the remaining comments for much more likely causes.
I specified the same types as C uses for the arithmetic. You didn't, which is likely to run into two problems: using Integer (arbitrary-precision) instead of Int for the arithmetic, and having a class-polymorphic type instead of a monomorphic one, which adds overhead on every function call.
I compiled. ghci is built to be interactive as quickly as possible, not to produce quick code.
I don't have the right version of llvm installed at the moment, but it will often crunch through heavily-numeric code like this much better than ghc's own codegen. I wouldn't be too surprised if it ended up being faster than gcc.
Of course, using one of the many well-known better algorithms for Fibonacci is going to trump all this nonsense; see the sketch below.
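For reference, here is a minimal, purely illustrative sketch of the iterative, linear-time approach in C, printing the same sequence as the question's print_fib(40):

#include <stdio.h>

/* Linear-time Fibonacci: keeps only the last two values instead of
   recomputing a whole tree of recursive calls. */
int fib_iter(int n) {
    int a = 1, b = 1;              /* fib(1), fib(2) */
    for (int i = 3; i <= n; i++) {
        int next = a + b;
        a = b;
        b = next;
    }
    return b;
}

int main(void) {
    for (int n = 40; n >= 1; n--)
        printf("%i ", fib_iter(n));
    return 0;
}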
Guess what happens if "fib_seq (n-1)" is evaluated twice on each recursion.
And then try this:
fib_seq 1 = 1:[]
fib_seq 2 = 1:1:[]
fib_seq n = sum(take 2 f):f
where f = fib_seq (n-1)
I'm trying to learn how to optimize code (I'm also learning C), and in one of my books there's a problem about optimizing Horner's method for evaluating polynomials. I'm a little lost on how to approach the problem. I'm not great at recognizing what needs optimizing.
Any advice on how to make this function run faster would be appreciated.
Thanks
double polyh(double a[], double x, int degree) {
    long int i;
    double result = a[degree];
    for (i = degree - 1; i >= 0; i--)
        result = a[i] + x * result;
    return result;
}
You really need to profile your code to test whether proposed optimizations really help. For example, it may be the case that declaring i as long int rather than int slows the function on your machine, but on the other hand it may make no difference on your machine but might make a difference on others, etc. Anyway, there's no reason to declare i a long int when degree is an int, so changing it probably won't hurt. (But still profile!)
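For example, a minimal timing sketch using clock() (the DEGREE, REPS, and coefficient values below are arbitrary; a real measurement would also fix the compiler flags and repeat the runs):

#include <stdio.h>
#include <time.h>

#define DEGREE 1000
#define REPS   100000

/* The function under test, copied from the question. */
double polyh(double a[], double x, int degree) {
    long int i;
    double result = a[degree];
    for (i = degree - 1; i >= 0; i--)
        result = a[i] + x * result;
    return result;
}

int main(void) {
    static double a[DEGREE + 1];
    for (int i = 0; i <= DEGREE; i++)
        a[i] = 1.0 / (i + 1);                /* arbitrary coefficients */

    volatile double sink = 0.0;              /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int r = 0; r < REPS; r++)
        sink += polyh(a, 0.99, DEGREE);
    double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("%f seconds (checksum %f)\n", elapsed, (double)sink);
    return 0;
}

Swap in a candidate variant of polyh and compare the two timings on your machine.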
Horner's rule is supposedly optimal in terms of the number of multiplies and adds required to evaluate a polynomial, so I don't see much you can do with it. One thing that might help (profile!) is changing the test i>=0 to i!=0. Of course, then the loop doesn't run enough times, so you'll have to add a line below the loop to take care of the final case.
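A sketch of that variant (it assumes degree >= 1; with degree == 0 the rewritten loop would misbehave, which is one more reason to profile and test before adopting it):

double polyh_ne(double a[], double x, int degree) {
    long int i;
    double result = a[degree];
    for (i = degree - 1; i != 0; i--)        /* i >= 0 changed to i != 0 */
        result = a[i] + x * result;
    return a[0] + x * result;                /* the final step the loop no longer covers */
}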
Alternatively you could use a do { ... } while (--i) construct. (Or is it do { ... } while (i--)? You figure it out.)
You might not even need i, but using degree instead will likely not save an observable amount of time and will make the code harder to debug, so it's not worth it.
Another thing that might help (I doubt it, but profile!) is breaking up the arithmetic expression inside the loop and playing around with order, like
for (...) {
    result *= x;
    result += a[i];
}
which may reduce the need for temporary variables/registers. Try it out.
Some suggestions:
You may use int instead of long int for the loop index.
Almost certainly the problem is inviting you to conjecture on the values of a. If that vector is mostly zeros, then you'll go faster (by doing fewer double multiplications, which will be the clear bottleneck on most machines) by computing only the values of a[i] * x^i for a[i] != 0. In turn the x^i values can be computed by careful repeated squaring, preserving intermediate terms so that you never compute the same partial power more than once. See the Wikipedia article if you've never implemented repeated squaring.
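A sketch of that idea (illustrative only, not the book's intended solution): it evaluates only the nonzero terms and computes each needed power of x by repeated squaring, but it does not cache shared partial powers, which the paragraph above suggests as a further refinement:

/* Raise x to a non-negative integer power with O(log n) multiplications. */
static double pow_by_squaring(double x, int n) {
    double result = 1.0;
    while (n > 0) {
        if (n & 1)
            result *= x;   /* include this bit's contribution */
        x *= x;            /* square the base */
        n >>= 1;
    }
    return result;
}

/* Evaluate the polynomial using only its nonzero coefficients. */
double poly_sparse(const double a[], double x, int degree) {
    double result = 0.0;
    for (int i = 0; i <= degree; i++)
        if (a[i] != 0.0)
            result += a[i] * pow_by_squaring(x, i);
    return result;
}

Whether this actually beats Horner's rule depends entirely on how sparse the coefficient vector really is, so, as above: profile.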
I'm a new C programmer and I'm writing some data structures for homework.
I have two questions here.
We see a lot of examples of C's function pointers, usually used to avoid code duplication. I messed around with this function, which I initially wrote:
(The constants were pre-#defined.)
static PlayerResult playerCheckArguments(const char* name, int age,
                                         int attack, int defense) {
    PlayerResult result = PLAYER_SUCCESS;
    if (!name) {
        result = PLAYER_NULL_ARGUMENT;
    } else if (strlen(name) > PLAYER_MAX_NAME_LENGTH) {
        result = PLAYER_NAME_TOO_LONG;
    } else if (invalidAge(age)) {
        result = PLAYER_INVALID_AGE;
    } else if (invalidAttack(attack)) {
        result = PLAYER_INVALID_ATTACK;
    } else if (invalidDefense(defense)) {
        result = PLAYER_INVALID_DEFENSE;
    }
    return result;
}
until I got this ghoul:
static PlayerResult playerCheckArguments(const char* name, int age, int attack,
                                         int defense) {
    void* arguments[PLAYER_NUM_OF_PAREMETERS] = { name, &age, &attack, &defense };
    PlayerResult (*funcArray[PLAYER_NUM_OF_PAREMETERS])(int) =
        { &invalidName, &invalidAge, &invalidAttack, &invalidDefense };
    PlayerResult result = PLAYER_SUCCESS;
    for (int i = 0;
         i < PLAYER_NUM_OF_PAREMETERS && result == PLAYER_SUCCESS; i++) {
        PlayerResult (*func)(int) = funcArray[i];
        void* key = arguments[i];
        result = func(key);
    }
    return result;
}
My first question is: is there any reason why I should write the second function rather than the first, and generally try to use such "sophistications", which obviously lessen the code's clarity and/or simplicity?
Now, for my second question: as you may have noticed, I am using a lot of local variables to make debugging easier. This way, I can see all the relevant evaluations and efficiently monitor the program as it runs.
Is there any other way to display expressions made in a function other than using local variables?
Thanks very much!
return 0 ;-)
Clarity is far more important than cleverness. The harder code is to figure out, the harder it is to get right, and the harder it is to debug when you don't.
There is nothing wrong with using local variables for clarity or debugging. There is an old saw that goes "Avoid the sin of premature optimization". Make your code as simple and as clear as you can. If you then find that isn't enough, work to add as little complexity as needed to get the job done.
Since your question is tagged coding style, I'll just say the first is definitely preferred. The reason is simple: show the two functions to 200 programmers, let 100 see the first and 100 see the second, and record the average time it takes them to describe what the function does. You'll find, averaged over hundreds of programmers, that the first wins every time.
So you would only do the second if perhaps you had 20+ different parameters to check, and even then there are cleaner ways to do it. I don't believe you'd see any speed increase for the second one either.
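For instance, one cleaner, table-driven variant might look like the sketch below. The enum, constants, and invalid* predicates here are stand-ins for the ones defined in the homework (their real signatures are an assumption); the point is only the IntCheck table and the loop, where adding a new integer check is a one-line change:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the question's definitions (values are arbitrary). */
typedef enum {
    PLAYER_SUCCESS, PLAYER_NULL_ARGUMENT, PLAYER_NAME_TOO_LONG,
    PLAYER_INVALID_AGE, PLAYER_INVALID_ATTACK, PLAYER_INVALID_DEFENSE
} PlayerResult;

#define PLAYER_MAX_NAME_LENGTH 50

static bool invalidAge(int age)         { return age < 0 || age > 120; }
static bool invalidAttack(int attack)   { return attack < 0; }
static bool invalidDefense(int defense) { return defense < 0; }

/* Each table entry pairs a predicate with the error it maps to. */
typedef struct {
    bool (*isInvalid)(int);
    PlayerResult error;
} IntCheck;

static PlayerResult playerCheckArguments(const char* name, int age,
                                         int attack, int defense) {
    if (!name)
        return PLAYER_NULL_ARGUMENT;
    if (strlen(name) > PLAYER_MAX_NAME_LENGTH)
        return PLAYER_NAME_TOO_LONG;

    const IntCheck checks[] = {
        { invalidAge,     PLAYER_INVALID_AGE },
        { invalidAttack,  PLAYER_INVALID_ATTACK },
        { invalidDefense, PLAYER_INVALID_DEFENSE },
    };
    const int values[] = { age, attack, defense };

    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++)
        if (checks[i].isInvalid(values[i]))
            return checks[i].error;

    return PLAYER_SUCCESS;
}

int main(void) {
    printf("%d\n", playerCheckArguments("Ann", 200, 5, 5));   /* reports PLAYER_INVALID_AGE */
    return 0;
}

Unlike the void* version in the question, every call here goes through a correctly typed function pointer, so there is no pointer-to-int smuggling for the compiler to object to.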
I'm writing a loop in C, and I am just wondering how to optimize it a bit. It's not crucial here as I'm just practicing, but for further knowledge, I'd like to know:
In a loop, for example the following snippet:
int i = 0;
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
If it checks both, wouldn't:
int i = 0;
while (i != 10) {
    printf("%d\n", i);
    i++;
}
be more efficient?
Thanks!
Both will be translated into a single assembly instruction. Most CPUs have comparison instructions for LESS THAN, for LESS THAN OR EQUAL, for EQUAL, and for NOT EQUAL.
One of the interesting things about these optimization questions is that they often show why you should code for clarity/correctness before worrying about the performance impact of these operations (which oh-so often make no difference at all).
Your 2 example loops do not have the same behavior:
int i = 0;
/* this will print 11 lines (0..10) */
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
And,
int i = 0;
/* This will print 10 lines (0..9) */
while (i != 10) {
    printf("%d\n", i);
    i++;
}
To answer your question though, it's nearly certain that the performance of the two constructs would be identical (assuming that you fixed the problem so the loop counts were the same). For example, if your processor could only check for equality and whether one value were less than another in two separate steps (which would be a very unusual processor), then the compiler would likely transform the (i <= 10) to an (i < 11) test - or maybe an (i != 11) test.
This is a clear example of early optimization... IMHO, that is something that programmers new to their craft are way too prone to worry about. If you must worry about it, learn to benchmark and profile your code so that your worries are based on evidence rather than supposition.
Speaking to your specific questions: first, a <= is not implemented as two operations testing for < and == separately in any C compiler I've met in my career, and that includes some monumentally stupid compilers. Notice that for integers, a <= 5 is the same condition as a < 6, and if the target architecture required that only < be used, that is what the code generator would do.
Your second concern, that while (i != 10) might be more efficient raises an interesting issue of defensive programming. First, no it isn't any more efficient in any reasonable target architecture. However, it raises a potential for a small bug to cause a larger failure. Consider this: if some line of code within the body of the loop modified i, say by making it greater than 10, what might happen? How long would it take for the loop to end, and would there be any other consequences of the error?
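A small, deliberately buggy illustration of that point (hypothetical, just to show the failure mode):

#include <stdio.h>

int main(void) {
    /* A simulated bug pushes i past the end value.  With (i <= 10) the
       loop still terminates at the next check; with (i != 10) it would
       keep running until the signed counter overflows, which is
       undefined behaviour. */
    int i = 0;
    while (i <= 10) {
        if (i == 5)
            i = 42;            /* simulated bug: skips past 10 */
        printf("%d\n", i);
        i++;
    }
    return 0;
}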
Finally, when wondering about this kind of thing, it often is worthwhile to find out what code the compiler you are using actually generates. Most compilers provide a mechanism to do this. For GCC, learn about the -S option which will cause it to produce the assembly code directly instead of producing an object file.
The operators <= and < each compile to a single comparison instruction in assembly, so there should be no performance difference.
Note that a test against 0 can be a bit faster on some processors than a test against any other constant, so it can be reasonable to make a loop run backward:
int i = 10;
while (i != 0)
{
    printf("%d\n", i);
    i--;
}
Note that micro-optimizations like these usually gain you only very little extra performance; your time is better spent on using efficient algorithms.
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
Neither; it will most likely check (i < 11). The <= 10 is just there to give better meaning to your code, since 11 would be a magic number that actually means (10 + 1).
It depends on the architecture and compiler. On most architectures there is a single instruction for <= or for its opposite (which can be negated), so when this is translated into a loop, the comparison will most likely be only one instruction. (On x86 or x86_64 it is one instruction.)
The compiler might unroll the loop into a sequence of ten i++ operations; when only constant expressions are involved, it will even optimize the ++ away and leave only constants.
And Ira is right: the cost of the comparison vanishes when a printf is involved, whose execution time might be millions of clock cycles.
I'm writing a loop in C, and I am just wondering how to optimize it a bit.
If you compile with optimizations turned on, the biggest optimization will be from unrolling that loop.
It's going to be hard to profile that code with -O2, because for trivial functions the compiler will unroll the loop and you won't be able to benchmark actual differences in compares. You should be careful when profiling test cases that use constants that might make the code trivial when optimized by the compiler.
Disassemble it. Depending on the processor, the optimization level, and a number of other things, this simple example code may actually be unrolled or do things that do not reflect your real question. Compiling with gcc -O1, though, both example loops you provided resulted in the same assembly (for ARM).
A less-than in your C code often turns into a branch-if-greater-than-or-equal to the far side of the loop. If your processor doesn't have a greater-than-or-equal branch, it may have a branch-if-greater-than and a branch-if-equal, two instructions.
Typically, though, there will be a register holding i, an instruction to increment i, and then an instruction to compare i with 10. Equal-to, greater-than-or-equal, and less-than branches are generally each a single instruction, so you should not normally see a difference.
// Case I
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
    printf("%d\n", i);
    i++;
}

// Case II
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}
Case I takes more space but runs faster, while Case II takes less space but is slower compared to Case I.
In programming, space complexity and time complexity often trade off against each other, which means you must compromise on either space or time.
So you can optimize for time or for space, but usually not both at once.
Other than that, both of your loops do the same thing.