When is the gcc optimization most effective?

When I pass the -O3 option to gcc to compile C code, it usually reduces the running time by around 10-30% compared to compiling without optimization. Today I found that the running time of one of my programs is reduced remarkably, to about 1/10, with the -O3 option. Without optimization it took about 7 seconds to complete, but the -O3 option makes it run in 0.7 seconds! I have never seen such an incredible reduction in running time.
So here I wonder what types of program patterns are more likely to benefit from gcc's optimization options, or whether there are ways of programming that make it easier for the compiler to apply such optimizations.
The code that sped up 10x is below. It is a simple program that calculates the sum of all prime numbers less than the macro constant MAXX, using the wheel factorization algorithm.
#include <stdio.h>
#include <math.h>
#include <inttypes.h>
#include <time.h>

#define MAXX 5000000
#define PBLEN 92160
#define PBMAX 510510

int main(){
    clock_t startT, endT;
    startT = clock();
    int pArr[7] = {2, 3, 5, 7, 11, 13, 17};
    int pBase[PBLEN];
    pBase[0] = 1;
    int i, j, k, index = 1;
    for (i = 19; i <= PBMAX; ++i){
        for (j = 0; j < 7; ++j){
            if (i % pArr[j] == 0){
                goto next1;
            }
        }
        pBase[index] = i;
        ++index;
next1:;
    }
    uint64_t sum = 2 + 3 + 5 + 7 + 11 + 13 + 17;
    for (i = 1; i < PBLEN; ++i){
        for (j = 0; j < 7; ++j){
            if (pArr[j] <= (int)sqrt((double)pBase[i]) + 1){
                if (pBase[i] % pArr[j] == 0){
                    goto next2;
                }
            }
            else{
                sum += pBase[i];
                goto next2;
            }
        }
        for (j = 1; j < PBLEN; ++j){
            if (pBase[j] <= (int)sqrt((double)pBase[i]) + 1){
                if (pBase[i] % pBase[j] == 0){
                    goto next2;
                }
            }
            else{
                sum += pBase[i];
                goto next2;
            }
        }
next2:;
    }
    int temp, temp2;
    for (i = PBMAX; ; i += PBMAX){
        for (j = 0; j < PBLEN; ++j){
            temp = i + pBase[j];
            if (temp > MAXX){
                endT = clock();
                printf("%"PRIu64"\n\n", sum);
                printf("%.3f\n", (double)(endT - startT) / (double)CLOCKS_PER_SEC);
                return 0;
            }
            for (k = 0; k < 7; ++k){
                if (temp % pArr[k] == 0){
                    goto next3;
                }
            }
            for (k = 1; k < PBLEN; ++k){
                if (pBase[k] <= (int)sqrt((double)temp) + 1){
                    if (temp % pBase[k] == 0){
                        goto next3;
                    }
                }
                else{
                    sum += temp;
                    break;
                }
            }
next3:;
        }
    }
}

I'm going to guess how this could happen from looking at your code. I'm guessing the thing that takes the longest, by far, is sqrt(). You have tight loops in which you call sqrt() on a value that isn't changing. gcc probably decided to make a single call, save the return value, and reuse it: same results, far fewer calls to sqrt(), so a significantly faster run time. You will probably see the same results if you manually move the sqrt() calls out of your tight loops so they run less often. Run a profiler to be certain, though.
So, the short answer: sometimes gcc can fix pretty major issues that you could have fixed yourself, but only if they're small in scope. Usually you need to sit down with a profiler, see what's really taking up your time, and work out how to fix it.

Related

C - Assigning random numbers to an array causes code to freeze

I'm trying to generate random numbers within a certain range and assign them to an array. My current method of doing that involves for loops, but when I run the program after compiling, my computer just freezes, and gcc reports no errors, so I can't figure out what the issue is.
cl_Zone **SetUp_Zones(){
    /* Keep within x = 1-100, y = 1-50 */
    /* room_Create(x, y, max_Y, max_X) */
    cl_Zone **array_Zones;
    array_Zones = malloc(sizeof(cl_Zone)*9);
    int array_Zone_Select[9];
    int counter; int counter2; int x;
    int max_active_zones; int active_zone;
    x = 0;
    max_active_zones = rand() % 4 + 6;
    /* need something to designate zones as active up to the active_zones amount */
    for (counter = 0; counter < max_active_zones; counter++){
        active_zone = rand() % 1 + 9;
        /* if first instance, just assign to array */
        if (counter == 0){array_Zone_Select[counter] = active_zone;}
        /* else check if active zone already taken in array_zone_select */
        else{
            for (counter2 = 0; counter2 < counter; counter2++){
                if(array_Zone_Select[counter2] == counter){x++;}
            }
        }
        /* if already taken, redo loop, otherwise assign */
        if(x > 0){
            counter--;
            x = 0;
        }
        else{array_Zone_Select[counter] = active_zone;}
    };
    x = 1;
    for (counter = 0; counter < 9; counter+3){
        array_Zones[counter] = zone_Create(1,x,16,33);
        array_Zones[counter + 1] = zone_Create(34,x,16,33);
        array_Zones[counter + 2] = zone_Create(67,x,16,33);
        x = x + 16;
    }
}

cl_Zone *zone_Create(int x, int y, int max_X, int max_Y){
    cl_Zone *p_Zone;
    p_Zone = malloc(sizeof(cl_Zone));
    p_Zone->Position.x = x;
    p_Zone->Position.y = y;
    p_Zone->max_Y = max_Y;
    p_Zone->max_X = max_X;
    return p_Zone;
}
there are no errors by gcc so I can't figure out what the issue is.
Unfortunately, C compilers will accept piles of incorrect code without producing an error. Compiler warnings are very useful and will warn you about a great many problems, but they're mostly off by default. Be sure to turn them on.
However, it's not as simple as passing -Wall to turn on all warnings; nothing in C is simple or straightforward. You have to come up with a mix of warning flags that suits you. Here's a good start to catch many common mistakes.
gcc -Wall -Wshadow -Wwrite-strings -Wextra -Wconversion -std=c11 -pedantic
You can look up what they mean in the docs and add your own to the mix.
for (counter = 0; counter < 9; counter+3){ is an infinite loop: counter+3 computes a value and discards it, so counter never changes, and the loop will consume as much CPU as your operating system will allow. You probably meant for (counter = 0; counter < 9; counter+=3){. I'm still surprised that would make your computer freeze.
cl_Zone **SetUp_Zones fails to return anything. It should return array_Zones;.
array_Zones = malloc(sizeof(cl_Zone)*9); allocates the wrong amount of memory. That would be correct if you were storing 9 cl_Zone structs, but cl_Zone **array_Zones says you're storing 9 pointers to cl_Zone. Instead you need array_Zones = malloc(sizeof(cl_Zone*) * 9);
active_zone = rand() % 1 + 9; will always produce 9. Anything % 1 is 0, so this is just active_zone = 0 + 9;.

Efficiently print every x iterations in for loop

I am writing a program in which a certain for loop is iterated very many times.
A single iteration doesn't take too long, but since the program runs the loop so often, it takes quite some time to compute.
To get more information on the program's progress without slowing it down too much, I would like to print the progress every xth step.
Is there a different way to do this, than a conditional with a modulo like so:
for(int i = 0; i < some_large_number; i++){
    if(i % x == 0)
        printf("%f%%\r", percent);
    //some other code
    .
    .
    .
}
?
Thanks in advance
This code:
for(int i = 0; i < some_large_number; i++){
    if(i % x == 0)
        printf("%f%%\r", percent);
    //some other code
    .
    .
    .
}
can be restructured as:
/* Partition the execution into blocks of x iterations, possibly including a
   final fragmentary block. The expression (some_large_number+(x-1))/x
   calculates some_large_number/x with any fraction rounded up.
*/
for (int block = 0, i = 0; block < (some_large_number+(x-1))/x; ++block)
{
    printf("%f%%\r", percent);
    // Set limit to the lesser of the end of the current block or some_large_number.
    int limit = (block+1) * x;
    if (some_large_number < limit) limit = some_large_number;
    // Iterate the original code.
    for (; i < limit; ++i)
    {
        //some other code
    }
}
With the following caveats and properties:
The inner loop has no more work than the original loop (it has no extra variable to count or test) and has the i % x == 0 test completely removed. This is optimal for the inner loop in the sense that it reduces the nominal amount of work as much as possible, although real-world hardware sometimes has finicky behaviors that can result in more compute time for less actual work.
New identifiers block and limit are introduced but can be changed to avoid any conflicts with uses in the original code.
Other than the above, the inner loop operates identically to the original code: It sees the same values of i in the same order as the original code, so no changes are needed in that code.
some_large_number+(x-1) could overflow int.
I would do it like this:
int j = x;
for (int i = 0; i < some_large_number; i++){
    if(--j == 0) {
        printf("%f%%\r", percent);
        j = x;
    }
    //some other code
    .
    .
    .
}
Divide some_large_number by x, then loop x times with a nested loop over the quotient, and print the percentage after each outer iteration. I meant this:
int temp = some_large_number/x;
for (int i = 0; i < x; i++){
    for (int j = 0; j < temp; j++){
        //some code
    }
    printf("%f%%\r", percent);
}
The fastest approach regarding your performance concern would be to use a nested loop:
unsigned int x = 6;
unsigned int segments = some_large_number / x;
unsigned int y;
for ( unsigned int i = 0; i < segments; i++ ) {
    printf("%f%%\r", percent);
    for ( unsigned int j = 0; j < x; j++ ) {
        /* some code here */
    }
}
// If some_large_number can't be divided evenly by `x`:
if (( y = (some_large_number % x)) != 0 )
{
    for ( unsigned int i = 0; i < y; i++ ) {
        /* same code as inside of the former inner loop. */
    }
}
Another example would be to use a separate counting variable for the check that triggers the print, comparing it to x - 1 and resetting it to -1 on a match:
unsigned int x = 6;
unsigned int some_large_number = 100000000;
for ( unsigned int i = 0, j = 0; i < some_large_number; i++, j++ ) {
    if(j == (x - 1))
    {
        printf("%f%%\r", percent);
        j = -1;
    }
    /* some code here */
}

OpenMP | Parallelizing a loop with dependent iterations

I am trying to parallelize an encoding function. I tried adding a simple pragma around the for loop, but the result was wrong. I figured that the iterations are dependent (through the code variable) and therefore cannot be directly parallelized.
int encodePrimeFactorization(int number){
    int code = 0;
    for (int i=PF_NUMBER-1; i>=0 ; i--){
        code = code * 2;
        int f = prime_factors[i];
        if (number % f == 0){
            code = code + 1;
        }
    }
    return code;
}
Is there a way to make code variable independent for each iteration?
Yes. At least for me, it's easier to think about that if you look at the algorithm this way:
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--) {
    code = code << 1;
    int f = prime_factors[i];
    if (number % f == 0){
        // The last bit of code is never set here,
        // because it has just been shifted to the left
        code = code | 1;
    }
}
Now you can shift the set-bit already while setting:
int code = 0;
for (int i=PF_NUMBER-1; i>=0 ; i--) {
    int f = prime_factors[i];
    if (number % f == 0){
        code = code | (1 << i);
    }
}
Now it becomes a trivial reduction:
int code = 0;
#pragma omp parallel for reduction(|:code)
for (int i=PF_NUMBER-1; i>=0 ; i--) {
    int f = prime_factors[i];
    if (number % f == 0){
        code |= (1 << i);
    }
}
That said, you will not get any performance gain: this works only up to 31 bits, which is far too little work to outweigh the parallelization overhead. If this is a hot part of your code, you have to look at the surrounding code for something to parallelize.

C program optimization/speed increase

As a beginner, I made a program which finds the number of primes (prime) that are not greater than an input natural number (x). The program works fine (I think), but I want to make it run faster for larger inputs. Is that possible, and if so, how?
#include <stdio.h>

int main() {
    int x, i, j, flag = 1, prime = 0;
    scanf("%d", &x);
    for (i = 2; i <= x; i++) {
        j = 2;
        while (flag == 1 && j < i/2 + 1) {
            if (i % j == 0) {
                flag = 0;
            }
            j++;
        }
        if (flag == 1) {
            prime++;
        }
        flag = 1;
    }
    printf("%d\n", prime);
    return 0;
}
Your algorithm does trial division, which is slow. A better algorithm is the 2000-year-old Sieve of Eratosthenes:
function primes(n):
    sieve := makeArray(2..n, True)
    for p from 2 to n
        if sieve[p]
            output p # prime
            for i from p*p to n step p
                sieve[i] := False
I'll leave it to you to translate to C. If you want to know more, I modestly recommend the essay Programming with Prime Numbers at my blog.
There is also the famous prime number theorem, which states that
π(n) ≈ n / ln(n)
And there are algorithms for computing π(n) exactly. Follow the links:
PNT
Computing π(x): The Meissel, Lehmer, Lagarias, Miller, Odlyzko method

Segmentation Fault during printing, includes large array computation

This program in C is supposed to delete every 666th number in a series of 10^7 natural numbers.
The program seems to compile fine, even with optimisation, but at runtime it stops with a segmentation fault after a few computations. I notice that when it stops, it is a few hundred thousand numbers away from the upper limit of 10^7. I first tried to solve the problem using dynamic memory allocation with malloc and got the same output; then I tried static arrays.
#include <stdio.h>

static unsigned int a[10000001] = {[0 ... 10000000] = 1};

void main(void) {
    unsigned int i = 1, last = 0, count = 0, test = 0;
    while(i < 100000) {
        count = 0; test = 0;
        while(count < 665) {
            if(a[last + count + test])
                count++;
            else
                test++;
        }
        last = last + test + count;
        if(last < 10000002)
            a[last] = 0;
        else {
            last = last - 10000001;
            a[last] = 0;
        }
        printf(" %u", last);
        i++;
    }
    printf("\n\n");
}
Here's one problem:
if(last < 10000002)
a[last] = 0;
Should be:
if(last < 10000001)
a[last] = 0;
Also, this statement might be an issue if last + count + test is > 10000000:
if(a[last + count + test])