Why does this code go into an infinite loop - C

The function below checks whether an integer is prime.
I'm running a for loop from 3 to 2147483647 (the positive limit of long int).
But this code seems to hang, and I can't find out why.
#include <time.h>
#include <stdio.h>

int isPrime1(long t)
{
    long i;
    if (t == 1) return 0;
    if (t % 2 == 0) return 0;
    for (i = 3; i < t / 2; i += 2) {
        if (t % i == 0) return 0;
    }
    return 1;
}

int main()
{
    long i = 0;
    time_t s, e;
    s = time(NULL);
    for (i = 3; i < 2147483647; i++) {
        isPrime1(i);
    }
    e = time(NULL);
    printf("\n\t Time : %ld secs", e - s);
    return 0;
}

It will eventually terminate, but it will take a while. If you inline your isPrime1 function and look at the loops, you have something like:
for(i=3; i<2147483647; i++)
    for(j=3; j<i/2; j+=2)
which is roughly n*n/4 = O(n^2) iterations. Your loop trip count is way too high.

It depends upon the system and the compiler. On Linux, with GCC 4.7.2, compiling with gcc -O2 vishaid.c -o vishaid, the program returns immediately: the compiler optimizes away all the calls to isPrime1 (I checked the generated assembler code with gcc -O2 -S -fverbose-asm; main does not even call isPrime1). And GCC is right: since isPrime1 has no side effects and its result is not used, its calls can be removed. The for loop then has an empty body, so it can also be optimized away.
The lesson to learn is that when benchmarking optimized binaries, you better have some real side-effect in your code.
Also, arithmetic tells us that some i is prime if it has no divisors less than its square root. So better code:
#include <math.h>   /* for sqrt(); link with -lm */

int isPrime1(long t) {
    long i;
    double r = sqrt((double)t);
    long m = (long)r;
    if (t == 1) return 0;
    if (t % 2 == 0) return 0;
    for (i = 3; i <= m; i += 2)
        if (t % i == 0) return 0;
    return 1;
}
On my system (x86-64/Debian/Sid with an Intel i7 3770K processor; the core running the program is at 3.5GHz), longs are 64 bits. So I coded:
int main ()
{
long i = 0;
long cnt = 0;
time_t s, e;
s = time (NULL);
for (i = 3; i < 2147483647; i++)
{
if (isPrime1 (i) && (++cnt % 4096) == 0) {
printf ("#%ld: %ld\n", cnt, i);
fflush (NULL);
}
}
e = time (NULL);
printf ("\n\t Time : %ld secs\n", e - s);
return 0;
}
and after about 4 minutes it was still printing a lot of lines, including
#6819840: 119566439
#6823936: 119642749
#6828032: 119719177
#6832128: 119795597
I'm guessing it would need several hours to complete. After 30 minutes it was still slowly printing
#25698304: 486778811
#25702400: 486862511
#25706496: 486944147
#25710592: 487026971
Actually, the program needed 4 hours and 16 minutes to complete. Last outputs are
#105086976: 2147139749
#105091072: 2147227463
#105095168: 2147315671
#105099264: 2147402489
Time : 15387 secs
BTW, this program is still really inefficient: the primes program /usr/games/primes from the bsdgames package answers much more quickly
% time /usr/games/primes 1 2147483647 | tail
2147483423
2147483477
2147483489
2147483497
2147483543
2147483549
2147483563
2147483579
2147483587
2147483629
/usr/games/primes 1 2147483647
10.96s user 0.26s system 99% cpu 11.257 total
and it still printed 105097564 lines (most of them skipped by tail)
If you are interested in prime number generation, read some math books (efficient prime generation is still a research subject; you could even get a PhD on it). Start with the Sieve of Eratosthenes and primality test pages on Wikipedia.
Most importantly, compile your program with debugging information and all warnings (e.g. gcc -Wall -g on Linux) and learn to use your debugger (e.g. gdb on Linux). You could then interrupt the debugged program (with Ctrl-C under gdb, then let it continue with the cont command) after a minute or two, and observe that the i counter in main increases slowly. Perhaps also ask for profiling information (with the -pg option to gcc, then use gprof). And when coding complex arithmetic, it is well worth reading good math books about it (primality testing is a complex subject, central to most cryptographic algorithms).

This is a very inefficient approach to test for primes, and that's why it seems to hang.
Search the web for more efficient algorithms, such as the Sieve of Eratosthenes

Try this to see whether it's really an infinite loop:
int main()
{
    long i = 0;
    time_t s, e;
    s = time(NULL);
    for (i = 3; i < 2147483647; i++) {
        isPrime1(i);
        // print the elapsed time after each iteration
        e = time(NULL);
        printf("\n\t Time for loop %ld: %ld secs", i, e - s);
    }
    return 0;
}

Related

Why is iterating through an array backwards faster than forward in C

I'm studying for an exam and am trying to follow this problem:
I have the following C code to do some array initialisation:
int i, n = 61440;
double x[n];
for(i=0; i < n; i++) {
x[i] = 1;
}
But the following runs faster (0.5s difference in 1000 iterations):
int i, n = 61440;
double x[n];
for(i=n-1; i >= 0; i--) {
x[i] = 1;
}
I first thought that it was due to the loop accessing the n variable, thus having to do more reads (as suggested here for example: Why is iterating through an array backwards faster than forwards). But even if I change the n in the first loop to a hard coded value, or vice versa move the 0 in the bottom loop to a variable, the performance remains the same. I also tried to change the loops to only do half the work (go from 0 to < 30720, or from n-1 to >= 30720), to eliminate any special treatment of the 0 value, but the bottom loop is still faster
I assume it is because of some compiler optimisations? But everything I look up for the generated machine code suggests, that < and >= ought to be equal.
Thankful for any hints or advice! Thank you!
Edit: Makefile, for compiler details (this is part of a multi-threading exercise, hence the OpenMP, though in this case it's all running on one core, without any OpenMP instructions in the code)
#CC = gcc
CC = /opt/rh/devtoolset-2/root/usr/bin/gcc
OMP_FLAG = -fopenmp
CFLAGS = -std=c99 -O2 -c ${OMP_FLAG}
LFLAGS = -lm
.SUFFIXES : .o .c
.c.o:
	${CC} ${CFLAGS} -o $@ $*.c
sblas: sblas.o
	${CC} ${OMP_FLAG} -o $@ $@.o ${LFLAGS}
Edit2: I redid the experiment with n * 100, getting the same results:
Forward: ~170s
Backward: ~120s
Similar to the previous values of 1.7s and 1.2s, just times 100
Edit3: Minimal example - the changes described above were all localized to the vector update method. This is the default forward version, which takes longer than the backwards version for(i = limit - 1; i >= 0; i--)
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
void vector_update(double a[], double b[], double x[], int limit);
/* SBLAS code */
int main(void) {
    int n = 1024 * 60;
    int nsteps = 1000;
    int k;
    double a[n], b[n], x[n];
    double vec_update_start;
    double vec_update_time = 0;
    for (k = 0; k < nsteps; k++) {
        // Loop over whole program to get reasonable execution time
        // (simulates a time-stepping code)
        vec_update_start = omp_get_wtime();
        vector_update(a, b, x, n);
        vec_update_time = vec_update_time + (omp_get_wtime() - vec_update_start);
    }
    printf("vector update time = %f seconds \n \n", vec_update_time);
    return 0;
}

void vector_update(double a[], double b[], double x[], int limit) {
    int i;
    for (i = 0; i < limit; i++) {
        x[i] = 0.0;
        a[i] = 3.142;
        b[i] = 3.142;
    }
}
Edit4: the CPU is AMD quad-core Opteron 8378. The machine uses 4 of those, but I'm using only one on the main processor (core ID 0 in the AMD architecture)
It's not the backward iteration but the comparison with zero that causes the loop in the second case to run faster.
for(i=n-1; i >= 0; i--) {
Comparison with zero can be done with a single assembly instruction whereas comparison with any other number takes multiple instructions.
The main reason is that your compiler isn't very good at optimising. In theory there's no reason that a better compiler couldn't have converted both versions of your code into the exact same machine code instead of letting one be slower.
Everything beyond that depends on what the resulting machine code is and what it's running on. This can include differences in RAM and/or CPU speeds, differences in cache behaviour, differences in hardware prefetching (and number of prefetchers), differences in instruction costs and instruction pipelining, differences in speculation, etc. Note that (in theory) this doesn't exclude the possibility that (on most computers but not on your computer) the machine code your compiler generates for forward loop is faster than the machine code it generates for backward loop (your sample size isn't large enough to be statistically significant, unless you're working on embedded systems or game consoles where all computers that run the code are identical).

How can I detect "out of bound error" in C program with GDB?

I wrote this program in C, adding an intentional error.
The program calculates the sum of 5 numbers entered by the user, and displays the result on the screen.
I compiled it with "gcc -Wall -Wextra -Werror -ansi -pedantic -g" and works fine.
But it has an error.
In the last repetition of the cycle, the program evaluates a[N], which is not defined!
I'd like to know how to spot this kind of error using GDB
When I use "set check range on" I get the message "warning: the current range check setting does not match the language." and nothing happens...
This is the code to debug:
#define N 5
#include <stdio.h>

void read(float *);

int main(void) {
    float a[N], s;
    int i;
    printf("Enter %d numbers: ", N);
    read(a);
    i = -1;
    s = 0;
    while (i != N) {
        i = i + 1;
        s = s + a[i];
    }
    printf("The sum is : %.2f \n", s);
    return 0;
}

void read(float *a) {
    int n = 0;
    while (n != N) {
        scanf("%f", &a[n]);
        n++;
    }
}
I think this is your problem:
while (i != N) {
i = i + 1;
s = s+a[i];
}
N is defined as 5, so when i is 4, the condition is true. i is then incremented to 5, and s += a[i]; reads a[5], one past the end. Just use a for loop instead, or use do {} while:
for (i = 0; i < N; ++i)
    s += a[i];

// or
i = 0;
do {
    s += a[i];
} while (++i != N);
Either way. Personally, I find the for loop more readable
To answer your question (using gdb):
You've compiled with the -g flag, so run gdb compiled_file_name
In gdb, set a break-point in the while loop (b <line-nr> [condition])
start the program (run)
use step or next to step through the code
use p i to check the value of i every time you hit the while condition, and every time you use i as offset (a[i])
For more details, docs for gdb are available. It takes some time, but it's well worth it. gdb is an excellent debugger
The answer to this particular error is that the loop increments i and then accesses a at index i, without an intervening check. So when i equals N - 1 at the start of the final iteration, it is incremented to N and then used to index the array.
In general, gcc's -fsanitize=bounds option should be helpful for these errors.

Example of very CPU-intensive C function under 5 lines of code

I'm testing a progress indicator in my program, so I'd like to test it against a function that takes a noticeable amount of time (i.e. >5 seconds) to finish.
I tried this:
void doLotsOfWork () {
for (int i = 0; i < 100; i++) {
arc4random()%(int)HUGE;
}
}
But this runs in under 1 second on my i7. Can you suggest even more CPU-intensive operations? The function body should really be no more than 5 lines of code.
a simple loop should do the trick:
/* iterate.c */
#include <stdio.h>

int
main (void)
{
    printf("start\n");
    volatile unsigned long long i;
    for (i = 0; i < 1000000000ULL; ++i)
        ;
    printf("stop\n");
    return 0;
}
this takes ~2.5 seconds on my system. Feel free to increase the number of iterations to match your timing expectations.
$> gcc -o iterate iterate.c
$> time ./iterate
start
stop
real 0m2.763s
user 0m2.758s
sys 0m0.000s
short form:
void
wait (void)
{
    volatile unsigned long long i;
    for (i = 0; i < 1000000000ULL; ++i)
        ;
}
Just pause the thread/process? No need to do actual work.
In POSIX, call e.g. usleep(5000000); to pause for five seconds (see usleep() manual page).

Speed up C program without using conditional compilation

We are working on a model checking tool which executes certain search routines several billion times. We have different search routines which are currently selected using preprocessor directives. This is not only unhandy, as we need to recompile every time we make a different choice, it also makes the code hard to read. It's now time to start a new version, and we are evaluating whether we can avoid conditional compilation.
Here is a very artificial example that shows the effect:
/* program_define */
#include <stdio.h>
#include <stdlib.h>

#define skip 10

int main(int argc, char** argv) {
    int i, j;
    long result = 0;
    int limit = atoi(argv[1]);
    for (i = 0; i < 10000000; ++i) {
        for (j = 0; j < limit; ++j) {
            if (i + j % skip == 0) {
                continue;
            }
            result += i + j;
        }
    }
    printf("%ld\n", result);
    return 0;
}
Here, the variable skip is an example for a value that influences the behavior of the program. Unfortunately, we need to recompile every time we want a new value of skip.
Let's look at another version of the program:
/* program_variable */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int i, j;
    long result = 0;
    int limit = atoi(argv[1]);
    int skip = atoi(argv[2]);
    for (i = 0; i < 10000000; ++i) {
        for (j = 0; j < limit; ++j) {
            if (i + j % skip == 0) {
                continue;
            }
            result += i + j;
        }
    }
    printf("%ld\n", result);
    return 0;
}
Here, the value for skip is passed as a command line parameter. This adds great flexibility. However, this program is much slower:
$ time ./program_define 1000 10
50004989999950500
real 0m25.973s
user 0m25.937s
sys 0m0.019s
vs.
$ time ./program_variable 1000 10
50004989999950500
real 0m50.829s
user 0m50.738s
sys 0m0.042s
What we are looking for is an efficient way to pass values into a program (by means of a command line parameter or a file input) that will never change afterward. Is there a way to optimize the code (or tell the compiler to) such that it runs more efficiently?
Any help is greatly appreciated!
Comments:
As Dirk wrote in his comment, it is not about this concrete example. What I meant was a way to replace an if that tests a variable set once and then never changed (say, a command-line option), inside a function called literally billions of times, with a more efficient construct. We currently use the preprocessor to tailor the desired version of the function. It would be nice if there were a nicer way that does not require recompilation.
You can take a look at libdivide, which does fast division when the divisor isn't known until runtime (libdivide is an open-source library for optimizing integer division).
If you calculate a % b as a - b * (a / b), with the division done by libdivide, you might find that it's faster.
I ran your program_variable code on my system to get a baseline of performance:
$ gcc -Wall test1.c
$ time ./a.out 1000 10
50004989999950500
real 0m55.531s
user 0m55.484s
sys 0m0.033s
If I compile test1.c with -O3, then I get:
$ time ./a.out 1000 10
50004989999950500
real 0m54.305s
user 0m54.246s
sys 0m0.030s
In a third test, I manually set the values of limit and skip:
int limit = 1000, skip = 10;
I then re-run the test:
$ gcc -Wall test2.c
$ time ./a.out
50004989999950500
real 0m54.312s
user 0m54.282s
sys 0m0.019s
Taking out the atoi() calls doesn't make much of a difference. But if I compile with -O3 optimizations turned on, then I get a speed bump:
$ gcc -Wall -O3 test2.c
$ time ./a.out
50004989999950500
real 0m26.756s
user 0m26.724s
sys 0m0.020s
Adding a #define macro for an ersatz atoi() function helped a little, but didn't do much:
#define QSaToi(iLen, zString, iOut) {int j = 1; iOut = 0; \
for (int i = iLen - 1; i >= 0; --i) \
{ iOut += ((zString[i] - 48) * j); \
j = j*10;}}
...
int limit, skip;
QSaToi(4, argv[1], limit);
QSaToi(2, argv[2], skip);
And testing:
$ gcc -Wall -O3 -std=gnu99 test3.c
$ time ./a.out 1000 10
50004989999950500
real 0m53.514s
user 0m53.473s
sys 0m0.025s
The expensive part seems to be those atoi() calls, if that's the only difference between the -O3 compilations.
Perhaps you could write one binary, which loops through tests of various values of limit and skip, something like:
#define NUM_LIMITS 3
#define NUM_SKIPS 2
...
int limits[NUM_LIMITS] = {100, 1000, 10000};
int skips[NUM_SKIPS] = {1, 10};
int limit, skip;
...
for (int limitIdx = 0; limitIdx < NUM_LIMITS; limitIdx++)
for (int skipIdx = 0; skipIdx < NUM_SKIPS; skipIdx++)
/* per-limit, per-skip test */
If you know your parameters ahead of compilation time, perhaps you can do it this way. You could use fprintf() to write your output to a per-limit, per-skip file output, if you want results in separate files.
You could try using the GCC likely/unlikely builtins (e.g. here) or profile guided optimization (e.g. here). Also, do you intend (i + j) % 10 or i + (j % 10)? The % operator has higher precedence, so your code as written is testing the latter.
I'm a bit familiar with the program Niels is asking about.
There are a bunch of interesting answers around (thanks), but they slightly miss the spirit of the question. The given example programs are really just example programs; the logic that is subject to preprocessor statements is much more involved. In the end, it is not just about executing a modulo operation or a simple division. It is about keeping or skipping certain procedure calls, executing an operation between two other operations, defining the size of an array, etc.
All these things could be guarded by variables that are set by command-line parameters. But that would be too costly, as many of these routines, statements, and memory allocations are executed a billion times. Perhaps that shapes the problem a bit better. Still very interested in your ideas.
Dirk
If you used C++ instead of C, you could use templates so that things can be computed at compile time; even recursion is possible.
Have a look at C++ template metaprogramming.
A stupid answer, but you could pass the define on the gcc command line and run the whole thing with a shell script that recompiles and runs the program based on a command-line parameter:
#!/bin/sh
skip=$1
out=program_skip$skip
if [ ! -x $out ]; then
    gcc -O3 -Dskip=$skip -o $out test.c
fi
time $out 1000
I also got about a 2× slowdown between program_define and program_variable, 26.2s vs. 49.0s. I then tried
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int i, j, r;
    long result = 0;
    int limit = atoi(argv[1]);
    int skip = atoi(argv[2]);
    for (i = 0; i < 10000000; ++i) {
        for (j = 0, r = 0; j < limit; ++j, ++r) {
            if (r == skip) r = 0;
            if (i + r == 0) {
                continue;
            }
            result += i + j;
        }
    }
    printf("%ld\n", result);
    return 0;
}
using an extra variable to avoid the costly division, and the resulting time was 18.9s, so significantly better than the modulo with a statically known constant. However, this auxiliary-variable technique is only promising if the change is easily predictable.
Another possibility would be to eliminate using the modulus operator:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int i, j;
    long result = 0;
    int limit = atoi(argv[1]);
    int skip = atoi(argv[2]);
    int current = 0;
    for (i = 0; i < 10000000; ++i) {
        for (j = 0; j < limit; ++j) {
            if (++current == skip) {
                current = 0;
                continue;
            }
            result += i + j;
        }
    }
    printf("%ld\n", result);
    return 0;
}
If that is the actual code, you have a few ways to optimize it:
(i + j % 10 == 0) is only true when i == 0, so you can skip the entire mod operation when i > 0. Also, since i + j only increases by 1 on each iteration, you can hoist the mod out and simply keep a variable that you increment and reset when it hits skip (as has been pointed out in other answers).
You can also have all possible function implementations already in the program, and at runtime change a function pointer to select the function you are actually using.
You can use macros to avoid writing duplicate code:
#define MYFUNCMACRO(name, myvar) void name##doit(){/* time consuming code using myvar */}
MYFUNCMACRO(TEN,10)
MYFUNCMACRO(TWENTY,20)
MYFUNCMACRO(FOURTY,40)
MYFUNCMACRO(FIFTY,50)
If you need too many of these macros (hundreds?), you can write a code generator that writes the cpp file automatically for a range of values.
I didn't compile or test the code, but maybe you see the principle.
You might be compiling without optimisation, which makes your program load skip each time it's checked, instead of using the literal 10. Try adding -O2 to your compiler's command line, and/or use
register int skip;

Seg Fault when initializing array

I'm taking a class on C and running into a segmentation fault. From what I understand, seg faults are supposed to occur when you access memory that hasn't been allocated, or is otherwise out of bounds. 'Course all I'm trying to do is initialize an array (though a rather large one at that).
Am I simply misunderstanding how to traverse a 2D array? Misplacing a bound is exactly what would cause a seg fault. Am I wrong to use a nested for loop for this?
The professor provided the clock functions, so I'm hoping that's not the problem. I'm running this code in Cygwin, could that be the problem? Source code follows. Using c99 standard as well.
To be perfectly clear: I am looking for help understanding (and eventually fixing) the reason my code produces a seg fault.
#include <stdio.h>
#include <time.h>

int main(void) {
    // first define the array and two doubles to count elapsed seconds.
    double rowMajor, colMajor;
    rowMajor = colMajor = 0;
    int majorArray[1000][1000] = {};
    clock_t start, end;
    // set it up to perform the test 10 times.
    for (int k = 0; k < 10; k++) {
        start = clock();
        // first we do row major
        for (int i = 0; i < 1000; i++) {
            for (int j = 0; j < 1000; j++) {
                majorArray[i][j] = 314;
            }
        }
        end = clock();
        rowMajor += (end - start) / (double)CLOCKS_PER_SEC;
        // at this point, we've only done rowMajor, so elapsed = rowMajor
        start = clock();
        // now we do column major
        for (int i = 0; i < 1000; i++) {
            for (int j = 0; j < 1000; j++) {
                majorArray[j][i] = 314;
            }
        }
        end = clock();
        colMajor += (end - start) / (double)CLOCKS_PER_SEC;
    }
    // now that we've run the tests, we can compare the values.
    printf("Row major took %f seconds\n", rowMajor);
    printf("Column major took %f seconds\n", colMajor);
    if (rowMajor < colMajor) {
        printf("Row major is faster\n");
    } else {
        printf("Column major is faster\n");
    }
    return 0;
}
Your program works correctly on my computer (x86-64/Linux) so I suspect you're running into a system-specific limit on the size of the call stack. I don't know how much stack you get on Cygwin, but your array is 4,000,000 bytes (with 32-bit int) - that could easily be too big.
Try moving the declaration of majorArray out of main (put it right after the #includes) -- then it will be a global variable, which comes from a different allocation pool that can be much bigger.
By the way, this comparison is backwards:
if(rowMajor>colMajor)
{
printf("Row major is faster\n");
}
else
{
printf("Column major is faster\n");
}
Also, to do a test like this you really ought to repeat the process for many different array sizes and shapes.
You are trying to grab 1000 * 1000 * sizeof( int ) bytes on the stack. This is more than your OS allows for stack growth. If on any Unix, check ulimit -a for the max stack size of the process.
As a rule of thumb, allocate big structures on the heap with malloc(3), or use static arrays, outside the scope of any function.
In this case, you can replace the declaration of majorArray with:
int (*majorArray)[1000] = calloc(1000, sizeof *majorArray);
(Note sizeof *majorArray: the size of one 1000-int row, not of the pointer.)
I was unable to find any error in your code, so I compiled and ran it, and it worked as expected.
You have, however, a semantic error in your code:
start=clock();
//set it up to perform the test 100 times.
for(int k = 0; k<10; k++)
{
Should be:
//set it up to perform the test 100 times.
for(int k = 0; k<10; k++)
{
start=clock();
Also, the condition at the end should be changed to its inverse:
if(rowMajor<colMajor)
Finally, to avoid the problem of the os-specific stack size others mentioned, you should define your matrix outside main():
#include <stdio.h>
#include <time.h>
int majorArray [1000][1000];
int main(void){
//first define the array and two doubles to count elapsed seconds.
double rowMajor, colMajor;
rowMajor = colMajor = 0;
This code runs fine for me under Linux and I can't see anything obviously wrong about it. You can try to debug it via gdb. Compile it like this:
gcc -g -o testcode test.c
and then say
gdb ./testcode
and in gdb say run
If it crashes, say where, and gdb tells you where the crash occurred. Then you know which line the error is in.
The program works perfectly when compiled by gcc and run on Linux; Cygwin may very well be your problem here.
If it runs correctly elsewhere, you're most likely trying to grab more stack space than the OS allows. You're allocating 4MB on the stack (1 million integers), which is way too much to allocate "safely" on the stack. malloc() and free() are your best bets here.
