Execution speed of loops varies with variable position - c

Version A:
#include<time.h>
#include<stdio.h>
int main()
{
time_t start = time(0); //denote start time
int i,j; // initialize ints
static double dst[4096][4096]; //initialize arrays
static double src[4096][4096]; //
for(i=0; i<4096; ++i){
for(j=0; j<4096; ++j){
dst[i][j] = src[i][j];
}
}
time_t end = time(0); //denote end time
double time = difftime(end, start); //take difference of start and end time to determine elapsed time
printf("Test One: %fms\n",time);
}
Version B:
#include<time.h>
#include<stdio.h>
int main()
{
time_t start = time(0); //denote start time
int i,j; // initialize ints
static double dst[4096][4096]; //initialize arrays
static double src[4096][4096]; //
for(i=0; i<4096; ++i){
for(j=0; j<4096; ++j){
dst[j][i] = src[j][i];
}
}
time_t end = time(0); //denote end time
double time = difftime(end, start); //take difference of start and end time to determine elapsed time
printf("Test One: %fms\n",time);
}
Using this program, I have determined that if you reverse the positions of i and j in the arrays, it takes 1 second longer to execute.
Why is this happening?

In your code, the loops traverse the addresses within one row, one by one, then move on to the next row. If you swap the positions of i and j, they instead traverse the addresses within one column, one by one, then move on to the next column.
In C, a multi-dimensional array is laid out in linear address space row by row (row-major order), so dst[i][j] = src[i][j] in your case conceptually means *(dst + 4096 * i + j) = *(src + 4096 * i + j), treating dst and src as flat arrays of double:
*(dst + 4096 * 0 + 0) = *(src + 4096 * 0 + 0);
*(dst + 4096 * 0 + 1) = *(src + 4096 * 0 + 1);
*(dst + 4096 * 0 + 2) = *(src + 4096 * 0 + 2);
//...
while reversed i and j means:
*(dst + 4096 * 0 + 0) = *(src + 4096 * 0 + 0);
*(dst + 4096 * 1 + 0) = *(src + 4096 * 1 + 0);
*(dst + 4096 * 2 + 0) = *(src + 4096 * 2 + 0);
//...
So the extra 1 second in the second case is caused by accessing memory in a non-contiguous manner, which makes poor use of the CPU cache.
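The row-major layout described above can be illustrated with a small helper. This is only a sketch: flat_index and inner_stride_bytes are hypothetical names, and the array is treated as one flat run of doubles, which is how the *(dst + 4096 * i + j) notation above should be read.

```c
#include <stddef.h>

/* Hypothetical helper: offset (in elements) of element [row][col]
 * in a row-major array that has ncols columns per row. */
size_t flat_index(size_t row, size_t col, size_t ncols)
{
    return row * ncols + col;
}

/* Distance in bytes between two consecutive accesses of the inner loop.
 * col_major == 0: the inner loop walks along a row   (dst[i][j], j varies).
 * col_major != 0: the inner loop walks down a column (dst[j][i], j varies). */
size_t inner_stride_bytes(size_t ncols, int col_major)
{
    size_t step = col_major
        ? flat_index(1, 0, ncols) - flat_index(0, 0, ncols)
        : flat_index(0, 1, ncols) - flat_index(0, 0, ncols);
    return step * sizeof(double);
}
```

For the 4096-column arrays in the question, the row-wise loop advances by sizeof(double) bytes per access, while the column-wise loop jumps 4096 * sizeof(double) bytes, so consecutive accesses rarely share a cache line.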
You don't need to do the time calculation yourself, because on Linux/UNIX you can run your program under the "time" command:
$ time ./loop
The results on my linux box for the 2 cases:
$ time ./loop_i_j
real 0m0.244s
user 0m0.062s
sys 0m0.180s
$ time ./loop_j_i
real 0m1.072s
user 0m0.995s
sys 0m0.073s

#include<time.h>
#include<stdio.h>
int main()
{
time_t start = time(0); //denote start time
int i,j; // initialize ints
static double dst[4096][4096]; //initialize arrays
static double src[4096][4096]; //
for(j=0; j<4096; ++j){
for(i=0; i<4096; ++i){
dst[j][i] = src[j][i];
}
}
time_t end = time(0); //denote end time
double time = difftime(end, start); //take difference of start and end time to determine elapsed time
printf("Test One: %fms\n",time);
}
I tested this with gcc and got the output Test One: 0.000000ms in both cases, reversed and normal.
Maybe the issue is that you have not included stdio.h. I experienced the same behavior once when I did not include stdio.h.
Something related to memory allocation (on the stack) at compile time is another possible reason.
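One likely reason for a 0.000000 reading is that time() only counts whole seconds, so a run that finishes in a fraction of a second shows no difference. A minimal sketch of a finer-grained measurement follows, assuming a C11 library that provides timespec_get. The names DIM, elapsed_seconds and timed_copy are made up for this illustration, and the arrays are smaller than in the question so the sketch runs quickly.

```c
#include <stddef.h>
#include <time.h>

#define DIM 1024  /* assumption: smaller than the question's 4096 */

static double dst[DIM][DIM];
static double src[DIM][DIM];

/* Seconds between two timespec readings, as a double. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Copy src into dst and return the wall-clock time taken, in seconds.
 * col_major != 0 selects the column-by-column traversal. */
double timed_copy(int col_major)
{
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    for (size_t i = 0; i < DIM; ++i) {
        for (size_t j = 0; j < DIM; ++j) {
            if (col_major)
                dst[j][i] = src[j][i];
            else
                dst[i][j] = src[i][j];
        }
    }
    timespec_get(&end, TIME_UTC);
    return elapsed_seconds(start, end);
}
```

Calling timed_copy(0) and timed_copy(1) and printing both results with %f should show the gap between the two traversal orders, although an optimizing compiler may reorder the loops and shrink it.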

Related

Why is the program getting stuck after 109 elements in array?

This is the code for Quick Sort. The array generated is random, using random() function, with 10,000 as upper limit.
When number of elements exceeded 109, e.g. 110, the program did not complete execution and got stuck.
This is the code:
/*
Program to sort a list of numbers using Quick sort algorithm.
*/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// For runtime calculation
#define BILLION 1000000000
// For random number upper limit
#define UPPER_LIMIT 10000
// For printing array
#define PRINT_ARR printf("Parse %d: ", parseCount); for (int p = 0; p < eltCount; p++) printf("%d ", *(ptrMainArr + p)); printf("\n"); parseCount++;
// Declare global parse counter
int parseCount = 0;
// Declare global pointer to array
int *ptrMainArr;
// Number of elements in array
int eltCount;
float calcRunTime(struct timespec start, struct timespec end) {
long double runTime;
if (end.tv_nsec - start.tv_nsec >= 0) {
runTime = (end.tv_sec - start.tv_sec) + ((float)(end.tv_nsec - start.tv_nsec) / BILLION);
}
else {
runTime = (end.tv_sec - start.tv_sec - 1 + ((float)(end.tv_nsec - start.tv_nsec) / BILLION));
}
return runTime;
}
void swap(int *ptr1, int *ptr2) {
int temp = *ptr1;
*ptr1 = *ptr2;
*ptr2 = temp;
}
void quicksort(int *ptrArr, int numOfElts) {
// Single element in sub-array
if (numOfElts == 1) {
return;
}
// No elements in sub-array
if (numOfElts == 0) {
return;
}
// Print elements in array
PRINT_ARR
// Select pivot element (element in middle)
int pivotIdx;
// Even number of elements in array
if ((numOfElts) % 2 == 0) {
pivotIdx = ((numOfElts) / 2) - 1;
}
// Odd number of elements in array
else {
pivotIdx = (int)((numOfElts) / 2);
}
int pivot = *(ptrArr + pivotIdx);
// Initialise left and right bounds
int lb = 0, rb = numOfElts - 2;
// Swap pivot element with last element
swap(ptrArr + pivotIdx, ptrArr + numOfElts - 1);
while (1) {
while (*(ptrArr + lb) < pivot) {
lb++;
}
while (*(ptrArr + rb) > pivot && lb <= rb) {
rb--;
}
if (lb > rb) {
break;
}
swap(ptrArr + lb, ptrArr + rb);
}
swap(ptrArr + lb, ptrArr + (numOfElts - 1));
// Sort left sub-array
quicksort(ptrArr, lb);
// Sort right sub-array
quicksort(ptrArr + (lb + 1), numOfElts - lb - 1);
}
int main() {
printf("*** Quick Sort *** \n");
printf("Enter number of elements: ");
scanf("%d", &eltCount);
int arr[eltCount];
for (int i = 0; i < eltCount; i++) {
arr[i] = random() % UPPER_LIMIT + 1;
}
// Assign array to global pointer variable (to print array after each parse)
ptrMainArr = arr;
// Note: arr -> Pointer to array's first element
// Start clock
struct timespec start, end;
clock_gettime(CLOCK_REALTIME, &start);
// Sort array using quicksort
quicksort(arr, eltCount);
// End clock
clock_gettime(CLOCK_REALTIME, &end);
printf("Quick sort time taken is %f s.\n", calcRunTime(start, end));
return 0;
}
I ran this code for values under 110, and the code worked. Included is a Macro Function 'PRINT_ARR' to print the array after every parse.
I want to know the cause for the error, and how to sort an array of size > 10,000.
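The partitioning loop fails when lb and rb both stop on elements equal to the pivot: the swap changes nothing, neither bound moves, and the loop never ends. Adding an = to the exit test handles the case where the bounds meet, and incrementing lb and decrementing rb after each swap handles the rest: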
while (1) {
while (*(ptrArr + lb) < pivot) {
lb++;
}
while (*(ptrArr + rb) > pivot && lb <= rb) {
rb--;
}
if (lb >= rb) { // <--------------------------------------------------
break;
}
swap(ptrArr + lb++, ptrArr + rb--); // <------------------------------
}
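Some additional suggestions:
Remove the console input and use a fixed number for testing.
Your calcRunTime function is broken: the else branch subtracts a second without adding the borrowed nanoseconds back. Just use your first equation, which already handles a negative nanosecond difference: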
long double calcRunTime(struct timespec start, struct timespec end) {
return (end.tv_sec - start.tv_sec) + ((float)(end.tv_nsec - start.tv_nsec) / BILLION);
}
Don't use a VLA. Allocate the array with malloc, so it can grow without the risk of a stack overflow.
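The malloc suggestion above could look roughly like this. It is a sketch, not the asker's code: make_random_array is a hypothetical helper, and it uses the standard rand() instead of the POSIX random() from the question.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate `count` ints on the heap and fill them with
 * pseudo-random values in [1, upper_limit]. Returns NULL on bad arguments
 * or allocation failure; the caller must free() the result. */
int *make_random_array(size_t count, int upper_limit)
{
    if (count == 0 || upper_limit <= 0)
        return NULL;
    int *arr = malloc(count * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        arr[i] = rand() % upper_limit + 1;
    return arr;
}
```

In main, int arr[eltCount] would then become int *arr = make_random_array(eltCount, UPPER_LIMIT), followed by a NULL check and a free(arr) once sorting is done.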
First I reproduced the issue by compiling your program and running it with 109 (ok) and 110 (doesn't terminate). Then I hard-coded eltCount to 110 so the program no longer requires interactive input. Removed irrelevant functionality. Modified the PRINT_ARR to take an array and len. Printed array before it's sorted paying particular attention to the last element:
#define PRINT_ARR(arr, len) for (unsigned i = 0; i < (len); i++) printf("%d ", arr[i]); printf("\n")
int main() {
unsigned eltCount = 110;
int arr[eltCount];
for (unsigned i = 0; i < eltCount; i++) {
arr[i] = random() % UPPER_LIMIT + 1;
}
PRINT_ARR(arr, eltCount);
quicksort(arr, eltCount);
PRINT_ARR(arr, eltCount);
return 0;
}
and found:
9384 887 2778 6916 7794 8336 5387 493 6650 1422 2363 28 8691 60 7764 3927 541 3427 9173 5737 5212 5369 2568 6430 5783 1531 2863 5124 4068 3136 3930 9803 4023 3059 3070 8168 1394 8457 5012 8043 6230 7374 4422 4920 3785 8538 5199 4325 8316 4371 6414 3527 6092 8981 9957 1874 6863 9171 6997 7282 2306 926 7085 6328 337 6506 847 1730 1314 5858 6125 3896 9583 546 8815 3368 5435 365 4044 3751 1088 6809 7277 7179 5789 3585 5404 2652 2755 2400 9933 5061 9677 3369 7740 13 6227 8587 8095 7540 796 571 1435 379 7468 6602 98 2903 3318 493
There is nothing particularly special about the last number, other than that it's a duplicate, as @RetiredNinja noted above.
As the problem is an infinite loop, I looked at the loops, and in particular at their exit conditions. Following the hint above, I changed the outer loop to also exit when the two bounds are equal:
if(lb >= rb) break;
and the program now terminates.
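Putting the fixes from both answers together, the partition might look like the sketch below. It keeps the question's middle-element pivot but uses array indexing, and it checks lb <= rb before reading arr[rb] so the right bound never reads in front of the array. swap_ints is a renamed copy of the question's swap, and the tracing macro is left out.

```c
/* Exchange two ints through pointers. */
static void swap_ints(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Sort arr[0..n-1] in ascending order; tolerates duplicate values. */
void quicksort(int *arr, int n)
{
    if (n < 2)
        return;
    int pivot_idx = (n - 1) / 2;           /* middle element, as in the question */
    int pivot = arr[pivot_idx];
    swap_ints(&arr[pivot_idx], &arr[n - 1]); /* park the pivot at the end */
    int lb = 0, rb = n - 2;
    while (1) {
        while (arr[lb] < pivot)            /* the parked pivot stops this scan */
            lb++;
        while (lb <= rb && arr[rb] > pivot)
            rb--;
        if (lb >= rb)                      /* bounds met or crossed */
            break;
        swap_ints(&arr[lb++], &arr[rb--]); /* always make progress */
    }
    swap_ints(&arr[lb], &arr[n - 1]);      /* pivot into its final slot */
    quicksort(arr, lb);
    quicksort(arr + lb + 1, n - lb - 1);
}
```

Because both bounds move after every swap, two elements equal to the pivot can no longer be swapped back and forth indefinitely.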

problem with the output in my C program (gives unexpected output)

What is the problem with this program? It is supposed to calculate the elapsed time of each function call, but to my surprise the elapsed time is always zero, because the begin and end times are exactly the same. Does anyone have an explanation for this?
This is the output I get:
TIMING TEST: 10000000 calls to rand()
2113 6249 23817 12054 7060 9945 26819
13831 6820 14149 13035 30858 13924 26467
4268 11314 28400 5239 4496 27757 21452
10878 25064 9049 6508 29612 11373 29913
10234 31769 16167 24553 1875 23992 30606
2606 19539 2184 14832 27089 27474 23310
, .. , ,
End time: 1610034404
Begin time: 1610034404
Elapsed time: 0
Time for each call:,10f
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define NCALLS 10000000
#define NCOLS 7
#define NLINES 7
int main(void) {
int i, val;
long begin, diff, end;
begin = time(NULL);
srand(time(NULL));
printf("\nTIMING TEST: %d calls to rand()\n\n", NCALLS);
for (i = 1; i <= NCALLS; ++i) {
val = rand();
if (i <= NCOLS * NLINES) {
printf("%7d", val);
if (i % NCOLS == 0)
putchar('\n');
} else
if (i == NCOLS * NLINES + 1)
printf("%7s\n\n", ", .. , ,");
}
end = time(NULL);
diff = end - begin;
printf("%s%ld\n%s%ld\n%s%ld\n%s%,10f\n\n",
"End time: ", end,
"Begin time: ", begin,
"Elapsed time: ", diff,
"Time for each call:", (double)diff / NCALLS);
return 0;
}
Instead of time(NULL) you can use clock():
clock_t t1 = clock();
// your code
clock_t t2 = clock();
printf("%f", (double)(t2 - t1) / CLOCKS_PER_SEC); // divide by CLOCKS_PER_SEC (1,000,000 on POSIX systems) to get the time in seconds
time() measures in whole seconds, so if your program takes less than 1 second you may not see any difference. Note that clock() reports the processor time used by the program, not wall-clock time.
The difference between time() and clock() has already been answered elsewhere on Stack Overflow.
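As a sketch of the clock() approach applied to the question's goal, the average cost of one rand() call can be estimated as shown below. seconds_per_rand_call is a hypothetical name, and the result is CPU time, so it will not include time spent sleeping or waiting on I/O.

```c
#include <stdlib.h>
#include <time.h>

/* Average CPU time per rand() call, in seconds, measured with clock().
 * Returns -1.0 if ncalls is 0 or clock() is unavailable. */
double seconds_per_rand_call(unsigned long ncalls)
{
    if (ncalls == 0)
        return -1.0;
    volatile int sink = 0;  /* keeps the result of rand() observable */
    clock_t t1 = clock();
    if (t1 == (clock_t)-1)
        return -1.0;
    for (unsigned long i = 0; i < ncalls; ++i)
        sink = rand();
    clock_t t2 = clock();
    (void)sink;
    return ((double)(t2 - t1) / CLOCKS_PER_SEC) / (double)ncalls;
}
```

With the question's NCALLS of 10000000 the total is usually well above the clock() granularity, so the per-call figure is no longer forced to zero.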
Changing your code to spend some random time inside the loop, by making a system call a few times with
struct stat file_stat;
for( int j = 0; j < rand()%(1000); j+=1) stat(".", &file_stat);
and we get on a very old machine (for 10,000 and not 10,000,000 cycles as in your code)
toninho#DSK-2009:/mnt/c/Users/toninho/projects/um$ gcc -std=c17 -Wall tlim.c
toninho#DSK-2009:/mnt/c/Users/toninho/projects/um$ ./a.out
TIMING TEST: 10000 calls to rand()
953019096 822572575 552766679 1101222688 890440097
348966778 1483436091 1936203136 1060888701 936990601
524198868 554412390 1109472424 51262334 723194231
353522463 1808580291 673860068 818332399 350403991
442567054 1054917195 229398907 420744931 620127925
1975661852 812007818 1400791797 1471940068 1739247840
1364643097 529639947 1569398779 20035674 92849903
1060567289 1126157009 2111376669 324165122 338724259
719809477 977786583 510114270 981390269 2029486195
1551025212 1112929616 2091082251 1066603801 1722106156
, .. , ,
End time: 1610044947
Begin time: 1610044942
Elapsed time: 5
Time for each call:500.000000
Using
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
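#define NCALLS (10 * 1000) // parenthesized: without the parentheses, (double)diff / NCALLS expands to diff / 10 * 1000, which is why the run above printed 500.000000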
#define NCOLS 5
#define NLINES 10
int main(void)
{
int i, val;
long begin, diff, end;
begin = time(NULL);
srand(210701);
printf("\nTIMING TEST: %d calls to rand()\n\n", NCALLS);
for (i = 1; i <= NCALLS; ++i)
{
val = rand();
// spend some time
struct stat file_stat;
for( int j = 0; j < rand()%(1000); j+=1) stat(".", &file_stat);
if (i <= NCOLS * NLINES)
{
printf("%12d", val);
if (i % NCOLS == 0)
putchar('\n');
}
else if (i == NCOLS * NLINES + 1)
printf("%7s\n\n", ", .. , ,");
//printf("%7d: ", i);
//fgetc(stdin);
}; // for
end = time(NULL);
diff = end - begin;
printf("%s%ld\n%s%ld\n%s%ld\n%s%10f\n\n", // %
"End time: ", end,
"Begin time: ", begin,
"Elapsed time: ", diff,
"Time for each call:", (double)diff / NCALLS);
return 0;
}

Split C array into n equal parts

I am trying to split an array into n equal parts by calculating start and end indices. The address of the start and end elements will be passed into a function that will sort these arrays. For example, if arraySize = 1000, and n=2, the indices will be 0, 499, 999. So far I have the below code but for odd n, it is splitting it into more than n arrays. Another way I thought of doing this is by running through the loop n times, but I'm not sure where to start.
int chunkSize = arraySize / numThreads;
for (int start = 0; start < arraySize; start += chunkSize) {
int end = start + chunkSize - 1;
if (end > arraySize - 1) {
end = arraySize - 1;
}
InsertionSort(&array[start], end - start + 1);
}
EDIT: Here's something else I came up with. It seems to be working, but I need to do some more thorough testing. I've drawn this out multiple times and traced it by hand. Hopefully, there aren't any edge cases that will fail. I am already restricting n <= arraySize.
int chunkSize = arraySize / numThreads;
for (int i = 0; i < numThreads; i++) {
int start = i * chunkSize;
int end = start + chunkSize - 1;
if (i == numThreads - 1) {
end = arraySize - 1;
}
for (int i = start; i <= end; i++) {
printf("%d ", array[i]);
}
printf("\n");
}
Calculate the minimum chunk size with the truncating division. Then calculate the remainder. Distribute this remainder by adding 1 to some chunks:
Pseudo-code:
chunk_size = array_size / N
bonus = array_size - chunk_size * N // i.e. remainder
for (start = 0, end = chunk_size;
start < array_size;
start = end, end = start + chunk_size)
{
if (bonus) {
end++;
bonus--;
}
/* do something with array slice over [start, end) interval */
}
For instance, if array_size is 11 and N == 4, 11/N yields 2. The remainder ("bonus") is 3: 11 - 2*4. Thus the first three iterations of the loop will add 1 to the size: 3 3 3. The bonus then hits zero and the last chunk size will just be 2.
What we are doing here is nothing more than distributing an error term in a discrete quantization, in a way that is satisfactory somehow. This is exactly what happens when a line segment is drawn on a raster display with the Bresenham algorithm, or when an image is reduced to a smaller number of colors using Floyd-Steinberg dithering, et cetera.
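The pseudo-code above translates fairly directly into C. In this sketch split_even is a hypothetical name, the slices are half-open as in the pseudo-code, and the loop counts slices instead of testing start < array_size, so it always produces exactly `parts` slices (some empty if parts exceeds array_size).

```c
#include <assert.h>

/* Split [0, array_size) into `parts` half-open slices whose sizes differ by
 * at most one. Slice k is [starts[k], ends[k]); both output arrays must
 * have room for `parts` entries. */
void split_even(int array_size, int parts, int *starts, int *ends)
{
    assert(array_size >= 0 && parts > 0);
    int chunk_size = array_size / parts;
    int bonus = array_size % parts;   /* remainder to hand out */
    int start = 0;
    for (int k = 0; k < parts; k++) {
        int end = start + chunk_size;
        if (bonus > 0) {              /* the first `bonus` slices get one extra */
            end++;
            bonus--;
        }
        starts[k] = start;
        ends[k] = end;
        start = end;
    }
}
```

Each slice can then be passed on as InsertionSort(&array[starts[k]], ends[k] - starts[k]).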
You need to calculate your chunk size so that it is "rounded up", not down. You could do it using the % operator and a more complex formula, but a simple if is probably easier to understand:
int chunkSize = arraySize / numThreads;
if (chunkSize * numThreads < arraySize) {
// In case arraySize is not exactly divisible by numThreads,
// rounding down leaves one extra, smaller chunk at the end.
// Fix this by increasing chunkSize by one element,
// so we end up with at most numThreads chunks and a smaller last chunk.
++chunkSize;
}
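The same rounding can also be written with the usual integer ceiling idiom. The sketch below uses two hypothetical helpers, one for the rounded-up size and one that counts how many chunks the question's start += chunkSize loop then produces.

```c
/* Chunk size rounded up: the smallest size for which num_threads chunks
 * cover array_size elements. Assumes num_threads > 0. */
int chunk_size_rounded_up(int array_size, int num_threads)
{
    return (array_size + num_threads - 1) / num_threads;
}

/* Number of chunks produced by a loop of the form
 * for (start = 0; start < array_size; start += chunk_size). */
int chunk_count(int array_size, int chunk_size)
{
    if (chunk_size <= 0)
        return 0;               /* avoid an endless loop on a bad size */
    int count = 0;
    for (int start = 0; start < array_size; start += chunk_size)
        count++;
    return count;
}
```

For 1000 elements and 3 threads this gives chunks of 334 and exactly 3 of them, where rounding down gave 4. Rounding up can also yield fewer chunks than threads: 9 elements and 4 threads give a chunk size of 3 and only 3 chunks.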
I hope this helps:
int chunkSize = arraySize / numThreads;
int start = 0, end = -1; // end = -1 so the last chunk starts at 0 when numThreads == 1
for (int i = 0; i < numThreads - 1; i++) {
start = i * chunkSize;
end = start + chunkSize - 1;
InsertionSort(&array[start], end - start + 1); // second argument is the chunk length
}
// Last chunk with all the remaining content
start = end + 1;
end = arraySize - 1;
InsertionSort(&array[start], end - start + 1);

Using time() and clock() in C Problems

I'm working on a programming assignment and I'm getting strange results.
The idea is to calculate the number of processor ticks and time taken to run the algorithm.
Usually the code runs so quickly that the time taken is 0 sec, but I noticed that the number of processor ticks was 0 at the start and at the finish, resulting in 0 processor ticks taken.
I added a delay using usleep so that the time taken was non-zero, but the processor ticks is still zero and the calculation between the time stamps is still zero.
I've been banging my head on this for several days now and can't get past this problem, any suggestions are extremely welcome.
My code is below:
/* This program takes an input "n". If n is even it divides n by 2
* If n is odd, it multiples n by 3 and adds 1. Each time through the loop
* it iterates a counter.
* It continues until n is 1
*
* This program will compute the time taken to perform the above algorithm
*/
#include <stdio.h>
#include <time.h>
void delay(int);
int main(void) {
int n, i = 0;
time_t start, finish, duration;
clock_t startTicks, finishTicks, diffTicks;
printf("Clocks per sec = %d\n", CLOCKS_PER_SEC);
printf("Enter an integer: ");
scanf("%d", &n); // read value from keyboard
time(&start); // record start time in ticks
startTicks = clock();
printf("Start Clock = %s\n", ctime(&start));
printf("Start Processor Ticks = %d\n", startTicks);
while (n != 1) { // continues until n=1
i++; // increment counter
printf("iterations =%d\t", i); // display counter iterations
if (n % 2) { // if n is odd, n=3n+1
printf("Input n is odd!\t\t");
n = (n * 3) + 1;
printf("Output n = %d\n", n);
delay(1000000);
} else { //if n is even, n=n/2
printf("Input n is even!\t");
n = n / 2;
printf("Output n = %d\n", n);
delay(1000000);
}
}
printf("n=%d\n", n);
time(&finish); // record finish time in ticks
finishTicks = clock();
printf("Stop time = %s\n", ctime(&finish));
printf("Stop Processor Ticks = %d\n", finishTicks);
duration = difftime(finish, start); // compute difference in time
diffTicks = finishTicks - startTicks;
printf("Time elapsed = %2.4f seconds\n", duration);
printf("Processor ticks elapsed = %d\n", diffTicks);
return (n);
}
void delay(int us) {
usleep(us);
}
EDIT: So after researching further, I discovered that usleep() won't affect the program running time, so I wrote a delay function in asm. Now I am getting a value for processor ticks, but I am still getting zero sec taken to run the algorithm.
void delay(int us) {
for (int i = 0; i < us; i++) {
__asm__("nop");
}
}
You can calculate the elapsed time using the formula below.
double timeDiff = (double)(EndTime - StartTime) / CLOCKS_PER_SEC;
Here is some sample code.
void CalculateTime(clock_t startTime, clock_t endTime)
{
clock_t diffTime = endTime - startTime;
printf("Processor time elapsed = %lf\n", (double)diffTime /CLOCKS_PER_SEC);
}
Hope this helps.
You are trying to time an implementation of the Collatz conjecture (the 3n+1 problem). I don't see how you can hope to get a meaningful execution time when it contains delays. Another problem is the granularity of clock() results, as shown by the value of CLOCKS_PER_SEC.
It is even more difficult trying to use time() which has a resolution of 1 second.
The way to do it is to compute a large number of values. This prints only 10 of them, to ensure the calculations are not optimised out, but not to distort the calculation time too much.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define SAMPLES 100000
int main(void) {
int i, j, n;
double duration;
clock_t startTicks = clock();
for(j=2; j<SAMPLES; j++) {
n = j; // starting number
i = 0; // iterations
while(n != 1) {
if (n % 2){ // if n is odd, n=3n+1
n = n * 3 + 1;
}
else { // if n is even, n=n/2
n = n / 2;
}
i++;
}
if(j % (SAMPLES/10) == 0) // print 10 results only
printf ("%d had %d iterations\n", j, i);
}
duration = ((double)clock() - startTicks) / CLOCKS_PER_SEC;
printf("\nDuration: %f seconds\n", duration);
return 0;
}
Program output:
10000 had 29 iterations
20000 had 30 iterations
30000 had 178 iterations
40000 had 31 iterations
50000 had 127 iterations
60000 had 179 iterations
70000 had 81 iterations
80000 had 32 iterations
90000 had 164 iterations
Duration: 0.090000 seconds
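The inner loop of the program above can be pulled out into a function, which makes it easy to check against the printed results. This is a sketch: collatz_steps is a hypothetical name for the 3n+1 iteration (commonly known as the Collatz sequence), and it uses unsigned long long to leave headroom for the intermediate 3n+1 values.

```c
/* Number of 3n+1 steps needed to reach 1 from n.
 * Returns 0 for n == 0 and n == 1. */
unsigned collatz_steps(unsigned long long n)
{
    unsigned steps = 0;
    while (n > 1) {
        n = (n % 2) ? 3 * n + 1 : n / 2;  /* odd: 3n+1, even: n/2 */
        steps++;
    }
    return steps;
}
```

Timing a loop that calls collatz_steps for many starting values with clock(), as the program above does, keeps the measured interval well above the clock granularity.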

Splitting an arbitrary range of numbers into roughly equal size partitions in C using OpenMP

I would like to split a range of numbers into roughly equal size in C using OpenMP. For example, if I have a range from 7 through 24 and number of threads is 8. I would like the first thread to begin from 7 and end at 9. Second thread begin from 10 and end at 12. Third thread begin from 13 and end at 14. Fourth thread begin at 15 and end at 16 and so forth... Until the last thread begins at 23 and ends at 24. The code that I wrote is as follows, but it is not getting the previously explained results. I am wondering if there is something that I missed that I can do or is there a more efficient way of doing this? Your help is greatly appreciated.
Note of predefined declaration of variables based on the example given above:
first = 7
last = 24
size = 2 (which signifies the amount of numbers per thread)
r = 2 (r signifies remainder)
nthreads = 8
myid = is the thread ID in a range of 0 to 7
if (r > 0)
{
if (myid == 0)
{
start = first + myid*size;
end = start + size;
}
else if (myid == nthreads - 1)
{
start = first + myid*size + myid;
end = last;
}
else
{
start = first + myid*size + myid;
end = start + size;
}
}
else
{
start = first + myid*size;
if (myid == nthreads - 1) end = last;
else end = start + size - 1;
}
As far as I remember, #pragma omp parallel for automatically divides work between threads in equal chunks, and is ok in most situations.
However, if you want to do this manually, here is a piece of code which does what you want:
int len = last - first + 1;
int chunk = len / nthreads;
int r = len % nthreads;
if (myid < r) {
start = first + (chunk + 1) * myid;
end = start + chunk;
} else {
start = first + (chunk + 1) * r + chunk * (myid - r);
end = start + chunk - 1;
}
If there are no additional restrictions, such distribution is indeed optimal.
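The manual split above can be wrapped in a function so it is easy to check against the 7 through 24 example from the question. In this sketch thread_range is a hypothetical name and the range is inclusive on both ends. It has no OpenMP dependency, so myid and nthreads are plain parameters.

```c
#include <assert.h>

/* Compute the inclusive sub-range [*start, *end] of [first, last] owned by
 * thread `myid` out of `nthreads`. The first (len % nthreads) threads get
 * one extra element. */
void thread_range(int first, int last, int nthreads, int myid,
                  int *start, int *end)
{
    assert(nthreads > 0 && myid >= 0 && myid < nthreads && last >= first);
    int len = last - first + 1;
    int chunk = len / nthreads;
    int r = len % nthreads;
    if (myid < r) {
        *start = first + (chunk + 1) * myid;
        *end = *start + chunk;
    } else {
        *start = first + (chunk + 1) * r + chunk * (myid - r);
        *end = *start + chunk - 1;
    }
}
```

Inside an OpenMP parallel region, myid and nthreads would come from omp_get_thread_num() and omp_get_num_threads().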
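// assuming half-open interval [begin, end), evaluated inside a parallel region
int n = ((end - begin) + omp_get_num_threads() - 1) / omp_get_num_threads(); // chunk size, rounded up
int first = begin + n * omp_get_thread_num();
int last = (first + n < end) ? first + n : end; // clamp to end (a min, not a max)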

Resources