Finding time taken by the code

I am trying to find the time taken by the memmove function in C using the time.h library. However, when I execute the code, I get a value of zero. Is there a way to measure the time taken by memmove?
void main(){
uint64_t start,end;
uint8_t a,b;
char source[5000];
char dest[5000];
uint64_t j=0;
for(j=0;j<5000;j++){
source[j]=j;
}
start=clock();
memmove(dest,source,5000);
end=clock();
printf("%f",((double)end-start));
}

As I wrote in my comment, a memmove of 5000 bytes is far too fast to be measurable with clock. If you repeat the memmove 100000 times, it becomes measurable.
This code below gives an output of 12 on my computer. But this is platform dependent, the number you get on your computer might be quite different.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
int main(void) {
uint64_t start, end;
char source[5000];
char dest[5000];
uint64_t j = 0;
for (j = 0; j < 5000; j++) {
source[j] = j;
}
start = clock();
for (int i = 0; i < 100000; i++)
{
memmove(dest, source, 5000);
}
end = clock();
printf("%llu\n", (unsigned long long)(end - start)); // no need to convert to double; (end - start)
// is a uint64_t, cast here to match the %llu format.
}
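To turn the raw tick count into a time per call, divide by CLOCKS_PER_SEC and by the number of repetitions. The following is a minimal sketch of that conversion; the helper name seconds_per_memmove and its parameters are hypothetical and not part of the answer above. Note that an optimizing compiler may remove a memmove whose result is never used, so the figure is only meaningful if the copy actually happens.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: average CPU seconds per memmove of n bytes,
   measured over `reps` repetitions. */
double seconds_per_memmove(char *dest, const char *src, size_t n, long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++) {
        memmove(dest, src, n);
    }
    clock_t end = clock();

    /* clock() counts ticks; CLOCKS_PER_SEC converts ticks to seconds. */
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)reps;
}
```

Calling seconds_per_memmove(dest, source, 5000, 100000) with the arrays from the answer gives the average cost of one copy. The value is still platform dependent.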

If you want to know the time it takes on a BeagleBone or another device with GPIO pins, you can toggle a GPIO before and after your routine. You will have to hook up an oscilloscope or something similar that can sample voltage quickly.
I don't know much about BeagleBones, but it seems the libpruio library allows fast GPIO toggling.
Also, what is your exact goal here? Compare speed on different hardware? As someone suggests, you could increase the number of loops so it becomes more easily measurable with time.h.

Related

How do I run a timer in C

I am making a trivia game and want to make a countdown timer where the person playing only has a certain time to answer the question. I am fairly new to C and am looking for a basic way to set a timer.

If you don't mind the person playing answering and being told they are too late, you could use something like this.
time.h lets us convert processor clock ticks to time. It also gives us some nifty functions like double difftime(time_t timer2, time_t timer1), which returns the difference between two times in seconds.
#include <stdio.h>
#include <time.h>
int main(void) {
time_t start_time;
time_t current_time;
time(&start_time);
time(&current_time);
double delay = 5;
int answer = 0;
double diff = 0;
while (diff < delay) {
diff = difftime(current_time, start_time);
scanf("%d", &answer);
printf("Nope not %d\n", answer);
time(&current_time);
}
printf("Too late\n");
return 0;
}
The only problem is that scanf blocks the program, so a reply has to be given before the loop can stop. If this isn't what you are looking for, then you should look into using threads, which are OS dependent.
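On POSIX systems, one alternative to threads is to wait for input with a timeout before calling scanf at all. This is a different technique from the one in the answer: the sketch below uses select(2), and the helper name wait_for_input is hypothetical. It only reports whether data is ready to read. It does not read the data.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `fd` has data to read within
   `seconds`, 0 if the timeout expires first, and -1 on error. */
int wait_for_input(int fd, int seconds)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval timeout;
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;

    /* select() blocks until fd is readable or the timeout runs out. */
    int ready = select(fd + 1, &readfds, NULL, NULL, &timeout);
    if (ready < 0) {
        return -1;
    }
    return ready > 0 ? 1 : 0;
}
```

In the trivia game, calling wait_for_input(STDIN_FILENO, 5) before scanf would let the program print "Too late" when the function returns 0. Terminal input is usually line buffered, so the player still has to press Enter for the input to count as ready.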

Performance penalty on misaligned data

As a CS student I'm trying to understand the very basics of a computer. As I stumbled across this website, I wanted to test those performance penalties on my own. I understand what he's talking about and why this happens / should happen.
Anyway, here's my code which I used to call those functions he wrote:
int main(void)
{
int i = 0;
uint8_t alignment = 0;
uint8_t size = 1024 * 1024 * 10; // 10MiB
uint8_t* block = malloc(size);
for(alignment = 0; alignment <= 17; alignment++)
{
start_t = clock();
for(i = 0; i < 100000; i++)
Munge8(block + alignment, size);
end_t = clock();
printf("%i\n", end_t - start_t);
}
// Repeat, but next time with Munge16, Munge32, Munge64
}
I don't know if my CPU & RAM are so blazingly fast, but the output of all 4 functions (Munge8, Munge16, Munge32 and Munge64) is always 3 or 4 (random, no pattern).
Is this possible? 100000 repetitions should be a lot more work, or am I wrong? I'm working on Windows 7 Enterprise x64 with an Intel Core i7-4600U CPU @ 2.10GHz. All compiler optimizations are turned off, i.e. /Od.
All the related questions on SO didn't answer why my solution isn't working.
What am I doing wrong? Any help is greatly appreciated.
Edit:
First of all: Thank you very much for your help. After changing the type of size from uint8_t to uint32_t I altered all the inside loops causing undefined behaviour of the test functions to two separate lines:
while( data32 != data32End )
{
data32++;
*data32 = -(*data32);
}
Now I'm getting a relatively stable output of 25/26, 12/13, 6 and 3 ticks, calculating the average of 100 repetitions. Is this a logical result? Does this mean that my architecture handles unaligned access as fast (or as slow) as aligned access? Am I measuring the time too inexactly? Or is there a problem with accuracy when dividing by 10? My new code:
int main(void)
{
int i = 0;
uint8_t alignment = 0;
uint64_t size = 1024 * 1024 * 10; // 10MiB
uint8_t* block = malloc(size);
printf("%i\n\n", CLOCKS_PER_SEC); // yields 1000, just for comparison how fast my machine 'ticks'
for(alignment = 0; alignment <= 17; alignment++)
{
start_t = clock();
for(i = 0; i < 100; i++)
singleByte(block + alignment, size);
end_t = clock();
printf("%i\n", (end_t - start_t)/100);
}
// Again, repeat with all different functions
}
General criticism is, of course, also appreciated. :)

This fails due to integer overflow:
uint8_t size = 1024 * 1024 * 10; // 10MiB
it should be:
const size_t size = 1024 * 1024 * 10; // 10MiB
No idea why you'd ever use an 8-bit quantity to hold something that large.
Investigate how to enable all warnings for your compiler.
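To see what the original declaration actually stores: 1024 * 1024 * 10 is 10485760 (0xA00000), and converting it to uint8_t keeps only the low 8 bits, which are all zero. The sketch below shows both variants side by side; the function names are hypothetical. A size of 0 would explain why every loop finished in a handful of ticks.

```c
#include <stddef.h>
#include <stdint.h>

/* The value the original declaration ends up with: the constant is
   reduced modulo 256 when it is converted to uint8_t. */
uint8_t truncated_size(void)
{
    return (uint8_t)(1024 * 1024 * 10);
}

/* The same constant held in a type that is wide enough for it. */
size_t full_size(void)
{
    return (size_t)1024 * 1024 * 10;
}
```

Here truncated_size() returns 0, while full_size() returns 10485760.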

It seems there is a problem with your clock function. 1000 for CLOCKS_PER_SEC is way too low for your processor, even if CPU throttling is activated (you should get around 2100000 if frequency scaling is turned off). How many cycles do you get for each averaged measurement when using cycle.h?

Run code for exactly one second

I would like to know how I can write my program so that it runs for exactly one second.
I would like to evaluate parts of my code and see where most of the time is spent, so I am analyzing parts of it.
Here's the interesting part of my code :
int size = 256;
clock_t start_benching = clock();
for (uint32_t i = 0;i < size; i+=4)
{
myarray[i];
myarray[i+1];
myarray[i+2];
myarray[i+3];
}
clock_t stop_benching = clock();
This just gives me how long the function needed to perform all the operations.
I want to run the code for one second and see how many operations have been done.
This is the line to print the time measurement:
printf("Walking through buffer took %f seconds\n", (double)(stop_benching - start_benching) / CLOCKS_PER_SEC);

A better approach to benchmarking is to know the percentage of time spent in each section of the code.
Instead of making your code run for exactly 1 second, let stop_benching - start_benching cover the total run time. Take the time spent in any part of the code and divide it by the total run time to get a value between 0 and 1. Multiply this value by 100 and you have the percentage of time consumed by that specific section.
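The calculation itself is small. As a minimal sketch, assuming both durations were measured with clock() (the helper name percent_of_total is hypothetical):

```c
#include <time.h>

/* Hypothetical helper: share of the total run time taken by one
   section, in percent. Both arguments are clock() tick differences. */
double percent_of_total(clock_t section_ticks, clock_t total_ticks)
{
    if (total_ticks <= 0) {
        return 0.0; /* the run was too short for clock() to register */
    }
    return 100.0 * (double)section_ticks / (double)total_ticks;
}
```

For example, a section that took 25 ticks out of a total of 100 gives percent_of_total(25, 100), which is 25.0.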

Non-answer advice: Use an actual profiler to profile the performance of code sections.
On *nix you can set an alarm(2) with a signal handler that sets a global flag to indicate the elapsed time. The Windows API provides something similar with SetTimer.
#include <unistd.h>
#include <signal.h>
volatile sig_atomic_t time_elapsed = 0; // written by the signal handler, so it must be volatile
void alarm_handler(int signal) {
time_elapsed = 1;
}
int main() {
signal(SIGALRM, &alarm_handler);
alarm(1); // set alarm time-out to 1 second
do {
// stuff...
} while (!time_elapsed);
return 0;
}
In more complicated cases you can use setitimer(2) instead of alarm(2), which lets you use microsecond precision and choose between counting wall clock time, user CPU time, or user and system CPU time.
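A minimal sketch of the setitimer(2) variant, assuming a POSIX system. The names on_timer and count_iterations_for are hypothetical, and the loop body is only a stand-in for the code being benchmarked. ITIMER_REAL counts wall clock time; ITIMER_VIRTUAL and ITIMER_PROF, which deliver SIGVTALRM and SIGPROF instead, select the two CPU time modes.

```c
/* Request the POSIX/XSI declarations when compiling in strict ISO C mode. */
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t timer_expired = 0;

static void on_timer(int signo)
{
    (void)signo;
    timer_expired = 1;
}

/* Hypothetical helper: loop for `usec` microseconds of wall clock time
   and return the number of iterations completed, or -1 on failure. */
long count_iterations_for(long usec)
{
    if (usec <= 0) {
        return -1; /* a zero timer value would disarm the timer */
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) {
        return -1;
    }

    struct itimerval tv;
    memset(&tv, 0, sizeof tv); /* it_interval stays zero: fire only once */
    tv.it_value.tv_sec = usec / 1000000;
    tv.it_value.tv_usec = usec % 1000000;

    timer_expired = 0;
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0) {
        return -1;
    }

    long iterations = 0;
    while (!timer_expired) {
        iterations++; /* stand-in for the work being measured */
    }
    return iterations;
}
```

Calling count_iterations_for(1000000) runs the loop for about one second and reports how many iterations fit into it.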

Segmentation fault when calling clock()

I am trying to understand the effects of caching programmatically using the following program. I am getting segfault with the code. I used GDB (compiled with -g -O0) and found that it was segmentation faulting on
start = clock() (first occurrence)
Am I doing something wrong? The code looks fine to me. Can someone point out the mistake?
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#define MAX_SIZE (16*1024*1024)
int main()
{
clock_t start, end;
double cpu_time;
int i = 0;
int arr[MAX_SIZE];
/* CPU clock ticks count start */
start = clock();
/* Loop 1 */
for (i = 0; i < MAX_SIZE; i++)
arr[i] *= 3;
/* CPU clock ticks count stop */
end = clock();
cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
printf("CPU time for loop 1 %.6f secs.\n", cpu_time);
/* CPU clock ticks count start */
start = clock();
/* Loop 2 */
for (i = 0; i < MAX_SIZE; i += 16)
arr[i] *= 3;
/* CPU clock ticks count stop */
end = clock();
cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
printf("CPU time for loop 2 %.6f secs.\n", cpu_time);
return 0;
}

The array might be too big for the stack. Try making it static instead, so it goes into the global variable space. As an added bonus, static variables are initialized to all zero.
Unlike other kinds of storage, the compiler can check that resources exist for globals at compile time (and the OS can double check at runtime before the program starts) so you don't need to handle out of memory errors. An uninitialized array won't make your executable file bigger.
This is an unfortunate rough edge of the way the stack works. It lives in a fixed-size buffer, set by the program executable's configuration according to the operating system, but its actual size is seldom checked against the available space.
Welcome to Stack Overflow land!

Try to change:
int arr[MAX_SIZE];
to:
int *arr = (int*)malloc(MAX_SIZE * sizeof(int));
As Potatoswatter suggested, the array might be too big for the stack. You can allocate it on the heap rather than on the stack.
More information.
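A minimal sketch of the heap-based version, with the allocation checked and the memory released afterwards. The helper name time_strided_loop is hypothetical. It uses calloc rather than malloc, which is a swap from the answer above, so that the elements start at zero the way a static array would.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: CPU seconds spent multiplying every `stride`-th
   element of a heap-allocated array of `count` ints by 3.
   Returns -1.0 if stride is 0 or the allocation fails. */
double time_strided_loop(size_t count, size_t stride)
{
    if (stride == 0) {
        return -1.0;
    }

    int *arr = calloc(count, sizeof *arr); /* zero-filled, unlike malloc */
    if (arr == NULL) {
        return -1.0;
    }

    clock_t start = clock();
    for (size_t i = 0; i < count; i += stride) {
        arr[i] *= 3;
    }
    clock_t end = clock();

    free(arr);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_strided_loop(MAX_SIZE, 1) and time_strided_loop(MAX_SIZE, 16) corresponds to loop 1 and loop 2 in the question. With optimizations enabled the compiler may remove the loops entirely, so compare the results at -O0 as in the question.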

Get average run-time of a C program

I'm trying to measure differences in speed between reading and writing misaligned vs aligned bits in binary files. Is there a utility I can use (other than running time over and over again or writing my own) to sample the average run-time of a program? I'm running a Linux-based OS.
Thanks

running time over & over again and writing my own
That's fine. You can perform the read/write ten thousand times both ways and compute the average time.
If you really want to use a library you can try Google Perftools.

Put this in a header file:
#ifndef TIMER_H
#define TIMER_H
#include <stdlib.h>
#include <sys/time.h>
typedef unsigned long long timestamp_t;
static timestamp_t
get_timestamp ()
{
struct timeval now;
gettimeofday (&now, NULL);
return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}
#endif
Include the header file into whichever .c file you'll be using, and do something like this:
#include <stdio.h>
#define N 10000
int main()
{
int i;
double avg;
timestamp_t start, end;
start = get_timestamp();
for(i = 0; i < N; i++)
foo();
end = get_timestamp();
avg = (end - start) / (double)N;
printf("%f", avg);
return 0;
}
Basically this calls whichever function you're trying to measure performance of N times, where N is a defined constant (doesn't have to be) in this case. It takes a timestamp before the for loop and after the for loop and then calculates the average time it's taken for the function to execute. The get_timestamp() function returns the number of microseconds, so if you need milliseconds, divide by 1000, seconds - divide by 1000000 etc.