I'm trying to measure the difference in speed between reading and writing misaligned vs. aligned bits in binary files. Is there a utility I can use (other than running time over and over again or writing my own) to sample the average run time of a program? I'm running a Linux-based OS.
Thanks
"Running time over & over again and writing my own": that's fine. You can perform the read/write ten thousand times each way (aligned and misaligned) and compute the average time.
If you really want to use a library you can try Google Perftools.
Put this in a header file:
#ifndef TIMER_H
#define TIMER_H
#include <stdlib.h>   /* NULL */
#include <sys/time.h> /* gettimeofday(), struct timeval */
typedef unsigned long long timestamp_t;
/* Current wall-clock time in microseconds since the Unix epoch. */
static timestamp_t
get_timestamp (void)
{
struct timeval now;
gettimeofday (&now, NULL);
return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}
#endif
Include the header file into whichever .c file you'll be using, and do something like this:
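#include <stdio.h>
#include "timer.h" /* the header above; the file name timer.h is an assumption */
#define N 10000
int main()
{
int i;
double avg;
timestamp_t start, end;
start = get_timestamp();
for(i = 0; i < N; i++)
foo(); /* the function whose performance you want to measure */
end = get_timestamp();
avg = (end - start) / (double)N;
printf("%f us\n", avg);
return 0;
}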
This calls the function you want to measure N times; N is a defined constant here, but it doesn't have to be. It takes a timestamp before and after the for loop and divides the difference by N to get the average time per call. get_timestamp() returns microseconds, so divide by 1000 for milliseconds, by 1000000 for seconds, and so on.
I'm doing some exercise projects from a C book, and I was asked to write a program that uses the clock function from the C library to measure how long qsort takes to sort an array that is in reverse-sorted order. I wrote the following:
/*
* Write a program that uses the clock function to measure how long it takes qsort to sort
* an array of 1000 integers that are originally in reverse order. Run the program for arrays
* of 10000 and 100000 integers as well.
*/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define SIZE 10000
int compf(const void *, const void *);
int main(void)
{
int arr[SIZE];
clock_t start_clock, end_clock;
for (int i = 0; i < SIZE; ++i) {
arr[i] = SIZE - i;
}
start_clock = clock();
qsort(arr, SIZE, sizeof(arr[0]), compf);
end_clock = clock();
printf("start_clock: %ld\nend_clock: %ld\n", (long)start_clock, (long)end_clock); /* clock_t is not guaranteed to be long */
printf("Measured seconds used: %g\n", (end_clock - start_clock) / (double)CLOCKS_PER_SEC);
return EXIT_SUCCESS;
}
int compf(const void *p, const void *q)
{
return *(int *)p - *(int *)q;
}
But running the program gives me the results below:
start_clock: 0
end_clock: 0
Measured seconds used: 0
How can my system use 0 clock ticks to sort an array? What am I doing wrong?
I'm using the GCC that ships with mingw-w64 (x86_64-8.1.0-release-win32-seh-rt_v6-rev0).
I'm compiling with gcc.exe using the arguments -g -Wall -Wextra -pedantic -std=c99 -D__USE_MINGW_ANSI_STDIO.
3 possible answers to your issue:
1. What is clock_t? It is an arithmetic type (typically an integer), not a struct, so make sure you store, subtract and print it in a way that matches its type.
2. What is this running on? On some platforms, for instance many microcontrollers, the clock has to be started first; if you read it without starting it, it stays at 0 because the clock is not moving.
3. Is your code so fast that it doesn't register? It may really be taking less than one clock tick. One full second is a very long time in computing; millions of lines of code can run in less than a second. Make sure your timing method can resolve small intervals (e.g. one microsecond), or make the code run long enough to register on your clock.
I have searched for and tried many approaches to measuring elapsed time, and there are many questions on this topic. For example, this question is very good, but when you need an accurate time recorder I couldn't find a good method. So I want to share my method here, to be used and to be corrected if something is wrong.
UPDATE & NOTE: this question is about benchmarking at resolutions below one nanosecond. That is completely different from using clock_gettime(CLOCK_MONOTONIC, &start), which records time with a resolution of one nanosecond at best.
UPDATE: A common way to measure speedup is to repeat the section of the program being benchmarked. But, as mentioned in a comment, the repetition loop may change how the code is optimized when one relies on autovectorization.
NOTE: Measuring the elapsed time of a single run is not accurate enough. In some cases my results show that the section must be repeated more than 1K or even 1M times to get the smallest time.
SUGGESTION: I'm not familiar with shell programming (I just know some basic commands), but it might be possible to measure the smallest time without repeating the code inside the program.
MY CURRENT SOLUTION: to avoid the loop branches, I repeat the code section using a macro #define REP_CODE(X) X X X... X X, where X is the code section I want to benchmark, as follows:
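#include <x86intrin.h> // _rdtsc()
// MAX1 and the full REP_CODE(X) macro are defined elsewhere (not shown);
// i, j, ii, t1_rdtsc, t2_rdtsc and ttotal_rdtsc are declared elsewhere as well
//numbers
#define FMAX1 ((MAX1)*(MAX1))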
#define COEFF 8
int __attribute__(( aligned(32))) input[FMAX1+COEFF]; //= {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17};
int __attribute__(( aligned(32))) output[FMAX1];
int __attribute__(( aligned(32))) coeff[COEFF] = {1,2,3,4,5,6,7,8};//= {1,1,1,1,1,1,1,1};//; //= {1,2,1,2,1,2,1,2,2,1};
int main()
{
REP_CODE(
t1_rdtsc=_rdtsc();
//Code
for(i = 0; i < FMAX1; i++){
for(j = 0; j < COEFF; j++){//IACA_START
output[i] += coeff[j] * input[i+j];
}//IACA_END
}
t2_rdtsc=_rdtsc();
ttotal_rdtsc[ii++]=t2_rdtsc-t1_rdtsc;
)
// The smallest element in `ttotal_rdtsc` is the answer
}
This does not affect optimization, but it is limited by code size, and compile time becomes excessive in some cases.
Any suggestions or corrections?
Thanks in advance.
If you have a problem with the autovectorizer and want to constrain it, just add an asm("#something"); after your begin_rdtsc; it will separate the do-while loop from the timed code. I just checked: with this change the compiler vectorized your posted code, which the autovectorizer was otherwise unable to vectorize.
I changed your macro; you can use it:
#include <stdio.h>
#include <x86intrin.h> // _rdtsc()
// do_while (number of repetitions) and OVERAL_TIME must be #defined before this point
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc[do_while], ttbest_rdtsc = 99999999999999999, elapsed, elapsed_rdtsc=do_while, overal_time = OVERAL_TIME, ttime=0;
int ii=0;
#define begin_rdtsc\
ii = 0;\
do{\
asm("#mmmmmmmmmmm");\
t1_rdtsc=_rdtsc();
#define end_rdtsc\
t2_rdtsc=_rdtsc();\
asm("#mmmmmmmmmmm");\
ttotal_rdtsc[ii]=t2_rdtsc-t1_rdtsc;\
}while (++ii<do_while);\
for(ii=0; ii<do_while; ii++){\
if (ttotal_rdtsc[ii]<ttbest_rdtsc){\
ttbest_rdtsc = ttotal_rdtsc[ii];}}\
printf("\nthe best is %lld in %lld repetitions\n", ttbest_rdtsc, elapsed_rdtsc);
I have developed my first answer further and arrived at this solution, but I am still looking for a better one, because it is very important to measure time accurately and with the least impact on the measured code. I put this part in a header file and include it in the main program file.
//Header file header.h (include it in one .c file only: it defines global variables)
#ifndef HEADER_H
#define HEADER_H
#include <stdio.h>
#include <x86intrin.h> // _rdtsc()
#define count 1000 // number of repetitions
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc[count], ttbest_rdtsc = 99999999999999999, elapsed_rdtsc=count;
int ii=0;
#define begin_rdtsc\
ii = 0;\
do{\
t1_rdtsc=_rdtsc();
#define end_rdtsc\
t2_rdtsc=_rdtsc();\
ttotal_rdtsc[ii]=t2_rdtsc-t1_rdtsc;\
}while (++ii<count);\
for(ii=0; ii<count; ii++){\
if (ttotal_rdtsc[ii]<ttbest_rdtsc){\
ttbest_rdtsc = ttotal_rdtsc[ii];}}\
printf("\nthe best is %lld in %lld repetitions\n", ttbest_rdtsc, elapsed_rdtsc);
#endif
//Main program
#include "header.h"
.
.
.
int main()
{
//before the section
begin_rdtsc
//put your code here to measure the clocks.
end_rdtsc
return 0;
}
I recommend this method for x86 microarchitectures.
NOTE:
- NUM_LOOP should be large enough to improve accuracy: the more often your code is repeated, the more likely the best time is recorded.
- ttbest_rdtsc must be initialized to a value larger than the worst time; I recommend using the maximum value.
I used OVERAL_TIME (you might not need it) as an additional stopping rule. I applied this to many kernels, and in some cases NUM_LOOP was very large and I didn't want to change it, so OVERAL_TIME limits the iterations and stops after a specific amount of time.
UPDATE: The whole program is this:
#include <stdio.h>
#include <x86intrin.h>
#define NUM_LOOP 100 //executes your code NUM_LOOP times to get the smallest time, avoiding overheads such as cache misses, etc.
int main()
{
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc, ttbest_rdtsc = 99999999999999999;
int do_while = 0;
do{
t1_rdtsc = _rdtsc();
//put your code here
t2_rdtsc = _rdtsc();
ttotal_rdtsc = t2_rdtsc - t1_rdtsc;
//store the smallest time:
if (ttotal_rdtsc<ttbest_rdtsc)
ttbest_rdtsc = ttotal_rdtsc;
}while (++do_while < NUM_LOOP); // runs exactly NUM_LOOP times
printf("\nthe best is %lld in %d repetitions\n", ttbest_rdtsc, NUM_LOOP );
return 0;
}
I then changed it to the following and put it in a header for myself, so I can use it easily in my programs.
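#include <stdio.h>
#include <x86intrin.h> // _rdtsc()
// NUM_LOOP (maximum number of repetitions) must be #defined before this header is included
#define do_while NUM_LOOP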
#define OVERAL_TIME 999999999
long long t1_rdtsc, t2_rdtsc, ttotal_rdtsc, ttbest_rdtsc = 99999999999999999, elapsed, elapsed_rdtsc=do_while, overal_time = OVERAL_TIME, ttime=0;
#define begin_rdtsc\
do{\
t1_rdtsc=_rdtsc();
#define end_rdtsc\
t2_rdtsc=_rdtsc();\
ttotal_rdtsc=t2_rdtsc-t1_rdtsc;\
if (ttotal_rdtsc<ttbest_rdtsc){\
ttbest_rdtsc = ttotal_rdtsc;\
elapsed=(do_while-elapsed_rdtsc);}\
ttime+=ttotal_rdtsc;\
}while (elapsed_rdtsc-- && (ttime<overal_time));\
printf("\nthe best is %lld in %lldth iteration and %lld repetitions\n", ttbest_rdtsc, elapsed, (do_while-elapsed_rdtsc));
How to use this method? Well, it is very simple!
int main()
{
//before the section
begin_rdtsc
//put your code here to measure the clocks.
end_rdtsc
return 0;
}
Be creative: you can adapt it to measure the speedup in your program, etc.
An example of the output is:
the best is 9600 in 384751th iteration and 569179 repetitions
That is, the best time for my code was 9600 clocks, recorded in iteration 384751, and the code was run 569179 times in total.
I have tested this with GCC and Clang.
My program is going to race different sorting algorithms against each other, both in time and space. I've got space covered, but measuring time is giving me some trouble. Here is the code that runs the sorts:
void test(short* n, short len) {
short i, j, a[1024];
for(i=0; i<2; i++) { // Loop over each sort algo
memused = 0; // Initialize memory marker
for(j=0; j<len; j++) // Copy scrambled list into fresh array
a[j] = n[j]; // (Sorting algos are in-place)
// ***Point A***
switch(i) { // Pick sorting algo
case 0:
selectionSort(a, len);
break; // without break, case 0 would fall through into quicksort
case 1:
quicksort(a, len);
break;
}
// ***Point B***
spc[i][len] = memused; // Record how much mem was used
}
}
(I removed some of the sorting algos for simplicity)
Now, I need to measure how much time each sorting algorithm takes. The most obvious way to do this is to record the time at Point A and subtract it from the time at Point B. But none of the C time functions are good enough:
time() gives me time in seconds, but the algorithms are faster than that, so I need something more accurate.
clock() gives me CPU ticks since the program started, but its value seems to change only in steps of 10,000 ticks; still not fine-grained enough.
The time shell command works well enough, except that I need to run over 1,000 tests per algorithm, and I need the individual time for each one.
I have no idea what getrusage() returns, but its granularity also seems too coarse.
What I need is time in units (significantly, if possible) smaller than the run time of the sorting functions: about 2ms. So my question is: Where can I get that?
gettimeofday() has microseconds resolution and is easy to use.
A pair of useful timer functions is:
#include <stdio.h>
#include <sys/time.h>
static struct timeval tm1; /* start time recorded by start() */
static inline void start()
{
gettimeofday(&tm1, NULL);
}
static inline void stop()
{
struct timeval tm2;
gettimeofday(&tm2, NULL);
unsigned long long t = 1000 * (tm2.tv_sec - tm1.tv_sec) + (tm2.tv_usec - tm1.tv_usec) / 1000;
printf("%llu ms\n", t);
}
For measuring time, use clock_gettime with CLOCK_MONOTONIC (or CLOCK_MONOTONIC_RAW if it is available). Where possible, avoid using gettimeofday. It is specifically deprecated in favor of clock_gettime, and the time returned from it is subject to adjustments from time servers, which can throw off your measurements.
You can get the total user + kernel time (or choose just one) using getrusage as follows:
#include <sys/time.h>
#include <sys/resource.h>
double get_process_time() {
struct rusage usage;
if( 0 == getrusage(RUSAGE_SELF, &usage) ) {
return (double)(usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) +
(double)(usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / 1.0e6;
}
return 0;
}
I elected to create a double containing fractional seconds...
double t_begin, t_end;
t_begin = get_process_time();
// Do some operation...
t_end = get_process_time();
printf( "Elapsed time: %.6f seconds\n", t_end - t_begin );
The Time Stamp Counter could be helpful here:
static unsigned long long rdtsctime() {
unsigned int eax, edx;
unsigned long long val;
__asm__ __volatile__("rdtsc":"=a"(eax), "=d"(edx));
val = edx;
val = val << 32;
val += eax;
return val;
}
Though there are some caveats to this. The timestamps for different processor cores may be different, and changing clock speeds (due to power saving features and the like) can cause erroneous results.
I want to measure the time from the start to the end of the function in a loop. This difference will be used to set the number of iterations of the inner while loop, which does some work that is not important here.
I want to time the function like this:
#include <wiringPi.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#define BILLION 1E9
float hz = 1000;
long int nsPerTick = BILLION/hz;
double unprocessed = 1;
struct timespec now;
struct timespec last;
clock_gettime(CLOCK_REALTIME, &last);
[ ... ]
while (1)
{
clock_gettime(CLOCK_REALTIME, &now);
double diff = (last.tv_nsec - now.tv_nsec );
unprocessed = unprocessed + (diff/ nsPerTick);
clock_gettime(CLOCK_REALTIME, &last);
while (unprocessed >= 1) {
unprocessed --;
DO SOME RANDOM MAGIC;
}
}
The difference between the timestamps is always negative. I was told that this was the error and that it should be fixed like this:
if ( (last.tv_nsec - now.tv_nsec)<0) {
double diff = 1000000000+ last.tv_nsec - now.tv_nsec;
}
else {
double diff = (last.tv_nsec - now.tv_nsec );
}
But my diff variable is still always negative, e.g. "-1095043244" (even though the time spent in the function is, of course, positive).
What's wrong?
Your first issue is that you have last.tv_nsec - now.tv_nsec, which is the wrong way round.
last.tv_nsec is in the past (let's say it's set to 1), and now.tv_nsec will always be later (for example, 8ns later, so it's 9). In that case, last.tv_nsec - now.tv_nsec == 1 - 9 == -8.
The other issue is that tv_nsec isn't the full time in nanoseconds, only the sub-second part. For the full time you need to multiply tv_sec by a billion and add tv_nsec. So to get the difference in ns between now and last, you want:
((now.tv_sec - last.tv_sec) * ONE_BILLION) + (now.tv_nsec - last.tv_nsec)
(N.B. I'm still a little surprised that although now.tv_nsec and last.tv_nsec are both less than a billion, subtracting one from the other gives a value less than -1000000000, so there may yet be something I'm missing here.)
I was just investigating timing on Pi, with similar approach and similar problems. My thoughts are:
You don't have to use double. In fact, you also don't need nanoseconds, as the clock on the Pi has 1 microsecond accuracy anyway (that's how Broadcom designed it). I suggest using gettimeofday() to get microseconds instead of nanoseconds. Then the computation is easy; it's just:
(1000 * 1000 * number of seconds) + number of micros
which you can simply compute as an unsigned int (it wraps around after about 71.6 minutes, which is fine for short intervals).
I've implemented a convenient API for this:
#include <sys/time.h>
typedef struct
{
struct timeval startTimeVal;
} TIMER_usecCtx_t;
void TIMER_usecStart(TIMER_usecCtx_t* ctx)
{
gettimeofday(&ctx->startTimeVal, NULL);
}
unsigned int TIMER_usecElapsedUs(TIMER_usecCtx_t* ctx)
{
unsigned int rv;
/* get current time */
struct timeval nowTimeVal;
gettimeofday(&nowTimeVal, NULL);
/* compute diff */
rv = 1000000 * (nowTimeVal.tv_sec - ctx->startTimeVal.tv_sec) + nowTimeVal.tv_usec - ctx->startTimeVal.tv_usec;
return rv;
}
And the usage is:
TIMER_usecCtx_t timer;
TIMER_usecStart(&timer);
while (1)
{
if (TIMER_usecElapsedUs(&timer) > yourDelayInMicroseconds)
{
doSomethingHere();
TIMER_usecStart(&timer);
}
}
Also note that the gettime() calls on the Pi take almost 1 µs to complete. So if you need to call gettime() often and need more accuracy, look at more advanced ways of getting the time. I've explained more about it in this short article about Pi get-time calls.
Well, I don't know C, but if it's a timing issue on a Raspberry Pi it might have something to do with the lack of an RTC (real time clock) on the chip.
You should not be storing last.tv_nsec - now.tv_nsec in a double.
If you look at the documentation of time.h, you can see that tv_nsec is stored as a long. So you will need something along the lines of:
long diff = end.tv_nsec - begin.tv_nsec;
With that being said, comparing only the nanoseconds can go wrong; you also need to look at the number of seconds. So to convert everything to seconds, you can use this:
long nanosec_diff = end.tv_nsec - begin.tv_nsec;
time_t sec_diff = end.tv_sec - begin.tv_sec; // need <sys/types.h> for time_t
double diff_in_seconds = sec_diff + nanosec_diff / 1000000000.0;
Also, make sure you are always subtracting the start time from the end time (or else your time will still be negative).
And there you go!
Using the following code:
#include<stdio.h>
#include<time.h>
int main()
{
clock_t start, stop;
int i;
start = clock();
for(i=0; i<2000;i++)
{
printf("%d", (i*1)+(1^4));
}
printf("\n\n");
stop = clock();
//(double)(stop - start) / CLOCKS_PER_SEC
printf("%6.3f", start);
printf("\n\n%6.3f", stop);
return 0;
}
I get the following output:

2.169
2.169
Start and stop times are the same.
1. Does this mean that the program takes hardly any time to complete?
2. If 1 is false, then at least the digits after the decimal point should differ, which does not happen here. Is my logic correct?
Note: I need to calculate the time taken for execution, and hence the above code.
Yes, this program has likely used less than a millisecond. Try using microsecond resolution with struct timeval.
For example:
#include <stdio.h>
#include <sys/time.h>
struct timeval stop, start;
gettimeofday(&start, NULL);
//do stuff
gettimeofday(&stop, NULL);
printf("took %lld us\n", (long long)(stop.tv_sec - start.tv_sec) * 1000000 + (stop.tv_usec - start.tv_usec));
The expression above combines tv_sec and tv_usec. Looking only at stop.tv_usec - start.tv_usec would work only for sub-second intervals, because tv_usec wraps around every second.
Edit 2016-08-19
A more appropriate approach on system with clock_gettime support would be:
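#include <stdint.h>
#include <time.h>
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC_RAW, &start); // CLOCK_MONOTONIC_RAW is not part of POSIX; CLOCK_MONOTONIC is the portable choice
//do stuff
clock_gettime(CLOCK_MONOTONIC_RAW, &end);
uint64_t delta_us = (end.tv_sec - start.tv_sec) * 1000000 + (end.tv_nsec - start.tv_nsec) / 1000;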
Here is what I use to get the timestamp in milliseconds.
#include<sys/time.h>
long long timeInMilliseconds(void) {
struct timeval tv;
gettimeofday(&tv,NULL);
return (((long long)tv.tv_sec)*1000)+(tv.tv_usec/1000);
}
A couple of things might affect the results you're seeing:
You're treating clock_t as a floating-point type; it usually isn't one.
You might be expecting (1^4) to do something other than compute the bitwise XOR of 1 and 4, i.e. 5.
Since the XOR is of constants, it's probably folded by the compiler, meaning it doesn't add a lot of work at runtime.
Since the output is buffered (it's just formatting the string and writing it to memory), it completes very quickly indeed.
You're not specifying how fast your machine is, but it's not unreasonable for this to run very quickly on modern hardware, no.
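If you have it, try adding a call to sleep() between the start/stop snapshots. Note that sleep() is POSIX, not standard C. Also be aware that clock() measures processor time, so on POSIX systems time spent sleeping is typically not counted.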
This code snippet can be used to display the elapsed time in seconds, with microsecond resolution:
#include <stdio.h>
#include <sys/time.h>
struct timeval start, stop;
double secs = 0;
gettimeofday(&start, NULL);
// Do stuff here
gettimeofday(&stop, NULL);
secs = (double)(stop.tv_usec - start.tv_usec) / 1000000 + (double)(stop.tv_sec - start.tv_sec);
printf("time taken %f\n",secs);
You can use gettimeofday() together with the timedifference_msec() function below to calculate the number of milliseconds elapsed between two samples:
#include <sys/time.h>
#include <stdio.h>
float timedifference_msec(struct timeval t0, struct timeval t1)
{
return (t1.tv_sec - t0.tv_sec) * 1000.0f + (t1.tv_usec - t0.tv_usec) / 1000.0f;
}
int main(void)
{
struct timeval t0;
struct timeval t1;
float elapsed;
gettimeofday(&t0, 0);
/* ... YOUR CODE HERE ... */
gettimeofday(&t1, 0);
elapsed = timedifference_msec(t0, t1);
printf("Code executed in %f milliseconds.\n", elapsed);
return 0;
}
Note that, when using gettimeofday(), you need to take seconds into account even if you only care about microsecond differences because tv_usec will wrap back to zero every second and you have no way of knowing beforehand at which point within a second each sample is obtained.
From man clock:
The clock() function returns an approximation of processor time used by the program.
So there is no indication that you should treat it as milliseconds. Some standards require a specific value of CLOCKS_PER_SEC (POSIX requires 1000000), so you could rely on it, but I don't think that is advisable.
Second, as @unwind stated, it is not a float/double; man times suggests it is an integer type.
Also note that:
(on a 32-bit system where CLOCKS_PER_SEC equals 1000000) this function will return the same value approximately every 72 minutes
And if you are unlucky, you might hit the moment when it is just about to wrap around to zero, thus getting a negative or huge value (depending on whether you store the result as a signed or unsigned value).
This:
printf("\n\n%6.3f", stop);
will most probably print garbage, because passing an integer where printf expects a double is undefined behaviour (and I think this is where most of your problem comes from). To be safe, you can always do:
printf("\n\n%6.3f", (double) stop);
Though I would rather go for printing it as long long int at first:
printf("\n\n%lld", (long long int) stop);
The standard C library (since C11) provides timespec_get. It can report time with up to nanosecond precision, if the system supports it. Calling it takes a bit more effort, however, because it fills in a struct. Here's a function that converts the struct to a simple 64-bit integer so you can get the time in milliseconds.
#include <stdio.h>
#include <inttypes.h>
#include <time.h>
int64_t millis()
{
struct timespec now;
timespec_get(&now, TIME_UTC);
return ((int64_t) now.tv_sec) * 1000 + ((int64_t) now.tv_nsec) / 1000000;
}
int main(void)
{
printf("Unix timestamp with millisecond precision: %" PRId64 "\n", millis());
}
Unlike clock, this function returns a Unix timestamp so it will correctly account for the time spent in blocking functions, such as sleep.
Modern processors can finish this code faster than the clock's resolution, so the measured time may be zero: the start and end readings fall within the same tick and are identical after rounding.