Clock Cycles Count Variation Cortex A53 AArch64 - c

I try to count cpu clock cycles for my function on ARM Cortex-A53 using following function:
#include <sys/time.h>
readticks(unsigned int *result, int enabled)
{
struct timeval t;
unsigned int cc;
unsigned int val;
if (!enabled) {
// program the performance-counter control-register:
asm volatile("msr pmcr_el0, %0" : : "r" (17));
//enable all counters
asm volatile("msr PMCNTENSET_EL0, %0" : : "r" (0x8000000f));
//clear the overflow
asm volatile("msr PMOVSCLR_EL0, %0" : : "r" (0x8000000f));
enabled = 1;
}
//read the coutner value
asm volatile("mrs %0, PMCCNTR_EL0" : "=r" (cc));
gettimeofday(&t,(struct timezone *) 0);
result[0] = cc;
result[1] = t.tv_usec;
result[2] = t.tv_sec;
}
and here is my user space application:
#include <stio.h>
#include <inttypes.h>
#include <time.h>
int main(){
unsigned int init[3] = {0};
unsigned int start[3] = {0};
unsigned int end[3] = {0};
unsigned int overhead = 0;
readticks(init, 0);
readticks(start, 1);
readticks(end, 1);
overhead = end[0] - start[0];
readticks(init, 0);
readticks(start, 1);
foo(); //This is my function
readticks(end, 1);
end[0] = end[0] - start[0] - overhead;
printf("clock cycles= %d\n", end[0]);
return 0;
}
When I run my code for several times, I have got different clock cycles with relatively high variation (almost 5000). My code should be run around 4000 clock cycles, but I have got 4500 - 9500 clock cycles. Is there any way around that gives me more accurate clock cycles count?

use the following macro
#define mfcp(rn) ({u32 rval = 0U; \
__asm__ __volatile__(\
"mrc " rn "\n"\
: "=r" (rval)\
);\
rval;\
})
#endif
call mfcp with counter register
uint64_t t1,t2;
t1 = mfcp(CNTPCT_EL0);
// your code
t2 = mfcp(CNTPCT_EL0);

Related

Inline assembly in C with a jump and two return statements

I would like to have a void function which prints whether carry (overflow) happened and print the value of the (possibly overflowed) summand.
This is my try, but it does want to compile:
#include <stdio.h>
typedef unsigned long ulong;
void
func()
{
ulong a = 3;
ulong b = 1;
ulong c;
__asm__(
"\taddq\t%2, %q0\n" /* Add b to a and store in c */
"\tjc\tcarry"
: "=r" (c) /* Outputs */
: "0" (a), "rme" (b) /* Inputs */
);
printf("no carry\nc = %lu\n", c);
return;
__asm__("carry:");
printf("carry\nc = %lu\n", c);
return;
}
int
main()
{
func();
}
However, it runs when I remove the first return statement, but then it prints twice if no carry happened.
How can I do this with two return statements?
You need to use asm goto for that. To do that, add goto after __asm__, change your label to be a C label, pass the label after the clobbers, and then use a %l to refer to it. Here's your program with those fixes applied:
#include <stdio.h>
typedef unsigned long ulong;
void
func()
{
ulong a = 3;
ulong b = 1;
ulong c;
__asm__ goto(
"\taddq\t%2, %q0\n" /* Add b to a and store in c */
"\tjc\t%l[carry]"
: "=r" (c) /* Outputs */
: "0" (a), "rme" (b) /* Inputs */
:
: carry
);
printf("no carry\nc = %lu\n", c);
return;
carry:
printf("carry\nc = %lu\n", c);
return;
}
int
main()
{
func();
}
Though as was mentioned in the comments, for this use case in particular, you should just use __builtin_uaddl_overflow instead, unless your goal is just to learn how to jump out of inline assembly.

achieve GCC cas function for version 4.1.2 and earlier

My new company project, they want the code run for the 32-bit, the compile server is a CentOS 5.0 with GCC 4.1.1, that was the nightmare.
There are lots of functions using in the project like __sync_fetch_and_add was given in GCC 4.1.2 and later.
I was told can not upgrade GCC version, so I have to make another solution after Googling for several hours.
When I wrote a demo to test, I just got the wrong answer, the code blow want to replace function __sync_fetch_and_add
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static int count = 0;
int compare_and_swap(int* reg, int oldval, int newval)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1"
: "=m"(*reg), "=q" (result)
: "m" (*reg), "r" (newval), "a" (oldval)
: "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1"
: "=m"(*reg), "=q" (result)
: "m" (*reg), "r" (newval), "a" (oldval)
: "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void *test_func(void *arg)
{
int i = 0;
for(i = 0; i < 2000; ++i) {
compare_and_swap((int *)&count, count, count + 1);
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i = 0; i < 10; ++i){
pthread_create(&id[i], NULL, test_func, NULL);
}
for(i = 0; i < 10; ++i) {
pthread_join(id[i], NULL);
}
//10*2000=20000
printf("%d\n", count);
return 0;
}
Whent I got the wrong result:
[root#centos-linux-7 workspace]# ./asm
17123
[root#centos-linux-7 workspace]# ./asm
14670
[root#centos-linux-7 workspace]# ./asm
14604
[root#centos-linux-7 workspace]# ./asm
13837
[root#centos-linux-7 workspace]# ./asm
14043
[root#centos-linux-7 workspace]# ./asm
16160
[root#centos-linux-7 workspace]# ./asm
15271
[root#centos-linux-7 workspace]# ./asm
15280
[root#centos-linux-7 workspace]# ./asm
15465
[root#centos-linux-7 workspace]# ./asm
16673
I realize in this line
compare_and_swap((int *)&count, count, count + 1);
count + 1 was wrong!
Then how can I implement the same function as __sync_fetch_and_add. The compare_and_swap function works when the third parameter is constant.
By the way, compare_and_swap function is that right? I just Googled for that, not familiar with assembly.
I got despair with this question.
………………………………………………………………………………………………………………………………………………………………………………………………………………………
after seeing the answer below,I use while and got the right answer,but seems confuse more.
here is the code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static unsigned long count = 0;
int sync_add_and_fetch(int* reg, int oldval, int incre)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (oldval + incre), "a" (oldval) : "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (newval + incre), "a" (oldval) : "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void *test_func(void *arg)
{
int i=0;
int result = 0;
for(i=0;i<2000;++i)
{
result = 0;
while(0 == result)
{
result = sync_add_and_fetch((int *)&count, count, 1);
}
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i=0;i<10;++i){
pthread_create(&id[i],NULL,test_func,NULL);
}
for(i=0;i<10;++i){
pthread_join(id[i],NULL);
}
//10*2000=20000
printf("%u\n",count);
return 0;
}
the answer goes right to 20000,so i think when you use sync_add_and_fetch function,you should goes with a while loop is stupid,so I write like this:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static unsigned long count = 0;
int compare_and_swap(int* reg, int oldval, int incre)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (oldval + incre), "a" (oldval) : "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (newval + incre), "a" (oldval) : "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void sync_add_and_fetch(int *reg,int oldval,int incre)
{
int ret = 0;
while(0 == ret)
{
ret = compare_and_swap(reg,oldval,incre);
}
}
void *test_func(void *arg)
{
int i=0;
for(i=0;i<2000;++i)
{
sync_add_and_fetch((int *)&count, count, 1);
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i=0;i<10;++i){
pthread_create(&id[i],NULL,test_func,NULL);
}
for(i=0;i<10;++i){
pthread_join(id[i],NULL);
}
//10*2000=20000
printf("%u\n",count);
return 0;
}
but when i run this code with ./asm after g++ -g -o asm asm.cpp -lpthread.the asm just stuck for more than 5min,see top in another terminal:
3861 root 19 0 102m 888 732 S 400 0.0 2:51.06 asm
I just confused,is this code not the same?
The 64-bit compare_and_swap is wrong as it swaps 64 bits but int is only 32 bits.
compare_and_swap should be used in a loop which retries it until is succeeds.
Your result look right to me. lock cmpxchg succeeds most of the time, but will fail if another core beat you to the punch. You're doing 20k attempts to cmpxchg count+1, not 20k atomic increments.
To write __sync_fetch_and_add with inline asm, you'll want to use lock xadd. It's specifically designed to implement fetch-add.
Implementing other operations, like fetch-or or fetch-and, require a CAS retry loop if you actually need the old value. So you could make a version of the function that doesn't return the old value, and is just a sync-and without the fetch, using lock and with a memory destination. (Compiler builtins can make this optimization based on whether the result is needed or not, but an inline asm implementation doesn't get a chance to choose asm based on that information.)
For efficiency, remember that and, or, add and many other instructions can use immediate operands, so a "re"(src) constraint would be appropriate (not "ri" for int64_t on x86-64, because that would allow immediates too large. https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html). But cmpxchg, xadd, and xchg can't use immediates, of course.
I'd suggest looking at compiler output for modern gcc (e.g. on http://godbolt.org/) for functions using the builtin, to see what compilers do.
But beware that inline asm can compile correctly given one set of surrounding code, but not the way you expect given different code. e.g. if the surrounding code copied a value after using CAS on it (probably unlikely), the compiler might decide to give the asm template two different memory operands for "=m"(*reg) and "m"(*reg), but your asm template assumes they will always be the same address.
IDK if gcc4.1 supports it, but "+m"(*reg) would declare a read/write memory operand. Otherwise perhaps you can use a matching constraint to say that the input is in the same location as an earlier operand, like "0"(*reg). But that might only work for registers, not memory, I didn't check.
"a" (oldval) is a bug: cmpxchg writes EAX on failure.
It's not ok to tell the compiler you leave a reg unmodified, and then write an asm template that does modify it. You will get unpredictable behaviour from stepping on the compiler's toes.
See c inline assembly getting "operand size mismatch" when using cmpxchg for a safe inline-asm wrapper for lock cmpxchg. It's written for gcc6 flag-output, so you'll have to back-port that and maybe a few other syntax details to crusty old gcc4.1.
That answer also addresses returning the old value so it doesn't have to be separately loaded.
(Using ancient gcc4.1 sounds like a bad idea to me, especially for writing multi-threaded code. So much room for error from porting working code with __sync builtins to hand-rolled asm. The risks of using a newer compiler, like stable gcc5.5 if not gcc7.4, are different but probably smaller.)
If you're going to rewrite code using __sync builtins, the sane thing would be to rewrite it using C11 stdatomic.h, or GNU C's more modern __atomic builtins that are intended to replace __sync.
The Linux kernel does successfully use inline asm for hand-rolled atomics, though, so it's certainly possible.
If you truly are in such a predicament, I would start with the following header file:
#ifndef SYNC_H
#define SYNC_H
#if defined(__x86_64__) || defined(__i386__)
static inline int sync_val_compare_and_swap_int(int *ptr, int oldval, int newval)
{
__asm__ __volatile__( "lock cmpxchgl %[newval], %[ptr]"
: "+a" (oldval), [ptr] "+m" (*ptr)
: [newval] "r" (newval)
: "memory" );
return oldval;
}
static inline int sync_fetch_and_add_int(int *ptr, int val)
{
__asm__ __volatile__( "lock xaddl %[val], %[ptr]"
: [val] "+r" (val), [ptr] "+m" (*ptr)
:
: "memory" );
return val;
}
static inline int sync_add_and_fetch_int(int *ptr, int val)
{
const int old = val;
__asm__ __volatile__( "lock xaddl %[val], %[ptr]"
: [val] "+r" (val), [ptr] "+m" (*ptr)
:
: "memory" );
return old + val;
}
static inline int sync_fetch_and_sub_int(int *ptr, int val) { return sync_fetch_and_add_int(ptr, -val); }
static inline int sync_sub_and_fetch_int(int *ptr, int val) { return sync_add_and_fetch_int(ptr, -val); }
/* Memory barrier */
static inline void sync_synchronize(void) { __asm__ __volatile__( "mfence" ::: "memory"); }
#else
#error Unsupported architecture.
#endif
#endif /* SYNC_H */
The same extended inline assembly works for both x86 and x86-64. Only the int type is implemented, and you do need to replace possible __sync_synchronize() calls with sync_synchronize(), and each __sync_...() call with sync_..._int().
To test, you can use e.g.
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#include "sync.h"
#define THREADS 16
#define PERTHREAD 8000
void *test_func1(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
while (n-->0)
sync_add_and_fetch_int(sum, n + 1);
return NULL;
}
void *test_func2(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
while (n-->0)
sync_fetch_and_add_int(sum, n + 1);
return NULL;
}
void *test_func3(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
int oldval, curval, newval;
while (n-->0) {
curval = *sum;
do {
oldval = curval;
newval = curval + n + 1;
} while ((curval = sync_val_compare_and_swap_int(sum, oldval, newval)) != oldval);
}
return NULL;
}
static void *(*worker[3])(void *) = { test_func1, test_func2, test_func3 };
int main(void)
{
pthread_t thread[THREADS];
pthread_attr_t attrs;
int sum = 0;
int t, result;
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 65536);
for (t = 0; t < THREADS; t++) {
result = pthread_create(thread + t, &attrs, worker[t % 3], &sum);
if (result) {
fprintf(stderr, "Failed to create thread %d of %d: %s.\n", t+1, THREADS, strerror(errno));
exit(EXIT_FAILURE);
}
}
pthread_attr_destroy(&attrs);
for (t = 0; t < THREADS; t++)
pthread_join(thread[t], NULL);
t = THREADS * PERTHREAD * (PERTHREAD + 1) / 2;
if (sum == t)
printf("sum = %d (as expected)\n", sum);
else
printf("sum = %d (expected %d)\n", sum, t);
return EXIT_SUCCESS;
}
Unfortunately, I don't have an ancient version of GCC to test, so this has only been tested with GCC 5.4.0 and GCC-4.9.3 for x86 and x86-64 (using -O2) on Linux.
If you find any bugs or issues in the above, please let me know in a comment so I can verify and fix as needed.

Visual Studio missing header file "sys/time.h" [duplicate]

I would like to measure time in C, and I am having a tough time figuring it out, all I want is something like this:
start a timer
run a method
stop the timer
report the time taken (at least to micro accuracy)
Any help would be appreciated.
(I am compiling in windows using mingw)
High resolution timers that provide a resolution of 1 microsecond are system-specific, so you will have to use different methods to achieve this on different OS platforms. You may be interested in checking out the following article, which implements a cross-platform C++ timer class based on the functions described below:
[Song Ho Ahn - High Resolution Timer][1]
Windows
The Windows API provides extremely high resolution timer functions: QueryPerformanceCounter(), which returns the current elapsed ticks, and QueryPerformanceFrequency(), which returns the number of ticks per second.
Example:
#include <stdio.h>
#include <windows.h> // for Windows APIs
int main(void)
{
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
// start timer
QueryPerformanceCounter(&t1);
// do something
// ...
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
printf("%f ms.\n", elapsedTime);
}
Linux, Unix, and Mac
For Unix or Linux based system, you can use gettimeofday(). This function is declared in "sys/time.h".
Example:
#include <stdio.h>
#include <sys/time.h> // for gettimeofday()
int main(void)
{
struct timeval t1, t2;
double elapsedTime;
// start timer
gettimeofday(&t1, NULL);
// do something
// ...
// stop timer
gettimeofday(&t2, NULL);
// compute and print the elapsed time in millisec
elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0; // sec to ms
elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0; // us to ms
printf("%f ms.\n", elapsedTime);
}
On Linux you can use clock_gettime():
clock_gettime(CLOCK_REALTIME, &start); // get initial time-stamp
// ... do stuff ... //
clock_gettime(CLOCK_REALTIME, &end); // get final time-stamp
double t_ns = (double)(end.tv_sec - start.tv_sec) * 1.0e9 +
(double)(end.tv_nsec - start.tv_nsec);
// subtract time-stamps and
// multiply to get elapsed
// time in ns
Here's a header file I wrote to do some simple performance profiling (using manual timers):
#ifndef __ZENTIMER_H__
#define __ZENTIMER_H__
#ifdef ENABLE_ZENTIMER
#include <stdio.h>
#ifdef WIN32
#include <windows.h>
#else
#include <sys/time.h>
#endif
#ifdef HAVE_STDINT_H
#include <stdint.h>
#elif HAVE_INTTYPES_H
#include <inttypes.h>
#else
typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;
typedef unsigned long long uint64_t;
#endif
#ifdef __cplusplus
extern "C" {
#pragma }
#endif /* __cplusplus */
#define ZTIME_USEC_PER_SEC 1000000
/* ztime_t represents usec */
typedef uint64_t ztime_t;
#ifdef WIN32
static uint64_t ztimer_freq = 0;
#endif
static void
ztime (ztime_t *ztimep)
{
#ifdef WIN32
QueryPerformanceCounter ((LARGE_INTEGER *) ztimep);
#else
struct timeval tv;
gettimeofday (&tv, NULL);
*ztimep = ((uint64_t) tv.tv_sec * ZTIME_USEC_PER_SEC) + tv.tv_usec;
#endif
}
enum {
ZTIMER_INACTIVE = 0,
ZTIMER_ACTIVE = (1 << 0),
ZTIMER_PAUSED = (1 << 1),
};
typedef struct {
ztime_t start;
ztime_t stop;
int state;
} ztimer_t;
#define ZTIMER_INITIALIZER { 0, 0, 0 }
/* default timer */
static ztimer_t __ztimer = ZTIMER_INITIALIZER;
static void
ZenTimerStart (ztimer_t *ztimer)
{
ztimer = ztimer ? ztimer : &__ztimer;
ztimer->state = ZTIMER_ACTIVE;
ztime (&ztimer->start);
}
static void
ZenTimerStop (ztimer_t *ztimer)
{
ztimer = ztimer ? ztimer : &__ztimer;
ztime (&ztimer->stop);
ztimer->state = ZTIMER_INACTIVE;
}
static void
ZenTimerPause (ztimer_t *ztimer)
{
ztimer = ztimer ? ztimer : &__ztimer;
ztime (&ztimer->stop);
ztimer->state |= ZTIMER_PAUSED;
}
static void
ZenTimerResume (ztimer_t *ztimer)
{
ztime_t now, delta;
ztimer = ztimer ? ztimer : &__ztimer;
/* unpause */
ztimer->state &= ~ZTIMER_PAUSED;
ztime (&now);
/* calculate time since paused */
delta = now - ztimer->stop;
/* adjust start time to account for time elapsed since paused */
ztimer->start += delta;
}
static double
ZenTimerElapsed (ztimer_t *ztimer, uint64_t *usec)
{
#ifdef WIN32
static uint64_t freq = 0;
ztime_t delta, stop;
if (freq == 0)
QueryPerformanceFrequency ((LARGE_INTEGER *) &freq);
#else
#define freq ZTIME_USEC_PER_SEC
ztime_t delta, stop;
#endif
ztimer = ztimer ? ztimer : &__ztimer;
if (ztimer->state != ZTIMER_ACTIVE)
stop = ztimer->stop;
else
ztime (&stop);
delta = stop - ztimer->start;
if (usec != NULL)
*usec = (uint64_t) (delta * ((double) ZTIME_USEC_PER_SEC / (double) freq));
return (double) delta / (double) freq;
}
static void
ZenTimerReport (ztimer_t *ztimer, const char *oper)
{
fprintf (stderr, "ZenTimer: %s took %.6f seconds\n", oper, ZenTimerElapsed (ztimer, NULL));
}
#ifdef __cplusplus
}
#endif /* __cplusplus */
#else /* ! ENABLE_ZENTIMER */
#define ZenTimerStart(ztimerp)
#define ZenTimerStop(ztimerp)
#define ZenTimerPause(ztimerp)
#define ZenTimerResume(ztimerp)
#define ZenTimerElapsed(ztimerp, usec)
#define ZenTimerReport(ztimerp, oper)
#endif /* ENABLE_ZENTIMER */
#endif /* __ZENTIMER_H__ */
The ztime() function is the main logic you need — it gets the current time and stores it in a 64bit uint measured in microseconds. You can then later do simple math to find out the elapsed time.
The ZenTimer*() functions are just helper functions to take a pointer to a simple timer struct, ztimer_t, which records the start time and the end time. The ZenTimerPause()/ZenTimerResume() functions allow you to, well, pause and resume the timer in case you want to print out some debugging information that you don't want timed, for example.
You can find a copy of the original header file at http://www.gnome.org/~fejj/code/zentimer.h in the off chance that I messed up the html escaping of <'s or something. It's licensed under MIT/X11 so feel free to copy it into any project you do.
The following is a group of versatile C functions for timer management based on the gettimeofday() system call. All the timer properties are contained in a single ticktimer struct - the interval you want, the total running time since the timer initialization, a pointer to the desired callback you want to call, the number of times the callback was called. A callback function would look like this:
void your_timer_cb (struct ticktimer *t) {
/* do your stuff here */
}
To initialize and start a timer, call ticktimer_init(your_timer, interval, TICKTIMER_RUN, your_timer_cb, 0).
In the main loop of your program call ticktimer_tick(your_timer) and it will decide whether the appropriate amount of time has passed to invoke the callback.
To stop a timer, just call ticktimer_ctl(your_timer, TICKTIMER_STOP).
ticktimer.h:
#ifndef __TICKTIMER_H
#define __TICKTIMER_H
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#define TICKTIMER_STOP 0x00
#define TICKTIMER_UNCOMPENSATE 0x00
#define TICKTIMER_RUN 0x01
#define TICKTIMER_COMPENSATE 0x02
struct ticktimer {
u_int64_t tm_tick_interval;
u_int64_t tm_last_ticked;
u_int64_t tm_total;
unsigned ticks_total;
void (*tick)(struct ticktimer *);
unsigned char flags;
int id;
};
void ticktimer_init (struct ticktimer *, u_int64_t, unsigned char, void (*)(struct ticktimer *), int);
unsigned ticktimer_tick (struct ticktimer *);
void ticktimer_ctl (struct ticktimer *, unsigned char);
struct ticktimer *ticktimer_alloc (void);
void ticktimer_free (struct ticktimer *);
void ticktimer_tick_all (void);
#endif
ticktimer.c:
#include "ticktimer.h"
#define TIMER_COUNT 100
static struct ticktimer timers[TIMER_COUNT];
static struct timeval tm;
/*!
#brief
Initializes/sets the ticktimer struct.
#param timer
Pointer to ticktimer struct.
#param interval
Ticking interval in microseconds.
#param flags
Flag bitmask. Use TICKTIMER_RUN | TICKTIMER_COMPENSATE
to start a compensating timer; TICKTIMER_RUN to start
a normal uncompensating timer.
#param tick
Ticking callback function.
#param id
Timer ID. Useful if you want to distinguish different
timers within the same callback function.
*/
void ticktimer_init (struct ticktimer *timer, u_int64_t interval, unsigned char flags, void (*tick)(struct ticktimer *), int id) {
gettimeofday(&tm, NULL);
timer->tm_tick_interval = interval;
timer->tm_last_ticked = tm.tv_sec * 1000000 + tm.tv_usec;
timer->tm_total = 0;
timer->ticks_total = 0;
timer->tick = tick;
timer->flags = flags;
timer->id = id;
}
/*!
#brief
Checks the status of a ticktimer and performs a tick(s) if
necessary.
#param timer
Pointer to ticktimer struct.
#return
The number of times the timer was ticked.
*/
unsigned ticktimer_tick (struct ticktimer *timer) {
register typeof(timer->tm_tick_interval) now;
register typeof(timer->ticks_total) nticks, i;
if (timer->flags & TICKTIMER_RUN) {
gettimeofday(&tm, NULL);
now = tm.tv_sec * 1000000 + tm.tv_usec;
if (now >= timer->tm_last_ticked + timer->tm_tick_interval) {
timer->tm_total += now - timer->tm_last_ticked;
if (timer->flags & TICKTIMER_COMPENSATE) {
nticks = (now - timer->tm_last_ticked) / timer->tm_tick_interval;
timer->tm_last_ticked = now - ((now - timer->tm_last_ticked) % timer->tm_tick_interval);
for (i = 0; i < nticks; i++) {
timer->tick(timer);
timer->ticks_total++;
if (timer->tick == NULL) {
break;
}
}
return nticks;
} else {
timer->tm_last_ticked = now;
timer->tick(timer);
timer->ticks_total++;
return 1;
}
}
}
return 0;
}
/*!
#brief
Controls the behaviour of a ticktimer.
#param timer
Pointer to ticktimer struct.
#param flags
Flag bitmask.
*/
inline void ticktimer_ctl (struct ticktimer *timer, unsigned char flags) {
timer->flags = flags;
}
/*!
#brief
Allocates a ticktimer struct from an internal
statically allocated list.
#return
Pointer to the newly allocated ticktimer struct
or NULL when no more space is available.
*/
struct ticktimer *ticktimer_alloc (void) {
register int i;
for (i = 0; i < TIMER_COUNT; i++) {
if (timers[i].tick == NULL) {
return timers + i;
}
}
return NULL;
}
/*!
#brief
Marks a previously allocated ticktimer struct as free.
#param timer
Pointer to ticktimer struct, usually returned by
ticktimer_alloc().
*/
inline void ticktimer_free (struct ticktimer *timer) {
timer->tick = NULL;
}
/*!
#brief
Checks the status of all allocated timers from the
internal list and performs ticks where necessary.
#note
Should be called in the main loop.
*/
inline void ticktimer_tick_all (void) {
register int i;
for (i = 0; i < TIMER_COUNT; i++) {
if (timers[i].tick != NULL) {
ticktimer_tick(timers + i);
}
}
}
Using the time.h library, try something like this:
long start_time, end_time, elapsed;
start_time = clock();
// Do something
end_time = clock();
elapsed = (end_time - start_time) / CLOCKS_PER_SEC * 1000;
If your Linux system supports it, clock_gettime(CLOCK_MONOTONIC) should be a high resolution timer that is unaffected by system date changes (e.g. NTP daemons).
Great answers for GNU environments above and below...
But... what if you're not running on an OS? (or a PC for that matter, or you need to time your timer interrupts themselves?) Here's a solution that uses the x86 CPU timestamp counter directly... Not because this is good practice, or should be done, ever, when running under an OS...
Caveat: Only works on x86, with frequency scaling disabled.
Under Linux, only works on non-tickless kernels
rdtsc.c:
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned long long int64;
static __inline__ int64 getticks(void)
{
unsigned a, d;
asm volatile("rdtsc" : "=a" (a), "=d" (d));
return (((int64)a) | (((int64)d) << 32));
}
int main(){
int64 tick,tick1;
unsigned time=0,mt;
// mt is the divisor to give microseconds
FILE *pf;
int i,r,l,n=0;
char s[100];
// time how long it takes to get the divisors, as a test
tick = getticks();
// get the divisors - todo: for max performance this can
// output a new binary or library with these values hardcoded
// for the relevant CPU - if you use the equivalent assembler for
// that CPU
pf = fopen("/proc/cpuinfo","r");
do {
r=fscanf(pf,"%s",&s[0]);
if (r<0) {
n=5; break;
} else if (n==0) {
if (strcmp("MHz",s)==0) n=1;
} else if (n==1) {
if (strcmp(":",s)==0) n=2;
} else if (n==2) {
n=3;
};
} while (n<3);
fclose(pf);
s[9]=(char)0;
strcpy(&s[4],&s[5]);
mt=atoi(s);
printf("#define mt %u // (%s Hz) hardcode this for your a CPU-specific binary ;-)\n",mt,s);
tick1 = getticks();
time = (unsigned)((tick1-tick)/mt);
printf("%u ms\n",time);
// time the duration of sleep(1) - plus overheads ;-)
tick = getticks();
sleep(1);
tick1 = getticks();
time = (unsigned)((tick1-tick)/mt);
printf("%u ms\n",time);
return 0;
}
compile and run with
$ gcc rdtsc.c -o rdtsc && ./rdtsc
It reads the divisor for your CPU from /proc/cpuinfo and shows how long it took to read that in microseconds, as well as how long it takes to execute sleep(1) in microseconds... Assuming the Mhz rating in /proc/cpuinfo always contains 3 decimal places :-o

Thread execution time in C/Linux

Wandering if I can measure actual time or cpu ticks taken by a particular thread.
pthreadcreate(.........);
//
//
pthreadjoin(.......);
I am running with 3 threads.
One master thread is calling the rest two threads.
I want to measure the execution time for a called thread.
what should I use in linux environment ?
you can do one thing
In thread function at start up make a log using printk. You can separate it with different thread printing with it thread_t variable or thread index
and at end of that thread function put another log like that.
So in dmesg
will shows the log with timestamp
so you can differentiate end log time with start log time.
I know this is not more practical way of doing this but just for debugging purpose you can do this without much effort.
If you want to have a more accurate result you can use tick counters :
#ifndef TIMING_H
#define TIMING_H
/*
* -- Init timing library with timing_init();
* -- get timestamp :
* tick_t t;
* GET_TICK(t);
* -- get delay between two timestamps in microseconds :
* TIMING_DELAY(t1, t2);
*/
#include <sys/time.h>
#include <unistd.h>
#include <stdint.h>
#ifndef min
#define min(a,b) \
({__typeof__ ((a)) _a = (a); \
__typeof__ ((b)) _b = (b); \
_a < _b ? _a : _b; })
#endif
typedef union u_tick
{
uint64_t tick;
struct
{
uint32_t low;
uint32_t high;
}
sub;
} tick_t;
static double scale_time = 0.0;
static unsigned long long residual = 0;
#if defined(__i386__) || defined(__pentium__) || defined(__pentiumpro__) || defined(__i586__) || defined(__i686__) || defined(__k6__) || defined(__k7__) || defined(__x86_64__)
# define GET_TICK(t) __asm__ volatile("rdtsc" : "=a" ((t).sub.low), "=d" ((t).sub.high))
#else
# error "Unsupported processor"
#endif
#define TICK_RAW_DIFF(t1, t2) ((t2).tick - (t1).tick)
#define TICK_DIFF(t1, t2) (TICK_RAW_DIFF(t1, t2) - residual)
#define TIMING_DELAY(t1, t2) tick2usec(TICK_DIFF(t1, t2))
void timing_init(void)
{
static tick_t t1, t2;
int i;
residual = (unsigned long long)1 << 63;
for(i = 0; i < 20; i++)
{
GET_TICK(t1);
GET_TICK(t2);
residual = min(residual, TICK_RAW_DIFF(t1, t2));
}
{
struct timeval tv1,tv2;
GET_TICK(t1);
gettimeofday(&tv1,0);
usleep(500000);
GET_TICK(t2);
gettimeofday(&tv2,0);
scale_time = ((tv2.tv_sec*1e6 + tv2.tv_usec) -
(tv1.tv_sec*1e6 + tv1.tv_usec)) /
(double)(TICK_DIFF(t1, t2));
}
}
double tick2usec(long long t)
{
return (double)(t)*scale_time;
}
#endif /* TIMING_H */

How do I measure a time interval in C?

I would like to measure time in C, and I am having a tough time figuring it out, all I want is something like this:
start a timer
run a method
stop the timer
report the time taken (at least to micro accuracy)
Any help would be appreciated.
(I am compiling in windows using mingw)
High resolution timers that provide a resolution of 1 microsecond are system-specific, so you will have to use different methods to achieve this on different OS platforms. You may be interested in checking out the following article, which implements a cross-platform C++ timer class based on the functions described below:
[Song Ho Ahn - High Resolution Timer][1]
Windows
The Windows API provides extremely high resolution timer functions: QueryPerformanceCounter(), which returns the current elapsed ticks, and QueryPerformanceFrequency(), which returns the number of ticks per second.
Example:
#include <stdio.h>
#include <windows.h> // for Windows APIs
int main(void)
{
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
// start timer
QueryPerformanceCounter(&t1);
// do something
// ...
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
printf("%f ms.\n", elapsedTime);
}
Linux, Unix, and Mac
For Unix or Linux based system, you can use gettimeofday(). This function is declared in "sys/time.h".
Example:
#include <stdio.h>
#include <sys/time.h> // for gettimeofday()
int main(void)
{
struct timeval t1, t2;
double elapsedTime;
// start timer
gettimeofday(&t1, NULL);
// do something
// ...
// stop timer
gettimeofday(&t2, NULL);
// compute and print the elapsed time in millisec
elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0; // sec to ms
elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0; // us to ms
printf("%f ms.\n", elapsedTime);
}
On Linux you can use clock_gettime():
clock_gettime(CLOCK_REALTIME, &start); // get initial time-stamp
// ... do stuff ... //
clock_gettime(CLOCK_REALTIME, &end); // get final time-stamp
double t_ns = (double)(end.tv_sec - start.tv_sec) * 1.0e9 +
(double)(end.tv_nsec - start.tv_nsec);
// subtract time-stamps and
// multiply to get elapsed
// time in ns
Here's a header file I wrote to do some simple performance profiling (using manual timers):
#ifndef __ZENTIMER_H__
#define __ZENTIMER_H__
#ifdef ENABLE_ZENTIMER
#include <stdio.h>
#ifdef WIN32
#include <windows.h>
#else
#include <sys/time.h>
#endif
#ifdef HAVE_STDINT_H
#include <stdint.h>
#elif HAVE_INTTYPES_H
#include <inttypes.h>
#else
typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;
typedef unsigned long long uint64_t;
#endif
#ifdef __cplusplus
extern "C" {
#pragma }
#endif /* __cplusplus */
#define ZTIME_USEC_PER_SEC 1000000
/* ztime_t represents usec */
typedef uint64_t ztime_t;
#ifdef WIN32
static uint64_t ztimer_freq = 0;
#endif
static void
ztime (ztime_t *ztimep)
{
#ifdef WIN32
QueryPerformanceCounter ((LARGE_INTEGER *) ztimep);
#else
struct timeval tv;
gettimeofday (&tv, NULL);
*ztimep = ((uint64_t) tv.tv_sec * ZTIME_USEC_PER_SEC) + tv.tv_usec;
#endif
}
enum {
ZTIMER_INACTIVE = 0,
ZTIMER_ACTIVE = (1 << 0),
ZTIMER_PAUSED = (1 << 1),
};
typedef struct {
ztime_t start;
ztime_t stop;
int state;
} ztimer_t;
#define ZTIMER_INITIALIZER { 0, 0, 0 }
/* default timer */
static ztimer_t __ztimer = ZTIMER_INITIALIZER;
static void
ZenTimerStart (ztimer_t *ztimer)
{
ztimer = ztimer ? ztimer : &__ztimer;
ztimer->state = ZTIMER_ACTIVE;
ztime (&ztimer->start);
}
static void
ZenTimerStop (ztimer_t *ztimer)
{
ztimer = ztimer ? ztimer : &__ztimer;
ztime (&ztimer->stop);
ztimer->state = ZTIMER_INACTIVE;
}
static void
ZenTimerPause (ztimer_t *ztimer)
{
ztimer = ztimer ? ztimer : &__ztimer;
ztime (&ztimer->stop);
ztimer->state |= ZTIMER_PAUSED;
}
static void
ZenTimerResume (ztimer_t *ztimer)
{
ztime_t now, delta;
ztimer = ztimer ? ztimer : &__ztimer;
/* unpause */
ztimer->state &= ~ZTIMER_PAUSED;
ztime (&now);
/* calculate time since paused */
delta = now - ztimer->stop;
/* adjust start time to account for time elapsed since paused */
ztimer->start += delta;
}
static double
ZenTimerElapsed (ztimer_t *ztimer, uint64_t *usec)
{
#ifdef WIN32
static uint64_t freq = 0;
ztime_t delta, stop;
if (freq == 0)
QueryPerformanceFrequency ((LARGE_INTEGER *) &freq);
#else
#define freq ZTIME_USEC_PER_SEC
ztime_t delta, stop;
#endif
ztimer = ztimer ? ztimer : &__ztimer;
if (ztimer->state != ZTIMER_ACTIVE)
stop = ztimer->stop;
else
ztime (&stop);
delta = stop - ztimer->start;
if (usec != NULL)
*usec = (uint64_t) (delta * ((double) ZTIME_USEC_PER_SEC / (double) freq));
return (double) delta / (double) freq;
}
static void
ZenTimerReport (ztimer_t *ztimer, const char *oper)
{
fprintf (stderr, "ZenTimer: %s took %.6f seconds\n", oper, ZenTimerElapsed (ztimer, NULL));
}
#ifdef __cplusplus
}
#endif /* __cplusplus */
#else /* ! ENABLE_ZENTIMER */
#define ZenTimerStart(ztimerp)
#define ZenTimerStop(ztimerp)
#define ZenTimerPause(ztimerp)
#define ZenTimerResume(ztimerp)
#define ZenTimerElapsed(ztimerp, usec)
#define ZenTimerReport(ztimerp, oper)
#endif /* ENABLE_ZENTIMER */
#endif /* __ZENTIMER_H__ */
The ztime() function is the main logic you need — it gets the current time and stores it in a 64bit uint measured in microseconds. You can then later do simple math to find out the elapsed time.
The ZenTimer*() functions are just helper functions to take a pointer to a simple timer struct, ztimer_t, which records the start time and the end time. The ZenTimerPause()/ZenTimerResume() functions allow you to, well, pause and resume the timer in case you want to print out some debugging information that you don't want timed, for example.
You can find a copy of the original header file at http://www.gnome.org/~fejj/code/zentimer.h in the off chance that I messed up the html escaping of <'s or something. It's licensed under MIT/X11 so feel free to copy it into any project you do.
The following is a group of versatile C functions for timer management based on the gettimeofday() system call. All the timer properties are contained in a single ticktimer struct - the interval you want, the total running time since the timer initialization, a pointer to the desired callback you want to call, the number of times the callback was called. A callback function would look like this:
void your_timer_cb (struct ticktimer *t) {
/* do your stuff here */
}
To initialize and start a timer, call ticktimer_init(your_timer, interval, TICKTIMER_RUN, your_timer_cb, 0).
In the main loop of your program call ticktimer_tick(your_timer) and it will decide whether the appropriate amount of time has passed to invoke the callback.
To stop a timer, just call ticktimer_ctl(your_timer, TICKTIMER_STOP).
ticktimer.h:
#ifndef __TICKTIMER_H
#define __TICKTIMER_H
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#define TICKTIMER_STOP 0x00
#define TICKTIMER_UNCOMPENSATE 0x00
#define TICKTIMER_RUN 0x01
#define TICKTIMER_COMPENSATE 0x02
struct ticktimer {
u_int64_t tm_tick_interval;
u_int64_t tm_last_ticked;
u_int64_t tm_total;
unsigned ticks_total;
void (*tick)(struct ticktimer *);
unsigned char flags;
int id;
};
void ticktimer_init (struct ticktimer *, u_int64_t, unsigned char, void (*)(struct ticktimer *), int);
unsigned ticktimer_tick (struct ticktimer *);
void ticktimer_ctl (struct ticktimer *, unsigned char);
struct ticktimer *ticktimer_alloc (void);
void ticktimer_free (struct ticktimer *);
void ticktimer_tick_all (void);
#endif
ticktimer.c:
#include "ticktimer.h"
#define TIMER_COUNT 100
static struct ticktimer timers[TIMER_COUNT];
static struct timeval tm;
/*!
#brief
Initializes/sets the ticktimer struct.
#param timer
Pointer to ticktimer struct.
#param interval
Ticking interval in microseconds.
#param flags
Flag bitmask. Use TICKTIMER_RUN | TICKTIMER_COMPENSATE
to start a compensating timer; TICKTIMER_RUN to start
a normal uncompensating timer.
#param tick
Ticking callback function.
#param id
Timer ID. Useful if you want to distinguish different
timers within the same callback function.
*/
void ticktimer_init (struct ticktimer *timer, u_int64_t interval, unsigned char flags, void (*tick)(struct ticktimer *), int id) {
gettimeofday(&tm, NULL);
timer->tm_tick_interval = interval;
timer->tm_last_ticked = tm.tv_sec * 1000000 + tm.tv_usec;
timer->tm_total = 0;
timer->ticks_total = 0;
timer->tick = tick;
timer->flags = flags;
timer->id = id;
}
/*!
#brief
Checks the status of a ticktimer and performs a tick(s) if
necessary.
#param timer
Pointer to ticktimer struct.
#return
The number of times the timer was ticked.
*/
unsigned ticktimer_tick (struct ticktimer *timer) {
register typeof(timer->tm_tick_interval) now;
register typeof(timer->ticks_total) nticks, i;
if (timer->flags & TICKTIMER_RUN) {
gettimeofday(&tm, NULL);
now = tm.tv_sec * 1000000 + tm.tv_usec;
if (now >= timer->tm_last_ticked + timer->tm_tick_interval) {
timer->tm_total += now - timer->tm_last_ticked;
if (timer->flags & TICKTIMER_COMPENSATE) {
nticks = (now - timer->tm_last_ticked) / timer->tm_tick_interval;
timer->tm_last_ticked = now - ((now - timer->tm_last_ticked) % timer->tm_tick_interval);
for (i = 0; i < nticks; i++) {
timer->tick(timer);
timer->ticks_total++;
if (timer->tick == NULL) {
break;
}
}
return nticks;
} else {
timer->tm_last_ticked = now;
timer->tick(timer);
timer->ticks_total++;
return 1;
}
}
}
return 0;
}
/*!
#brief
Controls the behaviour of a ticktimer.
#param timer
Pointer to ticktimer struct.
#param flags
Flag bitmask.
*/
inline void ticktimer_ctl (struct ticktimer *timer, unsigned char flags) {
timer->flags = flags;
}
/*!
#brief
Allocates a ticktimer struct from an internal
statically allocated list.
#return
Pointer to the newly allocated ticktimer struct
or NULL when no more space is available.
*/
struct ticktimer *ticktimer_alloc (void) {
register int i;
for (i = 0; i < TIMER_COUNT; i++) {
if (timers[i].tick == NULL) {
return timers + i;
}
}
return NULL;
}
/*!
#brief
Marks a previously allocated ticktimer struct as free.
#param timer
Pointer to ticktimer struct, usually returned by
ticktimer_alloc().
*/
inline void ticktimer_free (struct ticktimer *timer) {
timer->tick = NULL;
}
/*!
#brief
Checks the status of all allocated timers from the
internal list and performs ticks where necessary.
#note
Should be called in the main loop.
*/
inline void ticktimer_tick_all (void) {
register int i;
for (i = 0; i < TIMER_COUNT; i++) {
if (timers[i].tick != NULL) {
ticktimer_tick(timers + i);
}
}
}
Using the time.h library, try something like this:
long start_time, end_time, elapsed;
start_time = clock();
// Do something
end_time = clock();
elapsed = (end_time - start_time) / CLOCKS_PER_SEC * 1000;
If your Linux system supports it, clock_gettime(CLOCK_MONOTONIC) should be a high resolution timer that is unaffected by system date changes (e.g. NTP daemons).
Great answers for GNU environments above and below...
But... what if you're not running on an OS? (or a PC for that matter, or you need to time your timer interrupts themselves?) Here's a solution that uses the x86 CPU timestamp counter directly... Not because this is good practice, or should be done, ever, when running under an OS...
Caveat: Only works on x86, with frequency scaling disabled.
Under Linux, only works on non-tickless kernels
rdtsc.c:
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned long long int64;
static __inline__ int64 getticks(void)
{
unsigned a, d;
asm volatile("rdtsc" : "=a" (a), "=d" (d));
return (((int64)a) | (((int64)d) << 32));
}
int main(){
int64 tick,tick1;
unsigned time=0,mt;
// mt is the divisor to give microseconds
FILE *pf;
int i,r,l,n=0;
char s[100];
// time how long it takes to get the divisors, as a test
tick = getticks();
// get the divisors - todo: for max performance this can
// output a new binary or library with these values hardcoded
// for the relevant CPU - if you use the equivalent assembler for
// that CPU
pf = fopen("/proc/cpuinfo","r");
do {
r=fscanf(pf,"%s",&s[0]);
if (r<0) {
n=5; break;
} else if (n==0) {
if (strcmp("MHz",s)==0) n=1;
} else if (n==1) {
if (strcmp(":",s)==0) n=2;
} else if (n==2) {
n=3;
};
} while (n<3);
fclose(pf);
s[9]=(char)0;
strcpy(&s[4],&s[5]);
mt=atoi(s);
printf("#define mt %u // (%s Hz) hardcode this for your a CPU-specific binary ;-)\n",mt,s);
tick1 = getticks();
time = (unsigned)((tick1-tick)/mt);
printf("%u ms\n",time);
// time the duration of sleep(1) - plus overheads ;-)
tick = getticks();
sleep(1);
tick1 = getticks();
time = (unsigned)((tick1-tick)/mt);
printf("%u ms\n",time);
return 0;
}
compile and run with
$ gcc rdtsc.c -o rdtsc && ./rdtsc
It reads the divisor for your CPU from /proc/cpuinfo and shows how long it took to read that in microseconds, as well as how long it takes to execute sleep(1) in microseconds... Assuming the Mhz rating in /proc/cpuinfo always contains 3 decimal places :-o

Resources