Is there a simple but reliable way to measure the relative difference in performance between two algorithm implementations in C programs? More specifically, I want to compare the performance of implementation A against implementation B. I'm thinking of a scheme like this:
In a unit test program:
start timer
call function
stop timer
get difference between start stop time
Run the scheme above for a pair of functions A and B, then get a percentage difference in execution time to determine which is faster.
Upon doing some research I came across this question about using a Monotonic clock on OSX in C, which apparently can give me at least nanosecond precision. To be clear, I understand that precise, controlled measurements are hard to perform, like what's discussed in "With O(N) known and system clock known, can we calculate the execution time of the code?", but I assume that should be irrelevant in this case because I only want a relative measurement.
Everything considered, is this a sufficient and valid approach towards the kind of analysis I want to perform? Are there any details or considerations I might be missing?
The main modification I make to the timing scheme you outline is to ensure that the same timing code is used for both functions — assuming they have an identical interface — by passing a function pointer to skeletal code.
As an example, I have some code that times some functions that validate whether a given number is prime. The control function is:
static void test_primality_tester(const char *tag, int seed, int (*prime)(unsigned), int count)
{
    srand(seed);
    Clock clk;
    int nprimes = 0;
    clk_init(&clk);
    clk_start(&clk);
    for (int i = 0; i < count; i++)
    {
        if (prime(rand()))
            nprimes++;
    }
    clk_stop(&clk);
    char buffer[32];
    printf("%9s: %d primes found (out of %d) in %s s\n", tag, nprimes,
           count, clk_elapsed_us(&clk, buffer, sizeof(buffer)));
}
I'm well aware of "srand() — why call it once?", but the point of calling srand() once each time this function is called is to ensure that the tests process the same sequence of random numbers. On macOS, RAND_MAX is 0x7FFFFFFF.
The type Clock contains analogues to two struct timespec structures, for the start and stop times. The clk_init() function initializes the structure; clk_start() records the start time in the structure; clk_stop() records the stop time in the structure; and clk_elapsed_us() calculates the elapsed time between the start and stop times in microseconds. The package is written to give me cross-platform portability (at the cost of some headaches in determining which is the best sub-second timing routine available at compile time).
You can find my code for timers on GitHub in the repository https://github.com/jleffler/soq, in the src/libsoq directory — files timer.h and timer.c. The code has not yet caught up with macOS Sierra having clock_gettime(), though it could be compiled to use it by passing -DHAVE_CLOCK_GETTIME as a command-line compiler option.
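For illustration only, here is a minimal sketch of what such a Clock type could look like on a system where clock_gettime() and CLOCK_MONOTONIC are available; it is not the actual timer.h/timer.c code, just enough to show the shape of the interface used above:
#include <stdio.h>
#include <time.h>

typedef struct Clock
{
    struct timespec start;
    struct timespec stop;
} Clock;

static void clk_init(Clock *clk)  { clk->start = clk->stop = (struct timespec){ 0, 0 }; }
static void clk_start(Clock *clk) { clock_gettime(CLOCK_MONOTONIC, &clk->start); }
static void clk_stop(Clock *clk)  { clock_gettime(CLOCK_MONOTONIC, &clk->stop); }

/* Format the elapsed time as a seconds.microseconds string in buffer. */
static char *clk_elapsed_us(Clock *clk, char *buffer, size_t buflen)
{
    double sec = (clk->stop.tv_sec - clk->start.tv_sec)
               + (clk->stop.tv_nsec - clk->start.tv_nsec) / 1e9;
    snprintf(buffer, buflen, "%.6f", sec);
    return buffer;
}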
This code was called from a function one_test():
static void one_test(int seed)
{
    printf("Seed: %d\n", seed);
    enum { COUNT = 10000000 };
    test_primality_tester("IsPrime1", seed, IsPrime1, COUNT);
    test_primality_tester("IsPrime2", seed, IsPrime2, COUNT);
    test_primality_tester("IsPrime3", seed, IsPrime3, COUNT);
    test_primality_tester("isprime1", seed, isprime1, COUNT);
    test_primality_tester("isprime2", seed, isprime2, COUNT);
    test_primality_tester("isprime3", seed, isprime3, COUNT);
}
And the main program can take one or a series of seeds, or uses the current time as a seed:
int main(int argc, char **argv)
{
    if (argc > 1)
    {
        for (int i = 1; i < argc; i++)
            one_test(atoi(argv[i]));
    }
    else
        one_test(time(0));
    return(0);
}
I recently purchased a dev board for the max32660 MCU. I followed the instructions from the company's YouTube videos on how to set it up and got the example files working using the Eclipse IDE. I intend to use the time.h library to keep track of how long it takes to run various parts of my code, but whenever I try to printf() any clock_t type variables I just get back repeated 0.000000 values.
For the record: I am simply using my Arduino serial monitor to watch the signals come in from my USB port.
Maxim Integrated YouTube video:
https://www.youtube.com/watch?v=vpYBBYLkjTY&t=339s
Max32660 webpage
https://www.maximintegrated.com/en/products/microcontrollers/MAX32660.html
Code:
/***** Includes *****/
#include <stdio.h>
#include <stdint.h>
#include "mxc_device.h"
#include "led.h"
#include "board.h"
#include "mxc_delay.h"
#include "time.h"

/***** Definitions *****/

/***** Globals *****/
clock_t start;
clock_t end;

/***** Functions *****/
// *****************************************************************************
int main(void)
{
    printf("Hello World!\n");

    while (1) {
        start = clock();
        MXC_Delay(500000);     //This part works fine
        end = clock();
        printf("%Lf\n", start); //I've tried many different versions of this
    }
}
I've tried many different printf() configurations (%f, %lf, %ld, %s, %d, %i) and I cannot tell if the variables "start" and "end" are actually being saved as that value or if there is some problem with my serial port monitor reading the printf() statement, even though the "Hello World!" statement prints just fine.
I would expect to see any number besides 0.000000 being returned, particularly some integer that represents the number of clock cycles. I also expect that number to be somewhat low since the call to clock() is early in the script.
In most embedded standard libraries, time() and clock() are either not implemented or implemented as "weak-link" stubs. You normally have to implement them yourself. In this case clock() and CLOCKS_PER_SEC are implementation dependent; you need to "re-target" the library to map this function to the hardware resources you choose to assign to it. On a Cortex-M4 that would normally be a counter incremented in the SYSTICK interrupt at whatever rate (CLOCKS_PER_SEC) you choose for your platform.
You may already have support for SYSTICK and a system clock; in which case your clock() implementation might simply be a facade for that existing functionality.
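As a rough, hedged sketch of what that retargeting might look like on a Cortex-M part: a counter incremented in the SysTick interrupt, with clock() returning it. SysTick_Config() and SystemCoreClock are CMSIS-style names, and whether the library lets you override clock() this way depends on your toolchain and the Maxim SDK, so treat everything below as an assumption to be adapted:
#include <time.h>

static volatile clock_t clock_ticks;   /* incremented once per SysTick interrupt */

void SysTick_Handler(void)             /* assumed CMSIS-style weak handler override */
{
    clock_ticks++;
}

clock_t clock(void)                    /* retargeted clock(): returns ticks so far */
{
    return clock_ticks;
}

/* During initialisation, the tick rate must match the library's CLOCKS_PER_SEC,
 * e.g. for a 1 ms tick (CLOCKS_PER_SEC == 1000 is an assumption to verify):
 *     SysTick_Config(SystemCoreClock / 1000);
 */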
Even if you had a working clock() the code:
printf("%Lf\n", start);
is unlikely to produce a useful result in any case:
start is of type clock_t which is likely an alias for long, but certainly not long double.
Since any clock() implementation will be synchronous with whatever clock is used for MXC_Delay(), the test simply measures itself!
end is unused, so you are simply showing the elapsed time, not the iteration time (that may be intended, in which case end is redundant). Perhaps you intended:
printf("%ld\n", (long)(end - start) ) ;
or possibly:
printf("%f\n", (double)(end - start) / CLOCKS_PER_SEC ) ;
I doubt that long double is any larger than double on this target, in any event.
I am trying to read data from the ADC on the BeagleBone, running the Angstrom Linux distribution. I need to use a delay mechanism like sleep() to read samples only at specific times, to help conform to a specific sample rate.
I also am required to calculate the execution time.
Here is a sample POC (proof of concept), to demonstrate the problem I am facing:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>   /* for sleep() */

int main()
{
    clock_t begin, end;

    while(1)
    {
        begin = clock();
        sleep(1); // delay for 1 sec
        end = clock();
        printf("Execution time = %f\n", ((float)(end - begin) / CLOCKS_PER_SEC));
    }
}
I always get the execution time as 0.000000.
Why do I not get my result as 1.000000 seconds? My guess is calling sleep() will pre-empt my program, but I am not sure.
What other option do I have to calculate elapsed execution time which includes a delay?
Unix
The original versions of Unix didn't support sub-second timing, and only the latest versions of the C standard support sub-second 'real time' (aka 'wall time' or 'elapsed time'). At some point, ftime() was added, then gettimeofday(), then clock_gettime(), and the clock() function was standardized in C90.
The C standard, and Unix since at least 7th Edition UNIX™, has provided single-second accuracy with the time() function:
time_t now = time(0);
printf("%ld\n", (long)now);
7th Edition UNIX™ (or Version 7 UNIX™) provided millisecond resolution with ftime() — and it was included in POSIX up to and including the 2004 version (ftime()), but it is not part of POSIX 2008 or later, though it will still be supported on some machine types for reasons of backwards compatibility.
#include <sys/timeb.h>
struct timeb tb;
if (ftime(&tb) == 0)
printf("%ld.%.3d\n", (long)tb.time, tb.millitm);
POSIX also provided (and still provides) microsecond resolution timing via struct timeval and gettimeofday(). It is marked obsolescent in POSIX 2008. It may be the most portable timer.
#include <sys/time.h>
struct timeval tv;
if (gettimeofday(&tv, NULL) == 0)
printf("%ld.%.6d\n", (long)tv.tv_sec, tv.tv_usec);
Note that there are caveats about using gettimeofday() — its result can be affected if someone adjusts the system clock between successive calls. Similar comments apply to clock_gettime() and CLOCK_REALTIME.
POSIX is moving towards using nanosecond resolution timing, which is provided via struct timespec and clock_gettime(). With clock_gettime(), you must choose which clock you wish to measure. For many purposes, CLOCK_REALTIME is the correct choice (but CLOCK_MONOTONIC may be better for some purposes, if it is available).
#include <time.h>
struct timespec ts;
if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
printf("%ld.%.9d\n", (long)ts.tv_sec, ts.tv_nsec);
I'm not sure exactly where clock() came from. It wasn't in 7th Edition Unix, but it was in SVR4 with 100 for CLOCKS_PER_SEC. The C standard provides the clock() function (the POSIX specification for clock() requires 1,000,000 for CLOCKS_PER_SEC; C does not). Note that this does not measure elapsed 'wall time'; it measures an approximation to the amount of CPU time that a process has used.
The clock() function shall return the implementation's best approximation to the processor time used by the process since the beginning of an implementation-defined era related only to the process invocation.
To determine the time in seconds, the value returned by clock() should be divided by the value of the macro CLOCKS_PER_SEC. [XSI] ⌦ CLOCKS_PER_SEC is defined to be one million in <time.h>. ⌫
clock_t clk = clock();
printf("%.6f\n", (double)clk / CLOCKS_PER_SEC);
Since the clock() function measures the CPU time used, not the wall clock time that elapses, it is wholly inappropriate to measure elapsed time when sleep() is used because a sleeping process uses no CPU time.
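A minimal demonstration of that point (assuming a POSIX system with sleep() available): clock() barely advances across the sleep, while the wall clock reports the full two seconds.
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    clock_t c0 = clock();
    time_t  t0 = time(0);
    sleep(2);                 /* the process consumes no CPU while sleeping */
    clock_t c1 = clock();
    time_t  t1 = time(0);
    printf("CPU time:  %.6f s\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    printf("Wall time: %ld s\n", (long)(t1 - t0));
    return 0;
}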
The C11 standard provides some thread-related functions which use struct timespec, which is defined to match the POSIX definition of the type. It also provides the function timespec_get():
7.27.2.5 The timespec_get function
#include <time.h>
int timespec_get(struct timespec *ts, int base);
¶2 The timespec_get function sets the interval pointed to by ts to hold the current calendar time based on the specified time base.
¶3 If base is TIME_UTC, the tv_sec member is set to the number of seconds since an implementation defined epoch, truncated to a whole value and the tv_nsec member is set to the integral number of nanoseconds, rounded to the resolution of the system clock.321)
¶4 If the timespec_get function is successful it returns the nonzero value base; otherwise, it returns zero.
321) Although a struct timespec object describes times with nanosecond resolution, the available resolution is system dependent and may even be greater than 1 second.
This function may not be widely available yet. It is not available on macOS 10.14.5 Mojave, for example, though it seems to be available in OpenBSD. It may be available in glibc (GNU C Library), but it isn't listed in the Linux man pages (neither in section 2 system calls nor section 3 functions at https://linux.die.net, nor in section 2 system calls or section 3 functions at https://man7.org/). Equally clearly, it's trivial to implement a decent approximation to it if clock_gettime() is available:
#if !defined(HAVE_TIMESPEC_GET) && defined(HAVE_CLOCK_GETTIME)
enum { TIME_UTC = 1 };
static inline int timespec_get(struct timespec *ts, int base)
{
    assert(base != 0);
    if (clock_gettime(CLOCK_REALTIME, ts) != 0)
        return 0;
    return base;
}
#endif /* HAVE_TIMESPEC_GET */
Windows
Windows provides alternative interfaces, including GetTickCount() which returns a value in milliseconds since a reference time (valid up to 49 days) and QueryPerformanceCounter(). You may also find references to, and uses for, RDTSC — an Intel CPU instruction.
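As a hedged sketch (Windows API, error checking omitted), QueryPerformanceCounter() is typically used by sampling the counter before and after the work and dividing by the counter frequency:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   /* counts per second */
    QueryPerformanceCounter(&t0);
    Sleep(500);                         /* the work being timed */
    QueryPerformanceCounter(&t1);
    printf("%.6f s\n", (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart);
    return 0;
}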
Printing time_t values
Note that throughout this code I've assumed that time_t values can be printed by converting the value to long and using the %ld format. That is correct for 64-bit Unix systems. It is correct for Windows 64-bit, and also for both Unix and Windows 32-bit systems until January 2038, when the gap between the current time and 'The (Unix) Epoch' (1970-01-01 00:00:00 +00:00) grows bigger than 2³¹ - 1 seconds and 32-bit signed arithmetic overflows. Then the cast type should be long long (which is guaranteed to be at least a 64-bit type) and the format should be %lld. The only issue (and reason for not doing it now) is the formats that MS Visual Studio supports — it uses non-standard names and format specifiers for 64-bit types, as I understand it.
Calculating elapsed time involves calculating differences between two values returned by these functions. Working with the structures is mildly fiddly; you have to deal with overflows or underflows when subtracting.
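For example, one common way to handle the borrow when subtracting two struct timespec values (a sketch; the same pattern works for struct timeval with a 1,000,000 divisor):
#include <time.h>

struct timespec ts_sub(struct timespec end, struct timespec start)
{
    struct timespec d;
    d.tv_sec  = end.tv_sec  - start.tv_sec;
    d.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (d.tv_nsec < 0)        /* borrow one second's worth of nanoseconds */
    {
        d.tv_nsec += 1000000000L;
        d.tv_sec  -= 1;
    }
    return d;
}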
To calculate execution time, get a timestamp at the beginning of your program and another at the end, then take the difference.
#include <stdio.h>
#include <time.h>

int main() {
    time_t begin;
    time(&begin);

    // Something to time goes here

    time_t end;
    time(&end);

    printf("Execution time %f\n", difftime(end, begin));
    return (0);
}
EDIT:
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <unistd.h>   /* for sleep() */

int main() {
    struct timeval tv;

    gettimeofday(&tv, NULL);
    double begin = (tv.tv_sec) * 1000 + (tv.tv_usec) / 1000;   /* milliseconds */

    sleep(2);

    gettimeofday(&tv, NULL);
    double end = (tv.tv_sec) * 1000 + (tv.tv_usec) / 1000;     /* milliseconds */

    printf("Execution time %f\n", end - begin);
    return (0);
}
Indeed: during sleep(), the program doesn't run at all. And as clock() counts the CPU time and not the wall clock time, "no time passes".
Use time() instead.
clock_gettime() is the function to use for this purpose. Do not use gettimeofday(); it is deprecated and there is guidance AGAINST using it from The Open Group and in the Linux man pages.
I use a macro to print the elapsed time.
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define ELPS_TIME(func_name) do { \
        struct timeval tval_before, tval_after, tval_result; \
        gettimeofday(&tval_before, NULL); \
        func_name; \
        gettimeofday(&tval_after, NULL); \
        timersub(&tval_after, &tval_before, &tval_result); \
        printf("Time elapsed: %ld.%06ld seconds\n", \
               (long int)tval_result.tv_sec, \
               (long int)tval_result.tv_usec); } while(0)

static void test_func1() {
    printf("%s:", __FUNCTION__);
    sleep(1);
}

static void test_func2() {
    printf("%s:", __FUNCTION__);
    sleep(2);
}

int main() {
    ELPS_TIME(test_func1()); //calling test_func1
    ELPS_TIME(test_func2()); //calling test_func2
    return 0;
}
Output:
test_func1:Time elapsed: 1.000103 seconds
test_func2:Time elapsed: 2.000974 seconds
I have the following code, which I run in the terminal.
In another terminal I have top open, where I am able to see the %CPU of the new processes I create. I run this for several values of the number of processes (N): 2, 4, 8, 16.
The average %CPU I see for each is:
2 - 100%
4 - 97%
8 - 50%
16 - 25%
How can the processing power of the computer be determined by these results?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <unistd.h> /* for fork() and getpid() */

#define N 2 /* define the total number of processes we want */

/* Set global variable */
float total=0;

/* compute function just does something. */
int compute()
{
    int i;
    float oldtotal=0, result=0;

    /* for a large number of times just square root and square
       the arbitrary number 1000 */
    for(i=0;i<2000000000;i++)
    {
        result=sqrt(1000.0)*sqrt(1000.0);
    }

    /* Print the result – should be no surprise */
    printf("Result is %f\n",result);

    /* We want to keep a running total in the global variable total */
    oldtotal = total;
    total = oldtotal + result;

    /* Print running total so far. */
    printf("Total is %f\n",total);
    return(0);
}

int main()
{
    int pid[N], i, j;
    float result=0;

    printf("\n"); /* bit of whitespace */

    /* We want to loop to create the required number of processes
       Note carefully how only the child process is left to run */
    for(i=0;i<N;i++)
    {
        /* Do the fork and catch it if it/one fails */
        if((pid[i]=fork())==-1)
        {
            exit(1);
        }
        /* now with child we want to do our computation */
        else if(pid[i] > 0)
        {
            /* give a message about the proc ID */
            printf("Process Id for process %d is %d\n",i,getpid());

            /* call the function to do some computation. If we used sleep
               The process would simply sleep. We do not want that */
            compute();

            /* After we have done our computation we must quit the for
               loop otherwise we get a fork bomb! */
            break;
        }
    }

    /* nothing else to do so end main function (and program) */
    return 0;
}
It depends on your definition of processing power. The classic way is number of instructions per second (MIPS) or floating point operations per second (FLOPs).
Finding MIPS is fiddly because in C code you don't know how many instructions each line of code represents.
You can do a mega-FLOPS calculation, though: loop in C doing a float * float operation on random numbers, see how long it takes to do a lot of calculations (say 10⁹), then work out how many you did per second.
Then multiply by the number of processors you have.
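A rough sketch of that idea, assuming clock() is good enough for the measurement (a constant multiplier stands in for the random numbers, and the volatile accumulator is there to keep the optimiser from deleting the loop):
#include <stdio.h>
#include <time.h>

int main(void)
{
    enum { OPS = 100000000 };          /* 10^8 floating-point multiplies */
    volatile float acc = 1.0f;
    const float x = 1.0000001f;        /* keeps the accumulator in range */

    clock_t t0 = clock();
    for (long i = 0; i < OPS; i++)
        acc = acc * x;                 /* one float * float operation */
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    printf("acc = %f, about %.1f MFLOPS on one core\n", acc, OPS / secs / 1e6);
    return 0;
}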
Your results show very little difference between the CPU usage for 2 processes and 4 processes, so it is almost certainly a quad-core processor. Apart from that, not much can be told about the speed of the processor from the percentages alone. You have also used printf statements, which make it even harder to calculate the processing speed since they occasionally flush the buffer.
"Processing power" is a suggestive, yet annoyingly vague term. For most purposes, we don't care about MIPs or FIPs—only for pure number crunching do we give it any mind. Even then, it is as useless a measure for comparing computers as it is to compare automobiles with their maximum RPMs. See BogoMips.
Even with MIPs and FIPs, usually peak performance is measured. Many don't try to determine average values.
Another useful parameter of CPU power is branches per second—which no one measures (but they should). BPS varies considerably based on the intricacies of the CPU architecture, caching, paging, perhaps context switching, and the nature of the branches.
In any useful program, input and output is part of computing power. Hence bandwidth of memory, i/o devices, file systems, and connecting devices, networks, etc. are part of the computer's "computing power".
If you mean a subset of these, please clarify your question.
I need to use an atomic variable in C as this variable is accessed across different threads. Don't want a race condition.
My code is running on CentOS. What are my options?
C11 atomic primitives
http://en.cppreference.com/w/c/language/atomic
_Atomic const int * p1; // p1 is a pointer to an atomic const int
const atomic_int * p2; // same
const _Atomic(int) * p3; // same
Support for this (C11 <threads.h>) was added in glibc 2.28. Tested on Ubuntu 18.04 (glibc 2.27) by compiling glibc from source (see: Multiple glibc libraries on a single host); later also tested on Ubuntu 20.04 (glibc 2.31).
Example adapted from: https://en.cppreference.com/w/c/language/atomic
main.c
#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>

atomic_int acnt;
int cnt;

int f(void* thr_data)
{
    (void)thr_data;
    for(int n = 0; n < 1000; ++n) {
        ++cnt;
        ++acnt;
        // for this example, relaxed memory order is sufficient, e.g.
        // atomic_fetch_add_explicit(&acnt, 1, memory_order_relaxed);
    }
    return 0;
}

int main(void)
{
    thrd_t thr[10];
    for(int n = 0; n < 10; ++n)
        thrd_create(&thr[n], f, NULL);
    for(int n = 0; n < 10; ++n)
        thrd_join(thr[n], NULL);
    printf("The atomic counter is %u\n", acnt);
    printf("The non-atomic counter is %u\n", cnt);
}
Compile and run:
gcc -ggdb3 -O0 -std=c11 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
Possible output:
The atomic counter is 10000
The non-atomic counter is 8644
The non-atomic counter is very likely to be smaller than the atomic one due to racy access across threads to the non-atomic variable.
Disassembly analysis at: How do I start threads in plain C?
If you are using GCC on your CentOS platform, then you can use the __atomic built-in functions.
Of particular interest might be this function:
— Built-in Function: bool __atomic_always_lock_free (size_t size, void *ptr)
This built-in function returns true if objects of size bytes always generate lock free atomic instructions for the target architecture. size must resolve to a compile-time constant and the result also resolves to a compile-time constant.
ptr is an optional pointer to the object that may be used to determine alignment. A value of 0 indicates typical alignment should be used. The compiler may also ignore this parameter.
if (__atomic_always_lock_free (sizeof (long long), 0))
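A small, hedged sketch of the __atomic built-ins in use with POSIX threads (compile with -pthread); the built-in names and memory-order constants are the ones documented by GCC, while the rest is just an illustrative harness:
#include <pthread.h>
#include <stdio.h>

static long counter;                    /* a plain long, updated only via the built-ins */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
    return NULL;
}

int main(void)
{
    pthread_t thr[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&thr[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(thr[i], NULL);
    printf("counter = %ld\n", __atomic_load_n(&counter, __ATOMIC_SEQ_CST));
    return 0;
}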
I am going to toss in my two cents in case someone benefits. Atomic operations are a major problem in Linux. I used gatomic.h at one time, only to find it gone. I see all kinds of different atomic options of either questionable reliability or availability -- and I see things changing all the time. They can be complex, with tests needed by OS level, processor, whatever. You can use a mutex -- not only complex but dreadfully slow.
Although perhaps not ideal in threads, this works great for atomic operations on shared memory variables. It is simple and it works on every O/S and processor and configuration known to man (or woman), dead reliable, easy to code, and will always work.
Any code can be made atomic with a simple primitive -- a semaphore. It is something that is true/false, 1/0, yes/no, locked/unlocked -- binary.
Once you establish the semaphore:
set semaphore //must be atomic
do all the code you like which will be atomic as the semaphore will block for you
release semaphore //must be atomic
Relatively straightforward, except for the "must be atomic" lines.
It turns out that you can easily assign your semaphores a number (I use a define so they have a name, like "#define OPEN_SEM 1" and "#define CLASS_SEM 2", and so forth).
Find out your largest number, and when your program initializes, open a file in some directory (I use one just for this purpose). If it is not there, create it:
if (ablockfd < 0) {          //ablockfd is static in case you want to
                             //call it over and over
    char *get_sy_path();
    char lockname[100];

    strcpy(lockname, get_sy_path());
    strcat(lockname, "/metlock");
    ablockfd = open(lockname, O_RDWR);
    //error code if ablockfd bad
}
Now to gain a semaphore:
Now use your semaphore number to "lock" a "record" in your file of length one byte. Note -- the file will never actually occupy disk space and no disk operation occurs.
//sem_id is passed in and is set from OPEN_SEM or CLASS_SEM or whatever you call your semaphores.
lseek(ablockfd, sem_id, SEEK_SET);     //seeks to the byte in the file for
                                       //your semaphore number
result = lockf(ablockfd, F_LOCK, 1);
if (result != -1) {
    //got the semaphore
} else {
    //failed
}
To test if the semaphore is held:
result = lockf(ablockfd, F_TEST, 1); //after same lseek
To release the semaphore:
result = lockf(ablockfd, F_ULOCK, 1); //after same lseek
And all the other things you can do with lockf -- blocking/non-blocking, etc.
Note -- this is WAY faster than a mutex, it goes away if the process dies (a good thing), it is simple to code, and I know of no operating system, with any processor or any number of cores, that cannot atomically lock a record ... simple code that just works. The file never really exists (no bytes, only a directory entry), and there seems to be no practical limit to how many semaphores you may have. I have used this for years on machines with no easy atomic solutions.
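A sketch of wrapping the scheme above into acquire/release helpers (the names follow the ones already used here; ablockfd is assumed to have been opened as shown earlier):
#include <unistd.h>

static int ablockfd = -1;                /* opened once at initialization, as shown above */

int sem_acquire(int sem_id)              /* blocks until the semaphore byte is locked */
{
    lseek(ablockfd, sem_id, SEEK_SET);
    return lockf(ablockfd, F_LOCK, 1);   /* 0 on success, -1 on failure */
}

int sem_release(int sem_id)
{
    lseek(ablockfd, sem_id, SEEK_SET);
    return lockf(ablockfd, F_ULOCK, 1);
}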
I would like to measure the system time it takes to execute some code. To do this I know I would sandwich said code between two calls to getrusage(), but I get some unexpected results...
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    struct rusage usage;
    struct timeval start, end;

    int i, j, k = 0;

    getrusage(RUSAGE_SELF, &usage);
    start = usage.ru_stime;

    for (i = 0; i < 10000; i++) {
        /* Double loop for more interesting results. */
        for (j = 0; j < 10000; j++) {
            k += 20;
        }
    }

    getrusage(RUSAGE_SELF, &usage);
    end = usage.ru_stime;

    printf("Started at: %ld.%lds\n", start.tv_sec, start.tv_usec);
    printf("Ended at: %ld.%lds\n", end.tv_sec, end.tv_usec);
    return 0;
}
I would hope that this produces two different numbers, but alas! After seeing my computer think for a second or two, this is the result:
Started at: 0.1999s
Ended at: 0.1999s
Am I not using getrusage() right? Why shouldn't these two numbers be different? If I am fundamentally wrong, is there another way to use getrusage() to measure the system time of some source code? Thank you for reading.
You're misunderstanding the difference between "user" and "system" time. Your example code is executing primarily in user-mode (ie, running your application code) while you are measuring, but "system" time is a measure of time spent executing in kernel-mode (ie, processing system calls).
ru_stime is the correct field to measure system time. Your test application just happens not to accrue any such time between the two points you check.
You should use usage.ru_utime, which is user CPU time used, instead.
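For example, a hedged sketch of measuring user CPU time around such a loop with ru_utime and taking the difference explicitly (the volatile accumulator keeps the loop from being optimised away):
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage u0, u1;

    getrusage(RUSAGE_SELF, &u0);

    volatile long k = 0;
    for (long i = 0; i < 100000000L; i++)   /* the work being measured */
        k += 20;

    getrusage(RUSAGE_SELF, &u1);

    double user = (u1.ru_utime.tv_sec  - u0.ru_utime.tv_sec)
                + (u1.ru_utime.tv_usec - u0.ru_utime.tv_usec) / 1e6;
    printf("User CPU time: %.6f s (k = %ld)\n", user, k);
    return 0;
}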
Use gprof. This will give you the time taken by each function.
Install gprof and use these flags for compilation: -pg -fprofile-arcs -ftest-coverage.