I am working with an averaging function following the formula
new average = old average * (n-1) / n + (new value / n)
When passing in doubles this works great. My example code for a proof of concept is as follows.
double avg = 0;
uint16_t i;
for (i = 1; i < 10; i++) {
    int32_t new_value = i;
    avg = avg * (i - 1);
    avg /= i;
    avg += new_value / i;
    printf("I %d New value %d Avg %f\n", i, new_value, avg);
}
In my program I am keeping track of messages received. Each time I see a message, its hit count is increased by 1, and it is then timestamped using a timespec. My goal is to keep a moving average (like above) of the average time between messages of a certain type being received.
My initial attempt was to average the tv_nsec and tv_sec separately as follows
static int32_t calc_avg(const int32_t current_avg, const int32_t new_value, const uint64_t n) {
    int32_t new__average = current_avg;
    new__average = new__average * (n - 1);
    new__average /= n;
    new__average += new_value / n;
    return new__average;
}
void average_timespec(struct timespec* average, const struct timespec new_sample, const uint64_t n) {
    if (n > 0) {
        average->tv_nsec = calc_avg(average->tv_nsec, new_sample.tv_nsec, n);
        average->tv_sec = calc_avg(average->tv_sec, new_sample.tv_sec, n);
    }
}
My issue is that because I am using integers, the values are always rounded down and my averages are way off. Is there a smarter/easier way to average the time between timespec readings?
Below is some code that I've used a lot in production software for years.
The main idea is that just because clock_gettime uses struct timespec does not mean this has to be "carried around" everywhere:
It's easier to convert to a long long or double and propagate those values as soon as they're gotten from clock_gettime.
All further math is simple add/subtract, etc.
The overhead of the clock_gettime call dwarfs the multiply/divide time in the conversion.
Whether I use the fixed nanosecond value or the fractional seconds value depends upon the exact application.
In your case, I'd probably use the double since you already have calculations that work for that.
Anyway, this is what I use:
#include <time.h>

typedef long long tsc_t;                // timestamp in nanoseconds

#define TSCSEC  1000000000LL
#define TSCSECF 1e9

tsc_t tsczero;                          // initial start time
double tsczero_f;                       // initial start time

// tscget -- get number of nanoseconds
tsc_t
tscget(void)
{
    struct timespec ts;
    tsc_t tsc;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    tsc = ts.tv_sec;
    tsc *= TSCSEC;
    tsc += ts.tv_nsec;
    tsc -= tsczero;

    return tsc;
}

// tscgetf -- get fractional number of seconds
double
tscgetf(void)
{
    struct timespec ts;
    double sec;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    sec = ts.tv_nsec;
    sec /= TSCSECF;
    sec += ts.tv_sec;
    sec -= tsczero_f;

    return sec;
}

// tscsec -- convert tsc value to [fractional] seconds
double
tscsec(tsc_t tsc)
{
    double sec;

    sec = tsc;
    sec /= TSCSECF;

    return sec;
}

// tscinit -- initialize base time
void
tscinit(void)
{
    tsczero = tscget();
    tsczero_f = tscsec(tsczero);
}
Use better integer math.
Use signed math if new_value < 0 is possible; otherwise the int64_t cast below is not needed.
Form the whole sum first and then divide.
Round by adding n/2 before the division.
Sample code:
// new__average = new__average*(n-1);
// new__average /= n;
// new__average += new_value/n;

//             v---------------------------------------------v  Sum first
new__average = (new__average*((int64_t)n - 1) + new_value + n/2)/n;
//                              Add n/2 to effect rounding  ^-^
On review, the whole idea of doing averages in 2 parts is flawed. Instead use a 64-bit count of nanoseconds. Good until the year 2263.
Suggested code:
void average_timespec(int64_t* average, struct timespec new_sample, int64_t n) {
    if (n > 0) {
        int64_t t = new_sample.tv_sec * (int64_t)1000000000 + new_sample.tv_nsec;
        *average = (*average*(n - 1) + t + n/2)/n;
    }
}
If you must form a struct timespec from the average, that is easy to do when average >= 0.

int64_t average;
average_timespec(&average, new_sample, n);
struct timespec avg_ts = (struct timespec){.tv_sec = average / 1000000000,
                                           .tv_nsec = average % 1000000000};
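A minimal sketch of how average_timespec() could be fed from clock_gettime; the loop body and the borrow handling here are assumptions about how the samples are produced, not part of the answer above.

// Average the gap between successive CLOCK_MONOTONIC readings.
struct timespec prev, now, gap;
int64_t avg_ns = 0;
int64_t n = 0;

clock_gettime(CLOCK_MONOTONIC, &prev);
for (int i = 0; i < 5; i++) {
    // ... wait for the next message ...
    clock_gettime(CLOCK_MONOTONIC, &now);
    gap.tv_sec = now.tv_sec - prev.tv_sec;
    gap.tv_nsec = now.tv_nsec - prev.tv_nsec;
    if (gap.tv_nsec < 0) {              // borrow from the seconds field
        gap.tv_sec--;
        gap.tv_nsec += 1000000000;
    }
    prev = now;
    average_timespec(&avg_ns, gap, ++n);
}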
I am writing a raycaster, and I am trying to speed it up by making lookup tables for my most commonly called trig functions, namely sin, cos, and tan. This first snippet is my table lookup code. In order to avoid making a lookup table for each, I am just making one sin table, and defining cos(x) as sin(half_pi - x) and tan(x) as sin(x) / cos(x).
#include <math.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

const float two_pi = M_PI * 2, half_pi = M_PI / 2;

typedef struct {
    int fn_type, num_vals;
    double *vals, step;
} TrigTable;

static TrigTable sin_table;

TrigTable init_trig_table(const int fn_type, const int num_vals) {
    double (*trig_fn) (double), period;
    switch (fn_type) {
        case 0: trig_fn = sin, period = two_pi; break;
        case 1: trig_fn = cos, period = two_pi; break;
        case 2: trig_fn = tan, period = M_PI; break;
    }

    TrigTable table = {fn_type, num_vals,
        calloc(num_vals, sizeof(double)), period / num_vals};

    for (double x = 0; x < period; x += table.step)
        table.vals[(int) round(x / table.step)] = trig_fn(x);

    return table;
}

double _lookup(const TrigTable table, const double x) {
    return table.vals[(int) round(x / table.step)];
}

double lookup_sin(double x) {
    const double orig_x = x;
    if (x < 0) x = -x;
    if (x > two_pi) x = fmod(x, two_pi);
    const double result = _lookup(sin_table, x);
    return orig_x < 0 ? -result : result;
}

double lookup_cos(double x) {
    return lookup_sin(half_pi - x);
}

double lookup_tan(double x) {
    return lookup_sin(x) / lookup_cos(x);
}
Here is how I went about benchmarking my code: my function for the current time in milliseconds is shown below. The problem is this: when timing my lookup_sin against math.h's sin, my variant takes around three times longer: Table time vs default: 328 ms, 108 ms.
Here is the timing for cos:
Table time vs default: 332 ms, 109 ms
Here is the timing for tan:
Table time vs default: 715 ms, 153 ms
What makes my code so much slower? I would think that precomputing sin values would greatly accelerate my code. Perhaps it's the fmod in the lookup_sin function? Please provide whatever insight you have. I am compiling with clang with no optimizations enabled, so that the calls to each trig function are not removed (I am ignoring the return value).
const int64_t millis() {
    struct timespec now;
    timespec_get(&now, TIME_UTC);
    return ((int64_t) now.tv_sec) * 1000 + ((int64_t) now.tv_nsec) / 1000000;
}

const int64_t benchmark(double (*trig_fn) (double)) {
    const int64_t before = millis();
    for (double i = 0; i < 10000; i += 0.001)
        trig_fn(i);
    return millis() - before;
}

int main() {
    sin_table = init_trig_table(0, 15000);
    const int64_t table_time = benchmark(lookup_sin), default_time = benchmark(sin);
    printf("Table time vs default: %lld ms, %lld ms\n", table_time, default_time);
    free(sin_table.vals);
}
Reduce the floating-point math.
OP's code is doing excessive FP math in what should be a scale and a lookup.
Scale the radians by a pre-computed factor into an index.
The number of entries in the lookup table should be an unsigned power of 2, so the mod is a simple &.
At first, let us simplify and have [0 ... 2*pi) map to indexes [0 ... number_of_entries) to demo the idea.
double lookup_sin_alt(double x) {
    long scaled_x = lround(x * scale_factor);             // This should be the _only_ line of FP code
    // All following code is integer code.
    scaled_x += number_of_entries/4;                      // If we are doing cosine
    unsigned index = scaled_x & (number_of_entries - 1);  // This & replaces fmod
    double result = table.vals[index];
    return result;
}
Later we can use a quarter-size table [0 ... pi/2] and steer selection/reconstruction with integer operations.
Given OP's low precision requirements, consider using float instead of double throughout, including float functions like lroundf(). A fuller self-contained sketch of the power-of-2 approach follows.
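Here is a minimal, self-contained sketch of the idea; the table size N_ENTRIES and the names init_sin_table/lookup_sin_pow2 are assumptions for illustration, not from the answer above.

#include <math.h>
#include <stdio.h>

#define N_ENTRIES 16384u                // unsigned power of 2 so '&' replaces fmod
static double sin_vals[N_ENTRIES];
static double scale_factor;             // radians -> index

static void init_sin_table(void) {
    scale_factor = N_ENTRIES / (2 * M_PI);
    for (unsigned i = 0; i < N_ENTRIES; i++)
        sin_vals[i] = sin(i / scale_factor);
}

static double lookup_sin_pow2(double x) {
    long scaled_x = lround(x * scale_factor);               // the only FP operation
    unsigned index = (unsigned) scaled_x & (N_ENTRIES - 1); // wraps negatives too,
                                                            // since N_ENTRIES divides 2^32
    return sin_vals[index];
}

int main(void) {
    init_sin_table();
    printf("%f vs %f\n", lookup_sin_pow2(1.0), sin(1.0));
    return 0;
}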
I'm sure the answer is simple, but I don't quite get it. I'm trying to calculate the delta between two struct timespec using this code:
struct timespec start, finish, diff;
int ndiff;

/* Structs are filled somewhere else */

diff.tv_sec = finish.tv_sec - start.tv_sec;
ndiff = finish.tv_nsec - start.tv_nsec;
if (ndiff < 0) {
    diff.tv_sec--;
    ndiff = 1L - ndiff;
}
diff.tv_nsec = ndiff;
printf("Elapsed time: %ld.%ld seconds.\n", diff.tv_sec, diff.tv_nsec);
However, the output is always something like Elapsed time: 0.300876000 seconds. which seems to indicate that I'm losing the last three digits of the nanoseconds (since those shouldn't always be zero). Can someone point out what's causing that?
The clock used by the code has a reported precision of 1000 ns (as noted in comments by John Bollinger and rici).
and/or
diff.tv_sec is not necessarily a long. Use a matching specifier.
// printf("Elapsed time: %ld.%ld seconds.\n", diff.tv_sec, diff.tv_nsec);
// Also ensure the fraction is printed with 9 digits
printf("Elapsed time: %lld.%09ld seconds.\n", (long long) diff.tv_sec, diff.tv_nsec);
Also, the "borrow" math when updating ndiff is incorrect.
ndiff = finish.tv_nsec - start.tv_nsec;
if (ndiff < 0) {
    diff.tv_sec--;
    // ndiff = 1L - ndiff;
    ndiff += 1000000000;
}
Even better, drop the intermediate int ndiff variable.

diff.tv_sec = finish.tv_sec - start.tv_sec;
diff.tv_nsec = finish.tv_nsec - start.tv_nsec;
if (diff.tv_nsec < 0) {
    diff.tv_sec--;
    diff.tv_nsec += 1000000000;
}
Should finish occur before start, other code may be desired to keep the 2 members of diff with the same sign; a sketch of one way to do that follows.
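A sketch of one way to normalize the result so both fields share a sign when finish may precede start; the function name timespec_diff is an assumption for illustration.

#include <time.h>

struct timespec timespec_diff(struct timespec start, struct timespec finish) {
    struct timespec diff;
    diff.tv_sec = finish.tv_sec - start.tv_sec;
    diff.tv_nsec = finish.tv_nsec - start.tv_nsec;
    if (diff.tv_nsec < 0 && diff.tv_sec > 0) {          // borrow from the seconds
        diff.tv_sec--;
        diff.tv_nsec += 1000000000;
    } else if (diff.tv_nsec > 0 && diff.tv_sec < 0) {   // carry the other way
        diff.tv_sec++;
        diff.tv_nsec -= 1000000000;
    }
    return diff;
}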
I am collecting the total elapsed time by using two inline functions (specified and implemented in my .h file) as follows:
extern double _elapsed_time_mf;
extern double _elapsed_time_b;

//this function returns the current time in order to compute the total elapsed time of an operation
static inline struct timeval get_current_time() {
    struct timeval time;
    gettimeofday(&time, NULL);
    return time;
}

//calculate the total processed time and return the elapsed total time in seconds
static inline double get_elapsed_time(struct timeval start, struct timeval end) {
    long int tmili;
    tmili = (int) (1000.0 * (end.tv_sec - start.tv_sec) +
                   (end.tv_usec - start.tv_usec) / 1000.0);
    return (double) (tmili / (double) 1000.0);
}
Then, when I would like to know the total elapsed time of an operation I do this:
void my_function() {
#ifdef COLLECT_STATISTICAL_DATA
    struct timeval start;
    struct timeval end;
    start = get_current_time();
#endif

    //a processing....

#ifdef COLLECT_STATISTICAL_DATA
    end = get_current_time();
    _elapsed_time_mf = get_elapsed_time(start, end);
#endif
}
_elapsed_time_mf is defined in only one .c file.
However, I am getting strange results. For instance, consider another function, called function_b, which also collects its elapsed time (stored in another global variable). This function makes a call to my_function (which collects its elapsed time according to my previous code). However, the total elapsed time of function_b is sometimes less than the total elapsed time of my_function. An example of this situation is:
void function_b() {
#ifdef COLLECT_STATISTICAL_DATA
    struct timeval start;
    struct timeval end;
    start = get_current_time();
#endif

    //a processing....
    my_function();
    //another processing...

#ifdef COLLECT_STATISTICAL_DATA
    end = get_current_time();
    _elapsed_time_b = get_elapsed_time(start, end);
#endif
}
Sometimes _elapsed_time_b is less than _elapsed_time_mf. Why?
I would like to collect both elapsed times in seconds according to the clock/date/timestamp (not the CPU elapsed time).
You might want to reconsider the implementation of get_elapsed_time. From here: http://www.gnu.org/software/libc/manual/html_node/Elapsed-Time.html
int
timeval_subtract (struct timeval *result, struct timeval *x, struct timeval *y)
{
  /* Perform the carry for the later subtraction by updating y. */
  if (x->tv_usec < y->tv_usec) {
    int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1;
    y->tv_usec -= 1000000 * nsec;
    y->tv_sec += nsec;
  }
  if (x->tv_usec - y->tv_usec > 1000000) {
    int nsec = (x->tv_usec - y->tv_usec) / 1000000;
    y->tv_usec += 1000000 * nsec;
    y->tv_sec -= nsec;
  }

  /* Compute the time remaining to wait.
     tv_usec is certainly positive. */
  result->tv_sec = x->tv_sec - y->tv_sec;
  result->tv_usec = x->tv_usec - y->tv_usec;

  /* Return 1 if result is negative. */
  return x->tv_sec < y->tv_sec;
}
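For instance, get_elapsed_time could be rebuilt on top of timeval_subtract so that precision is not dropped to whole milliseconds first; a minimal sketch (the name get_elapsed_time2 is an assumption for illustration):

// Full-microsecond-precision elapsed time in seconds,
// built on the timeval_subtract above.
static inline double get_elapsed_time2(struct timeval start, struct timeval end) {
    struct timeval diff;
    timeval_subtract(&diff, &end, &start);  // diff = end - start
    return (double) diff.tv_sec + (double) diff.tv_usec / 1000000.0;
}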
As Art commented, I am now using clock_gettime, and my code works as expected.
My functions are now written as:
static inline double get_elapsed_time(struct timespec start, struct timespec end) {
    double start_in_sec = (double) start.tv_sec + (double) start.tv_nsec / 1000000000.0;
    double end_in_sec = (double) end.tv_sec + (double) end.tv_nsec / 1000000000.0;
    return end_in_sec - start_in_sec;
}

static inline struct timespec get_current_time() {
    struct timespec time;
    clock_gettime(CLOCK_MONOTONIC, &time);
    return time;
}
I have a number of time series each containing a sequence of 400 numbers that are close to each other. I have thousands of time series; each has its own series of close numbers.
TimeSeries1 = 184.56, 184.675, 184.55, 184.77, ...
TimeSeries2 = 145.73, 145.384, 145.96, 145.33, ...
TimeSeries3 = -126.48, -126.78, -126.55, ...
I store one 8-byte double per time series (the first value); for most of the time series, I can then compress each subsequent double to a single byte by multiplying by 100 and taking the delta between the current value and the previous value.
Here is my compress/decompress code:
struct {
    double firstValue;
    double nums[400];
    char compressedNums[400];
    int compressionOK;
} timeSeries;

void compress(void) {
    timeSeries.firstValue = timeSeries.nums[0];
    double lastValue = timeSeries.firstValue;
    for (int i = 1; i < 400; ++i) {
        int delta = (int) ((timeSeries.nums[i] * 100) - (lastValue * 100));
        timeSeries.compressionOK = 1;
        if (delta > CHAR_MAX || delta < -CHAR_MAX) {
            timeSeries.compressionOK = 0;
            return;
        }
        else {
            timeSeries.compressedNums[i] = (char) delta;
            lastValue = timeSeries.nums[i];
        }
    }
}
double decompressedNums[400];

void decompress(void) {
    if (timeSeries.compressionOK) {
        double lastValue = timeSeries.firstValue;
        for (int i = 1; i < 400; ++i) {
            decompressedNums[i] = lastValue + timeSeries.compressedNums[i] / 100.0;
            lastValue = decompressedNums[i];
        }
    }
}
I can tolerate some lossiness, on the order of .005 per number. However, I am getting more loss than I can tolerate, especially since a precision loss in one of the compressed series carries forward and causes an increasing amount of loss.
So my questions are:
Is there something I can change to reduce the lossiness?
Is there an altogether different compression method with a ratio comparable to, or better than, this 8 to 1?
You can avoid the slow drift in precision by working out the delta not from the precise value of the previous element, but rather from the computed approximation of the previous element (i.e. the sum of the deltas). That way, you will always get the closest approximation to the next value; a sketch of that change follows.
Personally, I'd use integer arithmetic for this purpose, but it will probably be fine with floating-point arithmetic too, since floating point is reproducible even if not precise.
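A minimal sketch of that change to the compress loop, assuming the struct from the question plus <math.h> and <limits.h>; the name compress_tracked is an assumption for illustration.

// Track the value the decompressor will reconstruct, and compute each
// delta against that approximation instead of the exact previous sample,
// so rounding error cannot accumulate across the series.
void compress_tracked(void) {
    timeSeries.firstValue = timeSeries.nums[0];
    double reconstructed = timeSeries.firstValue;   // what the decoder will hold
    timeSeries.compressionOK = 1;
    for (int i = 1; i < 400; ++i) {
        int delta = (int) round((timeSeries.nums[i] - reconstructed) * 100.0);
        if (delta > CHAR_MAX || delta < -CHAR_MAX) {
            timeSeries.compressionOK = 0;
            return;
        }
        timeSeries.compressedNums[i] = (char) delta;
        reconstructed += delta / 100.0;             // mirror the decompressor exactly
    }
}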
Look at the values as stored in memory:
184. == 0x4067000000000000ull
184.56 == 0x406711eb851eb852ull
The first two bytes are the same but the last six bytes are different.
For integer deltas, multiply by 128 instead of 100; this will get you 7 bits of the fractional part. If the delta is too large for one byte, use a three-byte sequence {0x80, hi_delta, lo_delta}, so 0x80 is used as a special indicator. If the delta happened to be -128, that would be encoded as {0x80, 0xff, 0x80}. A sketch of this escape encoding follows.
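A minimal sketch of that escape encoding; the emit_delta name and the output-buffer layout are assumptions for illustration.

#include <stdint.h>
#include <stddef.h>

// Emit a delta as either one byte, or an escaped three-byte sequence
// {0x80, hi, lo} when it does not fit; 0x80 is reserved as the escape
// marker, so a plain single byte is never 0x80. Returns bytes written.
static size_t emit_delta(uint8_t *out, int delta) {
    if (delta >= -127 && delta <= 127) {            // fits in one byte, never 0x80
        out[0] = (uint8_t)(int8_t) delta;
        return 1;
    }
    out[0] = 0x80;                                  // escape marker
    out[1] = (uint8_t)(((unsigned) delta >> 8) & 0xff);  // hi byte, e.g. 0xff for -128
    out[2] = (uint8_t)((unsigned) delta & 0xff);         // lo byte, e.g. 0x80 for -128
    return 3;
}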
You should round the values before converting to an int to avoid these problems, as in this code.
#include <limits.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

enum { TS_SIZE = 400 };

typedef struct
{
    double firstValue;
    double nums[TS_SIZE];
    signed char compressedNums[TS_SIZE];
    int compressionOK;
} timeSeries;

static
void compress(timeSeries *t1)
{
    t1->firstValue = t1->nums[0];
    double lastValue = t1->firstValue;
    for (int i = 1; i < TS_SIZE; ++i)
    {
        int delta = (int) round((t1->nums[i] - lastValue) * 100.0);
        t1->compressionOK = 1;
        if (delta > CHAR_MAX || delta < -CHAR_MAX)
        {
            printf("Delta too big: %d (%.3f) vs %d (%.3f) = delta %.3f\n",
                   i - 1, t1->nums[i-1], i, t1->nums[i], t1->nums[i] - t1->nums[i-1]);
            t1->compressionOK = 0;
            return;
        }
        else
        {
            t1->compressedNums[i] = (char) delta;
            lastValue = t1->nums[i];
        }
    }
}

static
void decompress(timeSeries *t1)
{
    if (t1->compressionOK)
    {
        double lastValue = t1->firstValue;
        for (int i = 1; i < TS_SIZE; ++i)
        {
            t1->nums[i] = lastValue + t1->compressedNums[i] / 100.0;
            lastValue = t1->nums[i];
        }
    }
}

static void compare(const timeSeries *t0, const timeSeries *t1)
{
    for (int i = 0; i < TS_SIZE; i++)
    {
        char c = (fabs(t0->nums[i] - t1->nums[i]) > 0.005) ? '!' : ' ';
        printf("%c %03d: %.3f vs %.3f = %+.3f\n", c, i,
               t0->nums[i], t1->nums[i], t0->nums[i] - t1->nums[i]);
    }
}

int main(void)
{
    timeSeries t1;
    timeSeries t0;
    int i;

    for (i = 0; i < TS_SIZE; i++)
    {
        if (scanf("%lf", &t0.nums[i]) != 1)
            break;
    }
    if (i != TS_SIZE)
    {
        printf("Reading problems\n");
        return 1;
    }

    t1 = t0;
    for (i = 0; i < 10; i++)
    {
        printf("Cycle %d:\n", i + 1);
        compress(&t1);
        decompress(&t1);
        compare(&t0, &t1);
    }
    return 0;
}
With the following data (generated from integers in the range 18456..18855 divided by 100, and randomly perturbed by a small amount, about 0.3%, to keep the values close enough together), I got the same data over and over again, for the full 10 cycles of compression and decompression.
184.60 184.80 184.25 184.62 184.49 184.94 184.95 184.39 184.50 184.96
184.54 184.72 184.84 185.02 184.83 185.01 184.43 185.00 184.74 184.88
185.04 184.79 184.55 184.94 185.07 184.60 184.55 184.57 184.95 185.07
184.61 184.57 184.57 184.98 185.24 185.11 184.89 184.72 184.77 185.29
184.98 184.91 184.76 184.89 185.26 184.94 185.09 184.68 184.69 185.04
185.39 185.05 185.41 185.41 184.74 184.77 185.16 184.84 185.31 184.90
185.18 185.15 185.03 185.41 185.18 185.25 185.01 185.31 185.36 185.29
185.62 185.48 185.40 185.15 185.29 185.19 185.32 185.60 185.39 185.22
185.66 185.48 185.53 185.59 185.27 185.69 185.29 185.70 185.77 185.40
185.41 185.23 185.84 185.30 185.70 185.18 185.68 185.43 185.45 185.71
185.60 185.82 185.92 185.40 185.85 185.65 185.92 185.80 185.60 185.57
185.64 185.39 185.48 185.36 185.69 185.76 185.45 185.72 185.47 186.04
185.81 185.80 185.94 185.64 186.09 185.95 186.03 185.55 185.65 185.75
186.03 186.02 186.24 186.19 185.62 186.13 185.98 185.84 185.83 186.19
186.17 185.80 186.15 186.10 186.32 186.25 186.09 186.20 186.06 185.80
186.02 186.40 186.26 186.15 186.35 185.90 185.98 186.19 186.15 185.84
186.34 186.20 186.41 185.93 185.97 186.46 185.92 186.19 186.15 186.32
186.06 186.25 186.47 186.56 186.47 186.33 186.55 185.98 186.36 186.35
186.65 186.60 186.52 186.13 186.39 186.55 186.50 186.45 186.29 186.24
186.81 186.61 186.80 186.60 186.75 186.83 186.86 186.35 186.34 186.53
186.60 186.69 186.32 186.23 186.39 186.71 186.65 186.37 186.37 186.54
186.81 186.84 186.78 186.50 186.47 186.44 186.36 186.59 186.87 186.70
186.90 186.47 186.50 186.74 186.80 186.86 186.72 186.63 186.78 186.52
187.22 186.71 186.56 186.90 186.95 186.67 186.79 186.99 186.85 187.03
187.04 186.89 187.19 187.33 187.09 186.92 187.35 187.29 187.04 187.00
186.79 187.32 186.94 187.07 186.92 187.06 187.39 187.20 187.35 186.78
187.47 187.54 187.33 187.07 187.39 186.97 187.48 187.10 187.52 187.55
187.06 187.24 187.28 186.92 187.60 187.05 186.95 187.26 187.08 187.35
187.24 187.66 187.57 187.75 187.15 187.08 187.55 187.30 187.17 187.17
187.13 187.14 187.40 187.71 187.64 187.32 187.42 187.19 187.40 187.66
187.93 187.27 187.44 187.35 187.34 187.54 187.70 187.62 187.99 187.97
187.51 187.36 187.82 187.75 187.56 187.53 187.38 187.91 187.63 187.51
187.39 187.54 187.69 187.84 188.16 187.61 188.03 188.06 187.53 187.51
187.93 188.04 187.77 187.69 188.03 187.81 188.04 187.82 188.14 187.96
188.05 187.63 188.35 187.65 188.00 188.27 188.20 188.21 187.81 188.04
187.87 187.96 188.18 187.98 188.46 187.89 187.77 188.18 187.83 188.03
188.48 188.09 187.82 187.90 188.40 188.32 188.33 188.29 188.58 188.53
187.88 188.32 188.57 188.14 188.02 188.25 188.62 188.43 188.19 188.54
188.20 188.06 188.31 188.19 188.48 188.44 188.69 188.63 188.34 188.76
188.32 188.82 188.45 188.34 188.44 188.25 188.39 188.83 188.49 188.18
Until I put the rounding in, the values would rapidly drift apart.
If you don't have round() — which was added to Standard C in the C99 standard — then you can use these lines in place of round():
int delta;
if (t1->nums[i] > lastValue)
    delta = (int) (((t1->nums[i] - lastValue) * 100.0) + 0.5);
else
    delta = (int) (((t1->nums[i] - lastValue) * 100.0) - 0.5);
This rounds correctly for positive and negative values. You could also factor it into a function (a sketch of that follows); in C99, you could make it an inline function, but if C99 works, you would have the round() function in the library too. I used this code at first before switching to round().
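For example, a sketch of such a helper; the name round_to_int is an assumption for illustration.

static int round_to_int(double x)
{
    /* Round half away from zero, matching the two-branch code above. */
    return (int) (x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Usage in the compress loop: */
int delta = round_to_int((t1->nums[i] - lastValue) * 100.0);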
I wrote a sample program to understand time measurement in C. Below is a small self-contained example. I have a function do_primes() that calculates prime numbers. In main(), between the timing code, I call do_primes() and also sleep for 20 milliseconds. I measure time using struct timeval (which, as I understand it, returns wall-clock time) and also CPU time using clock() and CLOCKS_PER_SEC. As I understand it, the latter denotes the time for which the CPU was actually working.
The output of the program is as follows.
Calculated 9592 primes.
elapsed time 2.866976 sec.
cpu time used 2.840000 secs.
As you can see, the difference between the elapsed time and the CPU time is
0.026976 seconds, or 26.976 milliseconds.
1) Are my assumptions correct?
2) Is the remaining 6.976 milliseconds (26.976 ms minus the 20 ms sleep) accounted for by scheduler switch delay?
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>     /* for usleep() */

#define MAX_PRIME 100000

void do_primes()
{
    unsigned long i, num, primes = 0;
    for (num = 1; num <= MAX_PRIME; ++num)
    {
        for (i = 2; (i <= num) && (num % i != 0); ++i);
        if (i == num)
            ++primes;
    }
    printf("Calculated %lu primes.\n", primes);
}

int main()
{
    struct timeval t1, t2;
    double elapsedTime;
    clock_t start, end;
    double cpu_time_used;

    start = clock();

    /* start timer */
    gettimeofday(&t1, NULL);

    /* do something */
    usleep(20000);
    do_primes();

    /* stop timer */
    gettimeofday(&t2, NULL);
    end = clock();

    /* compute and print the elapsed time in millisec */
    elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;      /* sec to ms */
    elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0;   /* us to ms */
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;

    printf("elapsed time %f sec. \ncpu time used %f secs.\n", (elapsedTime/1000), cpu_time_used);
    return 0;
}
Your understanding is correct.
The additional 6.976 ms might not mean anything at all, because it's possible that the clock() function only has a resolution of 10 ms. If you want to check that on your system, a quick sketch like the one below can estimate clock()'s effective granularity.
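This is an illustrative sketch, not part of the original answer: it spins until clock()'s reported value changes and prints the smallest observed step.

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t prev = clock();
    clock_t next;

    while ((next = clock()) == prev)
        ;   /* busy-wait until clock() ticks over */

    printf("clock() step: %f ms\n",
           (double)(next - prev) * 1000.0 / CLOCKS_PER_SEC);
    return 0;
}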