Measuring time lapses in kernel 2.4.37 - c

My purpose is quite simple: to measure the time elapsed:
unsigned long start, end;
int init_module (void) {
start = jiffies;
printk("Hello Modules\n");
end = jiffies;
printk("Measuring time lapses: %lu\n", (end - start) * 1000 /HZ);
return 0;
}
But this method does not work because printk is too short, can any one give me some suggestions? Any other options? Both C and assembly are ok. And I'm required to work under kernel 2.4.37.

Check out printk times. This shall enable you to see the time between the calls or show time relative to a particular message. This is one of the instrumentation tool for kernel.
Provided below the related links and link for patch:
http://elinux.org/Printk_Times
http://elinux.org/images/d/d0/Instrumented_printk.patch

Related

How to implement a timer for every second with zero nanosecond with liburing?

I noticed the io_uring kernel side uses CLOCK_MONOTONIC at CLOCK_MONOTONIC, so for the first timer, I get the time with both CLOCK_REALTIME and CLOCK_MONOTONIC and adjust the nanosecond like below and use IORING_TIMEOUT_ABS flag for io_uring_prep_timeout. iorn/clock.c at master · hnakamur/iorn
const long sec_in_nsec = 1000000000;
static int queue_timeout(iorn_queue_t *queue) {
iorn_timeout_op_t *op = calloc(1, sizeof(*op));
if (op == NULL) {
return -ENOMEM;
}
struct timespec rts;
int ret = clock_gettime(CLOCK_REALTIME, &rts);
if (ret < 0) {
fprintf(stderr, "clock_gettime CLOCK_REALTIME error: %s\n", strerror(errno));
return -errno;
}
long nsec_diff = sec_in_nsec - rts.tv_nsec;
ret = clock_gettime(CLOCK_MONOTONIC, &op->ts);
if (ret < 0) {
fprintf(stderr, "clock_gettime CLOCK_MONOTONIC error: %s\n", strerror(errno));
return -errno;
}
op->handler = on_timeout;
op->ts.tv_sec++;
op->ts.tv_nsec += nsec_diff;
if (op->ts.tv_nsec > sec_in_nsec) {
op->ts.tv_sec++;
op->ts.tv_nsec -= sec_in_nsec;
}
op->count = 1;
op->flags = IORING_TIMEOUT_ABS;
ret = iorn_prep_timeout(queue, op);
if (ret < 0) {
return ret;
}
return iorn_submit(queue);
}
From the second time, I just increment the second part tv_sec and use IORING_TIMEOUT_ABS flag for io_uring_prep_timeout.
Here is the output from my example program. The millisecond part is zero but it is about 400 microsecond later than just second.
on_timeout time=2020-05-10T14:49:42.000442
on_timeout time=2020-05-10T14:49:43.000371
on_timeout time=2020-05-10T14:49:44.000368
on_timeout time=2020-05-10T14:49:45.000372
on_timeout time=2020-05-10T14:49:46.000372
on_timeout time=2020-05-10T14:49:47.000373
on_timeout time=2020-05-10T14:49:48.000373
Could you tell me a better way than this?
Thanks for your comments! I'd like to update the current time for logging like ngx_time_update(). I modified my example to use just CLOCK_REALTIME, but still about 400 microseconds late. github.com/hnakamur/iorn/commit/… Does it mean clock_gettime takes about 400 nanoseconds on my machine?
Yes, that sounds about right, sort of. But, if you're on an x86 PC under linux, 400 ns for clock_gettime overhead may be a bit high (order of magnitude higher--see below). If you're on an arm CPU (e.g. Raspberry Pi, nvidia Jetson), it might be okay.
I don't know how you're getting 400 microseconds. But, I've had to do a lot of realtime stuff under linux, and 400 us is similar to what I've measured as the overhead to do a context switch and/or wakeup a process/thread after a syscall suspends it.
I never use gettimeofday anymore. I now just use clock_gettime(CLOCK_REALTIME,...) because it's the same except you get nanoseconds instead of microseconds.
Just so you know, although clock_gettime is a syscall, nowadays, on most systems, it uses the VDSO layer. The kernel injects special code into the userspace app, so that it is able to access the time directly without the overhead of a syscall.
If you're interested, you could run under gdb and disassemble the code to see that it just accesses some special memory locations instead of doing a syscall.
I don't think you need to worry about this too much. Just use clock_gettime(CLOCK_MONOTONIC,...) and set flags to 0. The overhead doesn't factor into this, for the purposes of the ioring call as your iorn layer is using it.
When I do this sort of thing, and I want/need to calculate the overhead of clock_gettime itself, I call clock_gettime in a loop (e.g. 1000 times), and try to keep the total time below a [possible] timeslice. I use the minimum diff between times in each iteration. That compensates for any [possible] timeslicing.
The minimum is the overhead of the call itself [on average].
There are additional tricks that you can do to minimize latency in userspace (e.g. raising process priority, clamping CPU affinity and I/O interrupt affinity), but they can involve a few more things, and, if you're not very careful, they can produce worse results.
Before you start taking extraordinary measures, you should have a solid methodology to measure timing/benchmarking to prove that your results can not meet your timing/throughput/latency requirements. Otherwise, you're doing complicated things for no real/measurable/necessary benefit.
Below is some code I just created, simplified, but based on code I already have/use to calibrate the overhead:
#include <stdio.h>
#include <time.h>
#define ITERMAX 10000
typedef long long tsc_t;
// tscget -- get time in nanoseconds
static inline tsc_t
tscget(void)
{
struct timespec ts;
tsc_t tsc;
clock_gettime(CLOCK_MONOTONIC,&ts);
tsc = ts.tv_sec;
tsc *= 1000000000;
tsc += ts.tv_nsec;
return tsc;
}
// tscsec -- convert nanoseconds to fractional seconds
double
tscsec(tsc_t tsc)
{
double sec;
sec = tsc;
sec /= 1e9;
return sec;
}
tsc_t
calibrate(void)
{
tsc_t tscbeg;
tsc_t tscold;
tsc_t tscnow;
tsc_t tscdif;
tsc_t tscmin;
int iter;
tscmin = 1LL << 62;
tscbeg = tscget();
tscold = tscbeg;
for (iter = ITERMAX; iter > 0; --iter) {
tscnow = tscget();
tscdif = tscnow - tscold;
if (tscdif < tscmin)
tscmin = tscdif;
tscold = tscnow;
}
tscdif = tscnow - tscbeg;
printf("MIN:%.9f TOT:%.9f AVG:%.9f\n",
tscsec(tscmin),tscsec(tscdif),tscsec(tscnow - tscbeg) / ITERMAX);
return tscmin;
}
int
main(void)
{
calibrate();
return 0;
}
On my system, a 2.67GHz Core i7, the output is:
MIN:0.000000019 TOT:0.000254999 AVG:0.000000025
So, I'm getting 25 ns overhead [and not 400 ns]. But, again, each system can be different to some extent.
UPDATE:
Note that x86 processors have "speed step". The OS can adjust the CPU frequency up or down semi-automatically. Lower speeds conserve power. Higher speeds are maximum performance.
This is done with a heuristic (e.g. if the OS detects that the process is a heavy CPU user, it will up the speed).
To force maximum speed, linux has this directory:
/sys/devices/system/cpu/cpuN/cpufreq
Where N is the cpu number (e.g. 0-7)
Under this directory, there are a number of files of interest. They should be self explanatory.
In particular, look at scaling_governor. It has either ondemand [kernel will adjust as needed] or performance [kernel will force maximum CPU speed].
To force maximum speed, as root, set this [once] to performance (e.g.):
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Do this for all cpus.
However, I just did this on my system, and it had little effect. So, the kernel's heuristic may have improved.
As to the 400us, when a process has been waiting on something, when it is "woken up", this is a two step process.
The process is marked "runnable".
At some point, the system/CPU does a reschedule. The process will be run, based upon the scheduling policy and the process priority in effect.
For many syscalls, the reschedule [only] occurs on the next system timer/clock tick/interrupt. So, for some, there can be a delay of up to a full clock tick (i.e.) for HZ value of 1000, this can be up to 1ms (1000 us) later.
On average, this is one half of HZ or 500 us.
For some syscalls, when the process is marked runnable, a reschedule is done immediately. If the process has a higher priority, it will be run immediately.
When I first looked at this [circa 2004], I looked at all code paths in the kernel, and the only syscall that did the immediate reschedule was SysV IPC, for msgsnd/msgrcv. That is, when process A did msgsnd, any process B waiting for the given message would be run.
But, others did not (e.g. futex). They would wait for the timer tick. A lot has changed since then, and now, more syscalls will do the immediate reschedule. For example, I recently measured futex [invoked via pthread_mutex_*], and it seemed to do the quick reschedule.
Also, the kernel scheduler has changed. The newer scheduler can wakeup/run some things on a fraction of a clock tick.
So, for you, the 400 us, is [possibly] the alignment to the next clock tick.
But, it could just be the overhead of doing the syscall. To test that, I modified my test program to open /dev/null [and/or /dev/zero], and added read(fd,buf,1) to the test loop.
I got a MIN: value of 529 us. So, the delay you're getting could just be the amount of time it takes to do the task switch.
This is what I would call "good enough for now".
To get "razor's edge" response, you'd probably have to write a custom kernel driver and have the driver do this. This is what embedded systems would do if (e.g.) they had to toggle a GPIO pin on every interval.
But, if all you're doing is printf, the overhead of printf and the underlying write(1,...) tends to swamp the actual delay.
Also, note that when you do printf, it builds the output buffer and when the buffer in FILE *stdout is full, it flushes via write.
For best performance, it's better to do int len = sprintf(buf,"current time is ..."); write(1,buf,len);
Also, when you do this, if the kernel buffers for TTY I/O get filled [which is quite possible given the high frequency of messages you're doing], the process will be suspended until the I/O has been sent to the TTY device.
To do this well, you'd have to watch how much space is available, and skip some messages if there isn't enough space to [wholy] contain them.
You'd need to do: ioctl(1,TIOCOUTQ,...) to get the available space and skip some messages if it is less than the size of the message you want to output (e.g. the len value above).
For your usage, you're probably more interested in the latest time message, rather than outputting all messages [which would eventually produce a lag]

Run code for exactly one second

I would like to know how I can program something so that my program runs as long as a second lasts.
I would like to evaluate parts of my code and see where the time is spend most so I am analyzing parts of it.
Here's the interesting part of my code :
int size = 256
clock_t start_benching = clock();
for (uint32_t i = 0;i < size; i+=4)
{
myarray[i];
myarray[i+1];
myarray[i+2];
myarray[i+3];
}
clock_t stop_benching = clock();
This just gives me how long the function needed to perform all the operations.
I want to run the code for one second and see how many operations have been done.
This is the line to print the time measurement:
printf("Walking through buffer took %f seconds\n", (double)(stop_benching - start_benching) / CLOCKS_PER_SEC);
A better approach to benchmarking is to know the % of time spent on each section of the code.
Instead of making your code run for exactly 1 second, make stop_benchmarking - start_benchmarking the total run time - Take the time spent on any part of the code and divide by the total runtime to get a value between 0 and 1. Multiply this value by 100 and you have the % of time consumed at that specific section.
Non-answer advice: Use an actual profiler to profile the performance of code sections.
On *nix you can set an alarm(2) with a signal handler that sets a global flag to indicate the elapsed time. The Windows API provides something similar with SetTimer.
#include <unistd.h>
#include <signal.h>
int time_elapsed = 0;
void alarm_handler(int signal) {
time_elapsed = 1;
}
int main() {
signal(SIGALRM, &alarm_handler);
alarm(1); // set alarm time-out to 1 second
do {
// stuff...
} while (!time_elapsed);
return 0;
}
In more complicated cases you can use setitimer(2) instead of alarm(2), which lets you
use microsecond precision and
choose between counting
wall clock time,
user CPU time, or
user and system CPU time.

Where is the kernel time stored in memory in Linux?

For an assignment I have to use a video driver and system timer handler to display the current running time of the Linux system to the corner of the screen.
However, I have not found anywhere that points me into the direction of obtaining the system time from the kernel when my program runs. I am guessing it is in kernel memory at some address and I can just do something like:
hour = get_word(MEM_LOCATION_OF_HOUR);
sec = get_word(MEM_LOCATION_OF_SEC);
ect...
But I cannot find out if this is possible. My guess is that we are not allowed to use library calls like clock() but if that is the only possible way then maybe we are.
Thanks
Can't use library calls? - that's just craziness. Anyway:
getnstimeofday(struct timespec *ts); is one of many methods from here
In kernel, ktime can be used.
a simple example(for calculating time diff) for your reference.
#include <linux/ktime.h>
int fun()
{
ktime_t entry_stamp, now;
s64 delta;
/* Get the current time .*/
entry_stamp = ktime_get();
/* Do your Stuff... */
now = ktime_get();
delta = ktime_to_ns(ktime_sub(now, entry_stamp));
printk(KERN_INFO "Time Taken:%lld ns to execute\n",(long long)delta);
return 0;
}
I have found that the Real-time clock holds the correct values on boot. The CMOS contains all the needed info.
Here is a link to what I found. http://wiki.osdev.org/CMOS#The_Real-Time_Clock

an efficient way to detect when the system's hour changed from xx:59 to xy:00

I have an application on Linux that needs to change some parameters each hour, e.g. at 11:00, 12:00, etc. and the system's date can be changed by the user anytime.
Is there any signal, posix function that would provides me when a hour changes from xx:59 to xx+1:00?
Normally, I use localtime(3) to fetch the current time each seconds then compare if the minute part is equal to 0. however, it does not look a good way to do it, in order to detect a change, I need to call the same function each second for an hour. Especially I run the code on an embedded board that would be good to use less resources.
Here is an example code how I do it:
static char *fetch_time() { // I use this fcn for some other purpose to fetch the time info
char *p;
time_t rawtime;
struct tm * timeinfo;
char buffer[13];
time(&rawtime);
timeinfo = localtime(&rawtime);
strftime (buffer,13,"%04Y%02m%02d%02k%02M",timeinfo);
p = (char *)malloc(sizeof(buffer));
strcpy(p, buffer);
return p;
}
static int hour_change_check(){
char *p;
p = fetch_time();
char current_minute[3] = {'\0'};
current_minute[0] = p[10];
current_minute[1] = p[11];
int current_minute_as_int = atoi(current_minute);
if (current_minute_as_int == 0){
printf("current_min: %d\n",current_minute_as_int);
free(p);
return 1;
}
free(p);
return 0;
}
int main(void){
while(1){
int x = hour_change_check();
printf("x:%d\n",x);
sleep(1);
}
return 0;
}
There is no such signal, but traditionally the method of waiting until some target time is to compute how long it is between "now" and "then", and then call sleep():
now = time(NULL);
when = (some calculation);
if (when > now)
sleep(when - now);
If you need to be very precise about the transition from, e.g., 3:59:59 to 4:00:00, you may want to sleep for a slightly shorter time in case of time adjustments due to leap seconds. (If you are running in a portable device in which time zones can change, you also need to worry about picking up the new location, and if it runs on a half-hour offset, redo all computations. There's even Solar Time in Saudi Arabia....)
Edit: per the suggestion from R.., if clock_nanosleep() is available, calculate a timespec value for the absolute wakeup time and call it with the TIMER_ABSTIME flag. See http://pubs.opengroup.org/onlinepubs/009695399/functions/clock_nanosleep.html for the definition for clock_nanosleep(). However, if time is allowed to step backwards (e.g., localtime with zone shifts), you may still have to do some maintenance checking.
Have you actually measured the overhead used in your solution of polling the time once per second (or even two given some of your other comments)?
The number of instructions that are invoked is minimal AND you do not have any looping. So at worse maybe the cpu uses 100 micro-seconds (0.1 ms, or 0.0001 s) time. This estimate is very dependent on the processor used in your embedded system and its clock speed, but the idea is that maybe the polling logic uses 1/1000 of the total time available.
Also, you could optimize your hour_change_check code to do all of the time calcs and not call another function that issues malloc which has to be immediately freed! Also, if this is an embedded *nix system, can you still run this polling logic in its own thread so that when it issues sleep() it will not interfere or delay other units of work.
Hence, measure the problem and see if it is a significant problem. The polling's performance must be balanced against the requirement that when a user changes the time then the hour change MUST be detected. That is, I think polling every second will catch the hour rollover even if the user changes the time, but is the overhead worth it. Well, how much, exactly, overhead is there?

C GetTickCount (windows function) to Time (nanoseconds)

I'm testing one code provided from my colleague and I need to measure the time of execution of one routine than performs a context switch (of threads).
What's the best choice to measure the time? I know that is available High Resolution Timers like,
QueryPerformanceCounter
QueryPerformanceFrequency
but how can I translate using that timers to miliseconds or nanoseconds?
LARGE_INTEGER lFreq, lStart;
LARGE_INTEGER lEnd;
double d;
QueryPerformanceFrequency(&lFreq);
QueryPerformanceCounter(&lStart);
/* do something ... */
QueryPerformanceCounter(&lEnd);
d = ((doublel)End.QuadPart - (doublel)lStart.QuadPart) / (doublel)lFreq.QuadPart;
d is time interval in seconds.
As the operation than i am executing is in order of 500 nanos, and the timers doens't have precision, what i made was,
i saved actual time with GetTickCount() - (Uses precision of ~ 12milis) and performs the execution of a route N_TIMES (Number of times than routine executed) than remains until i press something on console.
Calculate the time again, and make the difference dividing by N_TIMES, something like that:
int static counter;
void routine()
{
// Operations here..
counter++;
}
int main(){
start <- GetTickCount();
handle <- createThread(....., routine,...);
resumeThread(handle);
getchar();
WaitForSingleObject(handle, INFINITE);
Elapsed = (GetTickCount() - start) * 1000000.0) / counter;
printf("Nanos: %d", elapsed);
}
:)

Resources