Does anybody have an idea of how the time() function works?
I was looking online for implementations out of pure curiosity, but could only find the NetBSD implementation of difftime().
Also, is there anything that describes the process of calculating the time (non-system-specific or system-specific)?
Note: I am not looking for answers on how to use time(), but for how it actually works behind the scenes when I call it.
Somewhere deep down in your computer, typically in hardware, there's a clock oscillator running at some frequency f. For the purposes of this example let's say that it's operating at 1 kHz, or 1,000 cycles per second. Things are set up so that every cycle of the oscillator triggers a CPU interrupt.
There's also a low-level counter c. Every time the clock interrupt is triggered, the OS increments the counter. For the moment we'll imagine it increments it by 1, although this won't usually be the case in practice.
The OS also checks the value of the counter as it's incremented. When c equals 1,000, this means that exactly one second has gone by. At this point the OS does two things:
It increments another counter variable, the one that's keeping track of the actual time of day in seconds. We'll call this other counter t. (It's going to be a big number, so it'll be at least a 32-bit variable, or these days, 64 bits if possible.)
It resets c to 0.
Finally, when you call time(), the kernel simply returns you the current value of t. It's pretty simple, really!
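To make that concrete, here is a deliberately simplified sketch in C of the scheme just described; the names timer_interrupt_handler and time_sketch are illustrative, not any real kernel's symbols, and a real kernel would also guard these counters against concurrent access:

#include <time.h>   /* for time_t */

/* Simplified sketch: the oscillator fires an interrupt 1,000 times per
   second, the handler counts the ticks, and time() just hands back t. */
static volatile long   c = 0;   /* sub-second tick counter */
static volatile time_t t = 0;   /* seconds since the epoch */

void timer_interrupt_handler(void)
{
    c++;
    if (c == 1000) {            /* 1,000 ticks at 1 kHz == one second */
        t++;
        c = 0;
    }
}

time_t time_sketch(time_t *out)
{
    if (out != NULL)
        *out = t;
    return t;
}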
Well, actually, it's somewhat more complicated than that. I've overlooked the details of how the value of the counter t gets set up initially, and how the OS makes sure that the oscillator is running at the right frequency, and a few other things.
When the OS boots, and if it's on a PC or workstation or mainframe or other "big" computer, it's typically got a battery-backed real-time clock it can use to set the initial value of t from. (If the CPU we're talking about is an embedded microcontroller, on the other hand, it may not have any kind of clock, and all of this is moot, and time() is not implemented at all.)
Also, when you (as root) call settimeofday, you're basically just supplying a value to jam into the kernel's t counter.
Also, of course, on a networked system, something like NTP is busy keeping the system's time up-to-date.
NTP can do that in two ways:
If it notices that t is way off, it can just set it to a new value, more or less as settimeofday() does.
If it notices that t is just a little bit off, or if it notices that the underlying oscillator isn't counting at quite the right frequency, it can try to adjust that frequency.
Adjusting the frequency sounds straightforward enough, but the details can get pretty complicated. You can imagine that the frequency f of the underlying oscillator is adjusted slightly. Or, you can imagine that f is left the same, but when the time interrupt fires, the numeric increment that's added to c is adjusted slightly.
In particular, it won't usually be the case that the kernel adds 1 to c on each timer interrupt, and that when c reaches 1,000, that's the indication that one second has gone by. It's more likely that the kernel will add a number like 1,000,000 to c on each timer interrupt, meaning that it will wait until c has reached 1,000,000,000 before deciding that one second has gone by. That way, the kernel can make more fine-grained adjustments to the clock rate: if things are running just a little slow, it can change its mind, and add 1,000,001 to c on each timer interrupt, and this will make things run just a tiny bit faster. (Something like one part per million, as you can pretty easily see.)
One more thing I overlooked is that time() isn't the only way of asking what the system time is. You can also make calls like gettimeofday(), which gives you a sub-second time stamp represented as seconds+microseconds (struct timeval), or clock_gettime(), which gives you a sub-second time stamp represented as seconds+nanoseconds (struct timespec). How are those implemented? Well, instead of just reading out the value of t, the kernel can also peek at c to see how far into the next second it is. In particular, if c is counting up to 1,000,000,000, then the kernel can give you microseconds by dividing c by 1,000, and it can give you nanoseconds by returning c directly.
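As another hedged illustration (again, not any real kernel's code), deriving a seconds-plus-nanoseconds timestamp from the two counters could look like the sketch below, assuming c counts up toward 1,000,000,000 within the current second:

#include <time.h>

/* Build a struct timespec from the two counters described above. */
void clock_gettime_sketch(long long t, long c, struct timespec *ts)
{
    ts->tv_sec  = (time_t)t;   /* whole seconds since the epoch        */
    ts->tv_nsec = c;           /* nanoseconds into the current second  */
    /* gettimeofday()-style microseconds would simply be c / 1000.     */
}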
Two footnotes:
(1) If we've adjusted the frequency, and we're adding 1,000,001 to c on each low-level timer tick, c will usually not hit 1,000,000,000 exactly, so the test when deciding whether to increment t will have to involve a greater-than-or-equal-to condition, and we'll have to subtract 1,000,000,000 from c, not just clear it. In other words, the code will look something like
if (c >= 1000000000) {
    t++;
    c -= 1000000000;
}
(2) Since time() and gettimeofday() are two of the simplest system calls around, and since programs calling them may (by definition) be particularly sensitive to any latency due to system call overhead, these are the calls that are most likely to be implemented based on the vDSO mechanism, if it's in use.
The C specification does not say anything about how library functions work; it only states the observable behavior. The internal workings are both compiler- and platform-dependent.
Synopsis
#include <time.h>
time_t time(time_t *timer);
Description
The time function determines the current calendar time. The encoding of the value is unspecified.
Returns
The time function returns the implementation's best approximation to the current calendar time. The value (time_t)(-1) is returned if the calendar time is not available. If timer is not a null pointer, the return value is also assigned to the object it points to.
https://port70.net/~nsz/c/c11/n1570.html
Here is one implementation:
time_t
time (timer)
     time_t *timer;
{
  __set_errno (ENOSYS);

  if (timer != NULL)
    *timer = (time_t) -1;
  return (time_t) -1;
}
https://github.com/lattera/glibc/blob/master/time/time.c
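That particular glibc file appears to be the generic stub for targets with no time facility at all: it just fails with ENOSYS, and real targets override it elsewhere in the tree. On a hosted POSIX-ish system, a non-stub time() usually boils down to asking the kernel for the realtime clock and keeping the seconds. A hedged sketch of that idea (my_time is a made-up name to avoid clashing with the libc symbol):

#include <stddef.h>
#include <time.h>

time_t my_time(time_t *timer)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return (time_t)-1;          /* calendar time not available */
    if (timer != NULL)
        *timer = ts.tv_sec;
    return ts.tv_sec;               /* whole seconds since the epoch */
}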
Related
I have a Cortex-M0+ chip (an STM32) and I want to calculate the load (or free) time. The M0+ doesn't have the DWT cycle counter (DWT->CYCCNT), so using that isn't an option.
Here's my idea:
Using a scheduler I have, I take a counter and increment it by 1 in my idle loop.
uint32_t counter = 0;

while (1) {
    sched_run();
}

void sched_run(void)
{
    if (Jobs_timer_ready(jobs)) {
        // do timed jobs
    } else {
        sched_idle();
    }
}

void sched_idle(void)
{
    counter += 1;
}
I have jobs running on a 50 µs timer, so I can collect the count every 100 ms accurately. With a 64 MHz chip, that would give me 64,000,000 instructions/sec, or 64 instructions/µs.
If I take the number of instructions the counter uses and remove that from the total instructions per 100 ms, I should have a concept of my load time (or free time). I'm slow at math, but that should be 6,400,000 instructions per 100 ms. I haven't actually looked at the instructions that would take, but let's be generous and say it takes 7 instructions to increment the counter, just to illustrate the process.
So, let's say the counter variable has ended up with 12,475 after 100ms. Our formula should be [CPU Free %] = Free Time/Max Time = COUNT*COUNT_INSTRUC/MAX_INSTRUC.
This comes out to 12,475 * 7/6,400,000 = 87,325/6,400,000 = 0.013644 (x 100) = 1.36% free (and this is where my math looks very wrong).
My goal is to have a mostly-accurate load percentage that can be calculated in the field. Especially if I hand it off to someone else, or need to check how it's performing. I can't always reproduce field conditions on a bench.
My basic questions are this:
How do I determine load or free?
Can I calculate load/free like a task manager (overall)?
Do I need a scheduler for it or just a timer?
I would recommend setting up a timer to count in 1 µs steps (or whatever resolution you need). Then just read the counter value before and after the work to get the duration.
Given your simplified program, it looks like you just have a while loop and a flag which tells you when some work needs to be done. So you could do something like this:
uint32_t busy_time = 0;
uint32_t idle_time = 0;
uint32_t idle_start = 0;

// Initialize the idle start timer once, before entering the loop.
idle_start = TIM2->CNT;

while (1) {
    sched_run();
}

void sched_run(void)
{
    if (Jobs_timer_ready(jobs)) {
        // When a job starts, add the duration of the idle period that just ended.
        idle_time += TIM2->CNT - idle_start;

        // Measure the work duration.
        uint32_t job_start = TIM2->CNT;
        // do timed jobs
        busy_time += TIM2->CNT - job_start;

        // Restart the idle period.
        idle_start = TIM2->CNT;
    }
}
The load percentage would be (busy_time / (busy_time + idle_time)) * 100.
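For completeness, a small sketch of that percentage calculation in integer math (cpu_load_percent is a made-up helper; multiplying before dividing keeps the integer division from truncating to 0):

#include <stdint.h>

uint32_t cpu_load_percent(uint32_t busy, uint32_t idle)
{
    uint32_t total = busy + idle;
    if (total == 0)
        return 0;                    /* nothing measured yet */
    return (busy * 100u) / total;    /* note: busy * 100 must fit in 32 bits */
}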
Counting cycles isn't as easy as it seems. Reading a variable from RAM, modifying it, and writing it back has a non-deterministic duration. A RAM read is typically 2 cycles, but can be 3, depending on many things, including how congested the AXIM bus is (other MCU peripherals are also attached to it). Writing is a whole other story: there are bufferable writes, non-bufferable writes, and so on. There is also caching, which changes things depending on where the executed code is, where the data it's modifying is, and the cache policies for data and instructions. And there is the issue of exactly what your compiler generates. So this problem should be approached from a different angle.
I agree with @Armandas that the best solution is a hardware timer. You don't even have to set it up to a microsecond or anything (but you totally can). You can choose when to reset the counter. Even if it runs at the CPU clock or close to it, a 32-bit overflow will take a very long time (but it still must be handled; I would reset the timer counter when making the idle/busy calculation, which seems like a reasonable moment to do it; if your program can actually overflow the timer at runtime, you of course need a modified solution to account for that). Obviously, if your timer has a 16-bit prescaler and counter, you will have to adjust for that. A microsecond tick seems like a reasonable compromise for your application after all.
An alternative to consider: DTCM memory (small, tightly coupled RAM) has strictly single-cycle read/write access and is by definition neither cacheable nor bufferable. So with tightly coupled memory and tight control over exactly which instructions the compiler generates and the CPU executes, you can do something more deterministic with a variable counter. However, if that code is ported to an M7, there may be timing-related issues because of the M7's dual-issue pipeline (very simplified: it can execute 2 instructions in parallel at a time; see the Architecture Reference Manual for more). Just bear this in mind; it becomes a little more architecture-dependent, which may or may not be an issue for you.
At the end of the day, I vote for sticking with the hardware timer. Making this work with a variable is a huge headache, and you really need to get down to the architecture level to make it work properly, and even then there could always be something you forgot or didn't think about. It seems like a massive overcomplication for the task at hand. The hardware timer is the boss.
I work for a company that produces automatic machines, and I help maintain the software that controls them. The software runs on a real-time operating system and consists of multiple threads running concurrently. The code bases are legacy and carry substantial technical debt. Among all the issues they exhibit, one stands out as rather bizarre to me: most of the timing algorithms that compute elapsed time to realize common timed features such as timeouts, delays, recording time spent in a particular state, etc., basically take the following form:
unsigned int shouldContinue = 1;
unsigned int blockDuration = 1; // Let's say 1 millisecond.
unsigned int loopCount = 0;
unsigned int elapsedTime = 0;

while (shouldContinue)
{
    .
    .   // a bunch of statements, selections and function calls
    .
    blockingSystemCall(blockDuration);
    .
    .   // a bunch of statements, selections and function calls
    .
    loopCount++;
    elapsedTime = loopCount * blockDuration;
}
The blockingSystemCall function can be any operating system's API that suspends the current thread for the specified blockDuration. The elapsedTime variable is subsequently computed by basically multiplying loopCount by blockDuration or by any equivalent algorithm.
To me, this kind of timing algorithm is wrong, and is not acceptable under most circumstances. All the instructions in the loop, including the condition of the loop, are executed sequentially, and each instruction requires measurable CPU time to execute. Therefore, the actual time elapsed is strictly greater than the value of elapsedTime in any given instance after the loop starts. Consequently, suppose the CPU time required to execute all the statements in the loop, denoted by d, is constant. Then, elapsedTime lags behind the actual time elapsed by loopCount • d for any loopCount > 0; that is, the deviation grows according to an arithmetic progression. This sets the lower bound of the deviation because, in reality, there will be additional delays caused by thread scheduling and time slicing, depending on other factors.
In fact, not too long ago, while testing a new data-driven predictive maintenance feature which relies on the operation time of a machine, we discovered that the operation time reported by the software lagged behind that of a standard reference clock by a whopping three hours after the machine was in continuous operation for just over two days. It was through this test that I discovered the algorithm outlined above, which I swiftly determined to be the root cause.
Coming from a background where I used to implement timing algorithms on bare-metal systems using timer interrupts, which allows the CPU to carry on with the execution of the business logic while the timer process runs in parallel, it was shocking for me to have discovered that the algorithm outlined in the introduction is used in the industry to compute elapsed time, even more so when a typical operating system already encapsulates the timer functions in the form of various easy-to-use public APIs, liberating the programmer from the hassle of configuring a timer via hardware registers, raising events via interrupt service routines, etc.
The kind of timing algorithm as illustrated in the skeleton code above is found in at least two code bases independently developed by two distinct software engineering teams from two subsidiary companies located in two different cities, albeit within the same state. This makes me wonder whether it is how things are normally done in the industry or it is just an isolated case and is not widespread.
So, the question is: is the algorithm shown above common or acceptable for calculating elapsed time, given that the underlying operating system already provides highly optimized time-management system calls that can be used right out of the box to accurately measure elapsed time, or even used as basic building blocks for creating higher-level timing facilities that provide more intuitive methods, similar to, e.g., the Timer class in C#?
You're right that calculating elapsed time that way is inaccurate: it assumes that the blocking call takes exactly the amount of time indicated and that everything that happens outside of the blocking system call takes no time at all, which would only be true on an infinitely fast machine. Since actual machines are not infinitely fast, the elapsed time calculated this way will always be somewhat less than the actual elapsed time.
As to whether that's acceptable, it's going to depend on how much timing accuracy your program needs. If it's just doing a rough estimate to make sure a function doesn't run for "too long", this might be okay. OTOH if it is trying for accuracy (and in particular accuracy over a long period of time), then this approach won't provide that.
FWIW the more common (and more accurate) way to measure elapsed time would be something like this:
const unsigned int startTime = current_clock_time();
while (shouldContinue)
{
    loopCount++;
    elapsedTime = current_clock_time() - startTime;
}
This has the advantage of not "drifting away" from the accurate value over time, but it does assume that you have a current_clock_time() type of function available, and that it's acceptable to call it within the loop. (If current_clock_time() is very expensive, or doesn't provide some real-time performance guarantees that the calling routine requires, that might be a reason not to do it this way)
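For illustration, on a POSIX system a current_clock_time() of this kind could be backed by the monotonic clock; the sketch below returns milliseconds, and the function name is just the placeholder used above, not a real API:

#include <time.h>

unsigned long long current_clock_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* monotonic: unaffected by clock jumps */
    return (unsigned long long)ts.tv_sec * 1000ULL
         + (unsigned long long)ts.tv_nsec / 1000000ULL;
}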
I don't think these loops do what you think they do.
In a RTOS, the purpose of a loop like this is usually to perform a task at regular intervals.
blockingSystemCall(N) probably does not just sleep for N milliseconds like you think it does. It probably sleeps until N milliseconds after the last time your thread woke up.
More accurately, all the sleeps your thread has performed since starting are added to the thread start time to get the time at which the OS will try to wake the thread up. If your thread woke up due to an I/O event, then the last one of those times could be used instead of the thread start time. The point is that the inaccuracies in all these start times are corrected, so your thread wakes up at regular intervals and the elapsed time measurement is perfectly accurate according to the RTOS master clock.
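As a hedged illustration of that pattern outside any particular RTOS, POSIX's clock_nanosleep() with an absolute deadline behaves the same way: the deadline advances by a fixed period each iteration, so the time spent doing work does not accumulate as drift.

#include <time.h>

void periodic_loop(void)
{
    const long period_ns = 1000000L;            /* 1 ms period */
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        /* ... do the periodic work here ... */

        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {      /* normalize the timespec */
            next.tv_nsec -= 1000000000L;
            next.tv_sec  += 1;
        }
        /* Sleep until the absolute deadline, not for a relative duration. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}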
There could also be very good reasons for measuring elapsed time by the RTOS master clock instead of a more accurate wall clock time, in addition to simplicity. This is because all of the guarantees that an RTOS provides (which is the reason you are using a RTOS in the first place) are provided in that time scale. The amount of time taken by one task can affect the amount of time you are guaranteed to have available for other tasks, as measured by this clock.
It may or may not be a problem that your RTOS master clock runs slow by 3 hours every 2 days...
I want to implement a delay function using null loops, but the amount of time needed to complete a loop once is compiler- and machine-dependent. I want my program to determine the time on its own and delay for the specified amount of time. Can anyone give me an idea of how to do this?
N.B. There is a function named delay() which suspends the system for the specified number of milliseconds. Is it possible to suspend the system without using this function?
First of all, you should never sit in a loop doing nothing. Not only does it waste energy (it keeps your CPU 100% busy counting your loop counter); in a multitasking system it also decreases overall system performance, because your process keeps getting time slices as it appears to be doing something.
Next point is ... I don't know of any delay() function. This is not standard C. In fact, until C11, there was no standard at all for things like this.
POSIX to the rescue: there are usleep(3) (deprecated) and nanosleep(2). If you're on a POSIX-compliant system, you'll be fine with those. They block (meaning the scheduler of your OS knows they have nothing to do and wakes them only after the end of the call), so you don't waste CPU power.
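For example, a small millisecond-sleep helper on top of nanosleep(2) could look like the sketch below (sleep_ms is a made-up name); it resumes the sleep if a signal interrupts it:

#include <errno.h>
#include <time.h>

void sleep_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;   /* req now holds the remaining time; keep sleeping */
}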
If you're on Windows, for a direct delay in code you only have Sleep(). Note that this function takes milliseconds, but normally has a precision of only around 15 ms. That's often good enough, but not always. If you need better precision on Windows, you can request more timer interrupts using timeBeginPeriod(): timeBeginPeriod(1); will request a timer interrupt every millisecond. Don't forget to call timeEndPeriod() with the same value as soon as you don't need the precision any more, because more timer interrupts come with a cost: they keep the system busy, wasting more energy.
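A minimal sketch of that Windows pattern (precise_sleep_ms is a made-up helper; link against winmm.lib for timeBeginPeriod/timeEndPeriod):

#include <windows.h>

void precise_sleep_ms(DWORD ms)
{
    timeBeginPeriod(1);   /* request 1 ms timer granularity              */
    Sleep(ms);            /* now typically accurate to ~1 ms, not ~15 ms */
    timeEndPeriod(1);     /* always pair with the same value             */
}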
I had a somewhat similar problem while developing a little game recently; I needed constant ticks in 10 ms intervals, and this is what I came up with for POSIX-compliant systems and for Windows. The ticker_wait() function in that code just suspends until the next tick, so maybe this is helpful if your original intent was some timing issue.
Unless you're on a real-time operating system, anything you program yourself directly is not going to be accurate. You need to use a system function to sleep for some amount of time like usleep in Linux or Sleep in Windows.
Because the operating system could interrupt the process sooner or later than the exact time expected, you should get the system time before and after you sleep to determine how long you actually slept for.
Edit:
On Linux, you can get the current system time with gettimeofday, which has microsecond resolution (whether the actual clock is that accurate is a different story). On Windows, you can do something similar with GetSystemTimeAsFileTime:
// Requires <windows.h> for FILETIME and GetSystemTimeAsFileTime().
int gettimeofday(struct timeval *tv, struct timezone *tz)
{
    // Microseconds between the Windows epoch (1601-01-01) and the Unix epoch (1970-01-01).
    const unsigned __int64 epoch_diff = 11644473600000000;
    unsigned __int64 tmp;
    FILETIME t;

    if (tv) {
        GetSystemTimeAsFileTime(&t);
        tmp = 0;
        tmp |= t.dwHighDateTime;
        tmp <<= 32;
        tmp |= t.dwLowDateTime;
        tmp /= 10;          // FILETIME counts 100 ns units; convert to microseconds.
        tmp -= epoch_diff;
        tv->tv_sec = (long)(tmp / 1000000);
        tv->tv_usec = (long)(tmp % 1000000);
    }
    return 0;
}
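Usage is then much the same on either platform: sample the clock on both sides of the sleep and take the difference. A short sketch (measure_sleep_us is a made-up helper):

#include <sys/time.h>   /* on Windows, use the gettimeofday() shim above instead */

long measure_sleep_us(void)
{
    struct timeval before, after;
    gettimeofday(&before, NULL);
    /* ... sleep, e.g. with usleep()/Sleep(), or do the work being timed ... */
    gettimeofday(&after, NULL);
    return (after.tv_sec - before.tv_sec) * 1000000L
         + (after.tv_usec - before.tv_usec);
}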
You could do something like read the exact time at one point, then sit in a while loop that rechecks the time until it reaches whatever time you want, and then break out and continue executing the rest of your program. I'm not sure I see much of a benefit in looping rather than just using the delay function, though.
Can someone please tell me how this function works? I'm using it in code and have an idea of how it works, but I'm not 100% sure exactly. I understand the concept of an input variable N counting down, but how the heck does it work? Also, if I am using it repeatedly in my main() for different delays (different inputs for N), do I have to "zero" the function if I used it somewhere else?
Reference: MILLISEC is a constant defined by Fcy/10000, or system clock/10000.
Thanks in advance.
// DelayNmSec() gives a 1 ms to 65.5 seconds delay
/* Note that FCY is used in the computation. Please make the necessary
   changes (PLLx4 or PLLx8 etc.) to compute the right FCY as in the define
   statement above. */
void DelayNmSec(unsigned int N)
{
    unsigned int j;
    while (N--)
        for (j = 0; j < MILLISEC; j++);
}
This is referred to as busy waiting, a concept that just burns some CPU cycles, thus "waiting" by keeping the CPU "busy" doing empty loops. You don't need to reset the function; it will do the same thing if called repeatedly.
If you call it with N=3, it will repeat the while loop 3 times, every time counting with j from 0 to MILLISEC, which is supposedly a constant that depends on the CPU clock.
The original author of the code has timed it and looked at the generated assembly to get the exact number of instructions executed per millisecond, and has configured the constant MILLISEC so that the for loop busy-waits for that long.
The input parameter N is then simply the number of milliseconds the caller wants to wait, i.e., the number of times the for loop is executed.
The code will break if
it is used on a different or faster microcontroller (depending on how Fcy is maintained), or
the optimization level of the C compiler is changed, or
the C compiler version is changed (as it may generate different code),
so, if the person who wrote it was clever, there may be a calibration program which defines and configures the MILLISEC constant.
This is what is known as a busy wait in which the time taken for a particular computation is used as a counter to cause a delay.
This approach does have problems in that the computation needs to be adjusted for processors with different speeds. Old games used this approach; I remember a simulation using this kind of busy wait, targeting an old 8086-class processor, to make an animation move smoothly. When the game was run on a Pentium PC, instead of the rocket majestically rising up the screen over several seconds, the entire animation flashed before your eyes so fast that it was difficult to tell what it was.
This sort of busy wait means that the running thread sits in a computation loop counting down the specified number of milliseconds. The result is that the thread does nothing else while counting down.
If the operating system is not a preemptive multi-tasking OS, then nothing else will run until the count down completes which may cause problems in other threads and tasks.
If the operating system is preemptive multi-tasking the resulting delays will have a variability as control is switched to some other thread for some period of time before switching back.
This approach is normally used for small pieces of software on dedicated processors where a computation has a known amount of time and where having the processor dedicated to the countdown does not impact other parts of the software. An example might be a small sensor that performs a reading to collect a data sample then does this kind of busy loop before doing the next read to collect the next data sample.
I know QueryPerformanceCounter() can be used for timing functions. I want to know:
1 - Can I increase the resolution of the timer by overclocking the CPU (so it ticks faster)?
2 - Basically, what makes some timers more precise than others (e.g., why is QueryPerformanceCounter() more precise than GetTickCount())? If there is a single crystal oscillator on the motherboard, why are some timers slower than others?
QueryPerformanceCounter has very high resolution - normally less than one nanosecond. I don't see why you'd like to increase it. Overclocking will increase it, but it seems like a very weak reason for overclocking.
QueryPerformanceCounter is very accurate, but somewhat expensive and not very convenient.
a. It's expensive because it uses the expensive rdtsc instruction. Faster timers can just read an integer from memory. This integer needs to be updated, and we don't want to do it too often (1000 times a second is reasonable), so we get a very cheap timer, with low precision. That's basically GetTickCount.
b. It's inconvenient because it uses units which change between computers. Sometimes it will be nanoseconds, sometimes half-nano, or other values. It makes it harder to calculate with.
c. Another source of inconvenience is that it returns very large numbers, which may overflow when you try to do math with them, so you need to be careful.
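For what it's worth, a minimal sketch of using QPC while sidestepping those two inconveniences: keep the raw 64-bit counts, subtract them, and only then divide by the reported frequency to get seconds.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);   /* ticks per second on this machine */

    QueryPerformanceCounter(&start);
    Sleep(10);                          /* the interval being timed */
    QueryPerformanceCounter(&end);

    double seconds = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    printf("elapsed: %.6f s\n", seconds);
    return 0;
}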
The timing source for QPC is machine dependent. It is typically picked up from a frequency available somewhere in the chipset. Whether overclocking the cpu is going to affect it is highly dependent on your motherboard design. The simplest way is to just try it, use QueryPerformanceFrequency to see the effect.
GetTickCount is driven from an entirely different timer source: the signal that also generates the clock interrupt. It is not very precise, normally 1/64 of a second, but it is highly accurate. Your machine contacts a time server from time to time to recalibrate the clock and adjust the clock correction factor, which makes it accurate to about a second over an entire year. QPC is very precise, but not nearly as accurate. Use it only to time short intervals.
1 - Yes. Internally, one of the better timers is rdtsc, which gives you the CPU's cycle count. Combining this with information from the cpuid instruction gives you the time.
2 - The other timers rely upon various timing sources, such as the 8253 timer.
QPF/QPC is essentially a wrapper Microsoft added on top of what rdtsc provides. Read this article for more info:
http://www.strchr.com/performance_measurements_with_rdtsc