How to implement time counting in a new operating system?

How to implement time counting in a new operating system? - c

i have to implement in a operating system, the function sleep().
Which is at the moment, not exisiting in the previous mentioned system.
The problem is, I have to count the elapsed time to wake the sleeping thread up.
How should i relize this? Do i have to count the CPU Ticks or is there another way?
Are CPU Ticks not dependend on the CPU frequency which is different for each CPU?
I have to implement the function in the language C.
the time function doesn't exist either
Thank you in advance!

Typically, such functionality is provided by a hardware timer interrupt, (and its associated driver), that manages a 'tick count' and a delta-queue of 'Thread Control Block' pointers, (pTCB). The pTCP's for sleeping threads are stored in the queue ordered by interval expiry tick count. The timer interrupt increments the tick count and checks it agains the expiry count of the item at the head of the queue.
When a thread requests a sleep, the thread pTCB is taken out of the set of ready threads, the expiry-count calculated and the pTCB inserted into the timer queue. When the pTCB reaches the end of the queue, and it's expiry tick has arrived, it is popped and added back to the set of ready threads so that it may be set running.

It totally depends on your platform/OS. It has to provide you some time-like information, e.g. ticks. Otherwise it is just impossible.
Converting ticks to seconds of course requires additional information. Again, this can be supplied by your platform. Or you have to find it out by other means (manual, configure it yourself, ...).

The easiest and most common way to do that in operating systems is to set up a timer interrupt at a static frequency, then build a timer framework on top of that, then use that timer framework to fire off wakeups for your sleeping threads.
A good paper that discusses various data structures for how to do it efficiently is here. I recommend from my own experience scheme 7. It's quite easy to implement and performs wonderfully.
You can find a fast implementation with a good API here. But I'm biased, because I wrote it.
If you don't want a timer interrupt with a static frequency it becomes much harder to implement a nice timer facility with good performance. I've done a few experiments, but I'd recommend you to start with simple timer interrupt with a static frequency. Once you start doing dynamic timers you need to exactly understand the tradeoffs you are prepared to make.

you can use time() :
time_t t = time();
while(time() < t + sleepDuration);

You may use the CPUs Time Stamp Counter (TSC) to get counter values for time keeping. See Chapter 16.12.1 of "Intel® 64 and IA-32 Architectures Software Developer’s Manual".
The TSC is a low level counter which may provide counter values independent of CPU speed:
"The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C--, and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."
However, for the implementation of sleep() alike functionality you should look into timer
hardware like HPET, ACPI, and alike. See "Intel 64® and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2" and "IA-PC HPET (High Precision Event Timers) Specification" for details.

Related

Timing/Clocks in the Linux Kernel

I am writing a device driver and want to benchmark a few pieces of code to get a feel for where I could be experiencing some bottlenecks. As a result, I want to time a few segments of code.
In userspace, I'm used to using clock_gettime() with CLOCK_MONOTONIC. Looking at the kernel sources (note that I am running kernel 4.4, but will be upgrading eventually), it appears I have a few choices:
getnstimeofday()
getrawmonotonic()
get_monotonic_coarse()
getboottime()
For convenience, I have written a function (see below) to get me the current time. I am currently using getrawmonotonic() because I figured this is what I wanted. My function returns the current time as a ktime_t, so then I can use ktime_sub() to get the elapsed time between two times.
static ktime_t get_time_now(void) {
struct timespec time_now;
getrawmonotonic(&time_now);
return timespec_to_ktime(time_now);
}
Given the available high resolution clocking functions (jiffies won't work for me), what is the best function for my given application? More generally, I'm interested in any/all documentation about these functions and the underlying clocks. Primarily, I am curious if the clocks are affected by any timing adjustments and what their epochs are.

Are you comparing measurements you're making in the kernel directly with measurements you've made in userspace? I'm wondering about your choice to use CLOCK_MONOTONIC_RAW as the timebase in the kernel, since you chose to use CLOCK_MONOTONIC in userspace. If you're looking for an analogous and non-coarse function in the kernel which returns CLOCK_MONOTONIC (and not CLOCK_MONOTONIC_RAW) time, look at ktime_get_ts().
It's possible you could also be using raw kernel ticks to be measuring what you're trying to measure (rather than jiffies, which represent multiple kernel ticks), but I do not know how to do that off the top of my head.
In general if you're trying to find documentation about Linux timekeeping, you can take a look at Documentation/timers/timekeeping.txt. Usually when I try to figure out kernel timekeeping I also unfortunately just spend a lot of time reading through the kernel source in time/ (time/timekeeping.c is where most of the functions you're thinking of using right now live... it's not super well-commented, but you can probably wrap your head around it with a little bit of time). And if you're feeling altruistic after learning, remember that updating documentation is a good way to contribute to the kernel :)
To your question at the end about how clocks are affected by timing adjustments and what epochs are used:
CLOCK_REALTIME always starts at Jan 01, 1970 at midnight (colloquially known as the Unix Epoch) if there are no RTC's present or if it hasn't already been set by an application in userspace (or I guess a kernel module if you want to be weird). Usually the userspace application which sets this is the ntp daemon, ntpd or chrony or similar. Its value represents the number of seconds passed since 1970.
CLOCK_MONTONIC represents the number of seconds passed since the device was booted up, and if the device is suspended at a CLOCK_MONOTONIC value of x, when it's resumed, it resumes with CLOCK_MONOTONIC set to x as well. It's not supported on ancient kernels.
CLOCK_BOOTTIME is like CLOCK_MONOTONIC, but has time added to it across suspend/resume -- so if you suspend at a CLOCK_BOOTTIME value of x, for 5 seconds, you'll come back with a CLOCK_BOOTTIME value of x+5. It's not supported on old kernels (its support came about after CLOCK_MONOTONIC).
Fully-fleshed NTP daemons (not SNTP daemons -- that's a more lightweight and less accuracy-creating protocol) set the system clock, or CLOCK_REALTIME, using settimeofday() for large adjustments ("steps" or "jumps") -- these immediately affect the total value of CLOCK_REALTIME, and using adjtime() for smaller adjustments ("slewing" or "skewing") -- these affect the rate at which CLOCK_REALTIME moves forward per CPU clock cycle. I think for some architectures you can actually tune the CPU clock cycle through some means or other, and the kernel implements adjtime() this way if possible, but don't quote me on that. From both the bulk of the kernel's perspective and userspace's perspective, it doesn't actually matter.
CLOCK_MONOTONIC, CLOCK_BOOTTIME, and all other friends slew at the same rate as CLOCK_REALTIME, which is actually fairly convenient in most situations. They're not affected by steps in CLOCK_REALTIME, only by slews.
CLOCK_MONOTONIC_RAW, CLOCK_BOOTTIME_RAW, and friends do NOT slew at the same rate as CLOCK_REALTIME, CLOCK_MONOTONIC, and CLOCK_BOOTIME. I guess this is useful sometimes.
Linux provides some process/thread-specific clocks to userspace (CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID), which I know nothing about. I do not know if they're easily accessible in the kernel.

Why not put task context in interrupt

Here is the story.
Its a safety critical project and needs to run a time critical functional routine in 20KHz. Now the design is to put functional routine in a 20KHz FIQ interrupt, meanwhile safety interrupt also in FIQ. Thats the only two FIQ in system. (Surely there are couples of IRQ enabled in the MCU)
I know that its not good to put task context in interrupt ISR, the proper way of doing this to set mark and run in OS task. But seems current design harm nobody.
The routine takes about 10us (main clock 300MHz), so basically it will not blocks IRQ/FIQ for unacceptable time. It even save time for extra context switch compare with using OS task to run the functional routine. To me, currently it feels like the design is against every principle written on text book in university but can not find a reason to say no to it.
How could I convince myself to move functional routine from ISR to OS? Should I?

Let's recollect your situation:
you are coding a safety critical system
the software architecture isn't specified otherwise you wouldn't ask the question at hand
the system requirements weren't processed correctly otherwise 2) wouldn't be in question
someone told you to "use minimum interrupt if possible in safety critical system"
you want to use the highest priority & non-interruptible code for "just some math work"
Sorry for being a bit harsh but I wouldn't want to use/be in your safety critical system.
For your actual problem:
you have to make sure two things
the code in the FIQ must be deterministic and WCET tested
the registers of the timer must be protected and supervised. Why? An unwanted/erroneous manipulation of the timers registers by a lower safety level code can congest the CPU so much that effectively nothing else but the interrupt is processed.
All this under the assumption that your safe state depends entirely on an external hardware watchdog.
PS: Which are the hazards for users of your system? Annoyance? Injury? Lethal? Are you in a SIL or ASIL context?

The reason to move complex code away from ISR is precisely to avoid lengthy processing in the ISR and thus timing jitter and delayed interrupt servicing resulting from it.
You are stating the your processing is not lengthy so do it in the ISR! Otherwise you are just adding bloat.

20Khz = 50us between interrupts, with 10us of processing time it gives you roughly 20% of CPU time just for this "task", and a jitter of 10us in any other routine that runs in your CPU, it will also sum 10us of processing time for each 40us that any other task will consum, if it is ok for your project, and you keep your total CPU processing time below 70% (which is the common maximum acceptable for critical systems), IMHO it should work without any issue.

Embedded Programming, Wait for 12.5 us

I'm programming on the C2000 F28069 Experimenters Kit. I'm toggling a GPIO output every 12.5 microseconds 5 times in a row. I decided I don't want to use interrupts (though I will if I absolutely have to). I want to just wait that amount of times in terms of clock cycles.
My clock is running at 80MHz, so 12.5 us should be 1000 clock cycles. When I use a loop:
for(i=0;i<1000;i++)
I get a result that is way too long (not 12.5 us). What other techniques can I use?
Is sleep(n); something that I can use on a microcontroller? If so, which header file do I need to download and where can I find it? Also, now that I think about it, sleep(n); takes an int input, so that wouldn't even work... any other ideas?

Summary: Use the PWM or Timer peripherals to generate output pulses.
First, the clock speed of the CPU has a complex relationship to actual code execution speed, and in many CPUs there is more than one clock rate involved in different stages of the execution. The chip you reference has several internal clock sources, for instance. Further, each individual instruction will likely take a different number of clocks to execute, and some cores can execute part of (or all of) several instructions simultaneously.
To rigorously create a loop that required 12.5 µs to execute without using a timing interrupt or other hardware device would require careful hand coding in assembly language along with careful accounting of the execution time of each instruction.
But you are writing in C, not assembler.
So the first question you have to ask is what machine code was actually generated for your loop. And the second question is did you enable the optimizer, and to what level.
As written, a decent optimizer will determine that the loop for (i=0; i<1000; i++) ; has no visible side effects, and therefore is just a slow way of writing ;, and can be completely removed.
If it does compile the loop, it could be written naively using perhaps as many as 5 instructions, or as few as one or two. I am not personally familiar with this particular TI CPU architecture, so I won't attempt to guess at the best possible implementation.
All that said, learning about the CPU architecture and its efficiency is important to building reliable and efficient embedded systems. But given that the chip has peripheral devices built-in that provide hardware support for PWM (pulse width modulated) outputs as well as general purpose hardware timer/counters you would be far better off learning to use the hardware to generate the waveform for you.
I would start by collecting every document available on the CPU core and its peripherals, especially app notes and sample code.
The C compiler will have an option to emit and preserve an assembly language source file. I would use that as a guide to study the structure of the code generated for critical loops and other bottlenecks, as well as the effects of the compiler's various optimization levels.
The tool suite should have a mechanism for profiling your running code. Before embarking on heroic measures in pursuit of optimizations, use that first to identify the actual bottlenecks. Even if it lacks decent profiling, you are likely to have spare GPIO pins that can be toggled around critical sections of code and measured with a logic analyzer or oscilloscope.

The chip you refer has PWM (pulse width modulation) hardware declared as one of major winning features. You should rely on this. Please refer to appropriate application guide. Generally you cannot guarantee 12.5uS periods from application layer (and should not try to do so). Even if you managed to do so directly from application layer it's bad idea. Any change in your firmware code can break this.

If you use a timer peripheral with PWM output capability as suggested by #RBerteig already, then you can generate an accurate timing signal with zero software overhead. If you need to do other work synchronously with the clock, then you can use the timer interrupt to trigger that too. However if you process interrupts at an interval of 12.5us you may find that your processor spends a great deal of time context switching rather than performing useful work.
If you simply want an accurate delay, then you should still use a hardware timer and poll its reload flag rather than process its interrupt. This allows consistent timing independent of the compiler's code generation or processor speed and allows you to add other code within the loop without extending the total loop time. You would poll it in a loop during which you might do other work as well. The timing jitter and determinism will depend on what other work you do in the loop, but for an empty loop, reaction to the timer even will probably be faster than the latency on an interrupt handler.

How to implement C timer in Vxworks

I have implemented timer functionality to find the performance of my task in windows and linux. But linux implementation is not working in Vxworks PPC 750 board. gettimeofday is not available in Vxworks.
t1 = vxworks_start_timer(); //How to implement ?
my_task();
t2 = vxworks_stop_timer(); //How to implement ?
elapsedtime = t2-t1;
How to implement this timer in Vxworks to calculate elapsed time of a task.

There are various approaches to this, dependant on your needs.
If the activity to be measured is long running, you might prefer to use the system tick counter, accessible via tickGet() or tickGet64().
This increments at the system clock rate frequency (i.e. the rate of the scheduler, not the CPU freq), and so the resolution is limited to a single tick - which might be as large as 1/60th of a second. You can use sysClkRateGet() to determine the frequency.
For long running tasks, the above is probably sufficient, however if you require higher resolution, possibly at the expense of limited duration, you can use the system timestamp counter, if it is configured. For this, you can use sysTimestamp() (or sysTimestamp64()), and also use sysTimestampFreq() to get the frequency.
Dependant on your system configuration, the counter may reset frequently, and you can use sysTimestampPeriod() to workout when this will occur - you will need to handle this in your timing code.
You can, of course, use both methods together to provide both a long running, yet high resolution timer

If the system timer tick resolution is sufficient, you could use tickGet() and sysClkRateGet()
or clock_gettime(), but resolution is still limited to system clock tick
Otherwise, you could read TBL and TBU (arch-specific)

CUDA: Difference between CPU timer and CUDA timer event?

What is the difference between using a CPU timer and the CUDA timer event to measure the time taken for the execution of some CUDA code?
Which of these should a CUDA programmer use？
And why?
what I know:
CPU timer usage would involve calling cudaThreadSynchronize before any time is noted.
For noting the time, one of these could be used:
clock()
high-resolution performance counter like QueryPerformanceCounter (on Windows)
CUDA timer event would involve recording before and after by using cudaEventRecord. At a later time, the elapsed time would be obtained by calling cudaEventSynchronize on the events, followed by cudaEventElapsedTime to obtain the elapsed time.

The answer to the first part of question is that cudaEvents timers are based off high resolution counters on board the GPU, and they have lower latency and better resolution than using a host timer because they come "off the metal". You should expect sub-microsecond resolution from the cudaEvents timers. You should prefer them for timing GPU operations for precisely that reason. The per-stream nature of cudaEvents can also be useful for instrumenting asynchronous operations like simultaneous kernel execution and overlapped copy and kernel execution. Doing that sort of time measurement is just about impossible using host timers.
EDIT: I won't answer the last paragraph because you deleted it.

The main advantage of using CUDA events for timing is that they're less subject to perturbations due to other system events, like paging or interrupts from the disk or network controller. Also, because the cu(da)EventRecord is asynchronous, there is less of a Heisenberg effect when timing short, GPU-intensive operations.
Another advantage of CUDA events is that they have a clean cross-platform API - no need to wrap gettimeofday() or QueryPerformanceCounter().
One final note: use caution when using streamed CUDA events for timing - if you do not specify the NULL stream, you may wind up timing operations that you did not intend to. There is a good analogy between CUDA events and reading the CPU's timestamp counter, which is a serializing instruction. On modern superscalar processors, the serializing semantics make the timing unambiguous. Also like RDTSC, you should always bracket the events you want to time with enough work that the timing is meaningful (just like you can't use RDTSC to meaningfully time a single machine instruction).