ARM embedded delay: hardware timer vs. CPU cycle counter

I'm working on an embedded project running on an ARM Cortex-M3 based microcontroller. Some code provided by our vendor uses a delay function that sets up a built-in hardware timer and then spins until the timer expires. Typically this is used to wait between 1 and a couple hundred microseconds. These delays are almost always there because the code is waiting on some register, chip or bus to complete an action and needs to wait at least the given number of microseconds. The hardware timer also appears to cost at least 6 microseconds of overhead to set up.
In a multithreaded environment this is a problem because there are N threads but only 1 hardware timer. I could disable interrupts while the timer is in use to prevent context switches and thus race conditions, but that seems a bit ugly. I am thinking of replacing the function that uses the hardware timer with one that uses the ARM CPU cycle counter (CCNT). Are there any pitfalls I am missing, or other alternatives? Obviously the cycle counter function requires it to be tuned to the proper CPU frequency, which will never change for our system, but I suppose it could be detected at boot programmatically using the hardware timer.

Set up the timer once at startup and let the counter run continuously. When you want to start a delay, read the counter value and remember this start value. Then in the delay loop read the counter again and loop until the current value minus the start value is greater than or equal to the requested delay in ticks. (If you do the subtraction in unsigned arithmetic, rollovers wash out and you don't need special handling for them.)
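For illustration, here is a minimal sketch of that wrap-safe spin delay in C. read_free_running_counter() is a placeholder for whatever counter you let run continuously (a hardware timer, or on a Cortex-M3 the DWT cycle counter), and TICKS_PER_US is an assumed value that must match that counter's clock:
#include <stdint.h>

extern uint32_t read_free_running_counter(void);   /* placeholder for your counter read */
#define TICKS_PER_US 72u                            /* example: 72 MHz counter clock */

void delay_us(uint32_t us)
{
    uint32_t start = read_free_running_counter();
    uint32_t ticks = us * TICKS_PER_US;

    /* Unsigned subtraction wraps modulo 2^32, so the comparison stays
       correct across a single counter rollover with no special handling. */
    while ((uint32_t)(read_free_running_counter() - start) < ticks)
        ;
}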

You could multiplex your timer so that you have a table of when each thread wants to fire, along with a function pointer / vector to execute. When the timer interrupt occurs, fire off that thread's handler and then set the timer to the next entry in the list, minus the elapsed time. This is what many *nix operating systems do in their kernel code, so there should be example code to pull from.
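A minimal sketch of that multiplexing idea, assuming a hypothetical one-shot timer HAL call (hw_timer_start_oneshot()), a software microsecond clock, and a small fixed table of pending expirations; a real implementation also needs to protect the table against concurrent access from the ISR:
#include <stdbool.h>
#include <stdint.h>

#define MAX_WAITERS 8

struct waiter {
    bool     active;
    uint32_t deadline_us;                     /* absolute, against the software clock */
    void   (*callback)(void *arg);            /* e.g. wakes the waiting thread */
    void    *arg;
};

static struct waiter table[MAX_WAITERS];
static uint32_t now_us;                       /* advanced each time the timer fires */

extern void hw_timer_start_oneshot(uint32_t us);   /* hypothetical HAL call */

static void arm_for_earliest(void)
{
    uint32_t soonest = UINT32_MAX;
    for (int i = 0; i < MAX_WAITERS; i++)
        if (table[i].active && table[i].deadline_us - now_us < soonest)
            soonest = table[i].deadline_us - now_us;
    if (soonest != UINT32_MAX)
        hw_timer_start_oneshot(soonest);
}

void timer_isr(uint32_t elapsed_us)           /* runs when the one-shot expires */
{
    now_us += elapsed_us;
    for (int i = 0; i < MAX_WAITERS; i++) {
        if (table[i].active && (int32_t)(table[i].deadline_us - now_us) <= 0) {
            table[i].active = false;
            table[i].callback(table[i].arg);
        }
    }
    arm_for_earliest();                       /* re-arm for the next pending entry */
}

int timer_request(uint32_t delay_us, void (*cb)(void *), void *arg)
{
    for (int i = 0; i < MAX_WAITERS; i++) {
        if (!table[i].active) {
            table[i].deadline_us = now_us + delay_us;
            table[i].callback   = cb;
            table[i].arg        = arg;
            table[i].active     = true;
            arm_for_earliest();               /* lock out the ISR here in real code */
            return 0;
        }
    }
    return -1;                                /* table full */
}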
A bigger concern is the fact that you are spin-waiting the thread while it waits for the timer. Besides CPU usage, and depending on what OS you have (or whether you have one at all), you could easily introduce priority inversion issues or even full-on lockups. It might be better to use the OS's synchronization primitives so that it can actually sleep your threads and wake them when needed.


How to implement time counting in a new operating system?

I have to implement the sleep() function in an operating system where it does not currently exist.
The problem is that I have to count the elapsed time in order to wake the sleeping thread up.
How should I realize this? Do I have to count CPU ticks, or is there another way?
Aren't CPU ticks dependent on the CPU frequency, which is different for each CPU?
I have to implement the function in C.
The time function doesn't exist either.
Thank you in advance!
Typically, such functionality is provided by a hardware timer interrupt (and its associated driver) that manages a 'tick count' and a delta queue of 'Thread Control Block' pointers (pTCB). The pTCBs for sleeping threads are stored in the queue ordered by expiry tick count. The timer interrupt increments the tick count and checks it against the expiry count of the item at the head of the queue.
When a thread requests a sleep, its pTCB is taken out of the set of ready threads, the expiry count is calculated, and the pTCB is inserted into the timer queue. When the pTCB reaches the head of the queue and its expiry tick has arrived, it is popped and added back to the set of ready threads so that it may be set running.
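A minimal sketch of that delta-queue idea, assuming a hypothetical TCB type and scheduler hooks (make_ready(), remove_from_ready()); the point is that each entry stores only the ticks remaining after its predecessor, so the tick interrupt only ever touches the head:
#include <stddef.h>
#include <stdint.h>

struct tcb {
    struct tcb *next;
    uint32_t delta_ticks;          /* ticks remaining after predecessor expires */
    /* ... rest of the thread control block ... */
};

static struct tcb *sleep_queue;

extern void make_ready(struct tcb *t);          /* hypothetical scheduler hooks */
extern void remove_from_ready(struct tcb *t);

/* Called by a thread that wants to sleep for 'ticks' timer ticks. */
void sleep_ticks(struct tcb *self, uint32_t ticks)
{
    remove_from_ready(self);

    struct tcb **pp = &sleep_queue;
    while (*pp && (*pp)->delta_ticks <= ticks) {
        ticks -= (*pp)->delta_ticks;            /* walk past earlier deadlines */
        pp = &(*pp)->next;
    }
    self->delta_ticks = ticks;
    if (*pp)
        (*pp)->delta_ticks -= ticks;            /* successor is now relative to us */
    self->next = *pp;
    *pp = self;
    /* ...then yield to the scheduler */
}

/* Called from the periodic timer interrupt. */
void timer_tick(void)
{
    if (sleep_queue == NULL)
        return;
    if (sleep_queue->delta_ticks > 0)
        sleep_queue->delta_ticks--;
    while (sleep_queue && sleep_queue->delta_ticks == 0) {
        struct tcb *t = sleep_queue;            /* head has expired: wake it */
        sleep_queue = t->next;
        make_ready(t);
    }
}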
It totally depends on your platform/OS. It has to provide you with some time-like information, e.g. ticks; otherwise it is simply impossible.
Converting ticks to seconds of course requires additional information. Again, this can be supplied by your platform, or you have to find it out by other means (the manual, configuring it yourself, ...).
The easiest and most common way to do that in operating systems is to set up a timer interrupt at a static frequency, then build a timer framework on top of that, then use that timer framework to fire off wakeups for your sleeping threads.
A good paper that discusses various data structures for doing this efficiently is here; from my own experience I recommend scheme 7. It's quite easy to implement and performs wonderfully.
You can find a fast implementation with a good API here. But I'm biased, because I wrote it.
If you don't want a timer interrupt with a static frequency it becomes much harder to implement a nice timer facility with good performance. I've done a few experiments, but I'd recommend you start with a simple timer interrupt at a static frequency. Once you start doing dynamic timers, you need to understand exactly which tradeoffs you are prepared to make.
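As a rough sketch of what such a timer framework can look like, here is a plain fixed-size timing wheel, one of the simpler structures in that family; all names are illustrative and locking against the tick interrupt is omitted:
#include <stddef.h>
#include <stdint.h>

#define WHEEL_SLOTS 64u

struct soft_timer {
    struct soft_timer *next;
    uint32_t rounds;               /* full wheel revolutions still to wait */
    void (*callback)(void *arg);
    void *arg;
};

static struct soft_timer *wheel[WHEEL_SLOTS];
static uint32_t current_slot;

/* Task context: schedule 'tmr' to fire after 'ticks' ticks. */
void timer_add(struct soft_timer *tmr, uint32_t ticks)
{
    if (ticks == 0)
        ticks = 1;                            /* fire on the next tick at the earliest */
    uint32_t slot = (current_slot + ticks) % WHEEL_SLOTS;
    tmr->rounds = (ticks - 1) / WHEEL_SLOTS;  /* full revolutions still to wait */
    tmr->next = wheel[slot];
    wheel[slot] = tmr;                        /* real code must lock this vs. the ISR */
}

/* Called from the fixed-frequency tick interrupt. */
void timer_tick(void)
{
    current_slot = (current_slot + 1) % WHEEL_SLOTS;

    struct soft_timer **pp = &wheel[current_slot];
    while (*pp) {
        struct soft_timer *t = *pp;
        if (t->rounds == 0) {
            *pp = t->next;                    /* unlink and fire */
            t->callback(t->arg);
        } else {
            t->rounds--;
            pp = &t->next;
        }
    }
}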
You can use time():
time_t t = time(NULL);
while (time(NULL) < t + sleepDuration);
You may use the CPU's Time Stamp Counter (TSC) to get counter values for time keeping. See Chapter 16.12.1 of the "Intel® 64 and IA-32 Architectures Software Developer's Manual".
The TSC is a low-level counter which may provide counter values independent of CPU speed:
"The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."
However, for the implementation of sleep()-like functionality you should look into timer hardware such as the HPET, the ACPI timers, and the like. See "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2" and the "IA-PC HPET (High Precision Event Timers) Specification" for details.
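For illustration, a minimal sketch of reading the TSC via the compiler intrinsic and converting a delta to nanoseconds; tsc_hz is an assumed value you would calibrate once at startup against a known timer, and the result is only meaningful as wall-clock time on CPUs with invariant TSC:
#include <stdint.h>
#include <x86intrin.h>                  /* __rdtsc() on GCC/Clang */

static uint64_t tsc_hz = 3000000000ull; /* placeholder; calibrate at startup */

uint64_t tsc_delta_ns(uint64_t start, uint64_t end)
{
    uint64_t d = end - start;
    /* Split the conversion to avoid overflowing 64 bits on long intervals. */
    return (d / tsc_hz) * 1000000000ull + (d % tsc_hz) * 1000000000ull / tsc_hz;
}

/* Usage:
       uint64_t t0 = __rdtsc();
       ...work...
       uint64_t ns = tsc_delta_ns(t0, __rdtsc());                         */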

Generic Microcontroller Delay Function

Can someone please tell me how this function works? I'm using it in code and have an idea how it works, but I'm not 100% sure exactly. I understand the concept of an input variable N counting down, but how the heck does it work? Also, if I am using it repeatedly in my main() for different delays (different inputs for N), do I have to "zero" the function if I used it somewhere else?
Reference: MILLISEC is a constant defined by Fcy/10000, or system clock/10000.
Thanks in advance.
// DelayNmSec() gives a 1 ms to 65.5 second delay
/* Note that FCY is used in the computation. Please make the necessary
   changes (PLLx4 or PLLx8 etc.) to compute the right FCY as in the define
   statement above. */
void DelayNmSec(unsigned int N)
{
    unsigned int j;
    while (N--)
        for (j = 0; j < MILLISEC; j++);
}
This is referred to as busy waiting, a technique that just burns CPU cycles, thus "waiting" by keeping the CPU "busy" doing empty loops. You don't need to reset the function; it will do the same thing if called repeatedly.
If you call it with N=3, it will repeat the while loop 3 times, each time counting with j from 0 to MILLISEC, which is presumably a constant that depends on the CPU clock.
The original author of the code has timed it and looked at the generated assembler to determine the exact number of instructions executed per millisecond, and has configured the MILLISEC constant so that the for loop busy-waits for one millisecond.
The input parameter N is then simply the number of milliseconds the caller wants to wait, and the number of times the for loop is executed.
The code will break if
it is used on a different or faster microcontroller (depending on how Fcy is maintained), or
the optimization level of the C compiler is changed, or
the C compiler version is changed (as it may generate different code),
so, if the person who wrote it was clever, there may be a calibration program which defines and configures the MILLISEC constant.
This is what is known as a busy wait in which the time taken for a particular computation is used as a counter to cause a delay.
This approach does have problems, in that on different processors with different speeds the computation needs to be adjusted. Old games used this approach, and I remember a simulation using this busy-wait approach that targeted an old 8086-class processor to make an animation move smoothly. When the game was run on a Pentium PC, instead of the rocket majestically rising up the screen over several seconds, the entire animation flashed before your eyes so fast that it was difficult to see what it was.
This sort of busy wait means that the running thread sits in a computation loop counting down the number of milliseconds. The result is that the thread does nothing other than count down.
If the operating system is not a preemptive multi-tasking OS, then nothing else will run until the count down completes which may cause problems in other threads and tasks.
If the operating system is preemptive multi-tasking the resulting delays will have a variability as control is switched to some other thread for some period of time before switching back.
This approach is normally used for small pieces of software on dedicated processors where a computation has a known amount of time and where having the processor dedicated to the countdown does not impact other parts of the software. An example might be a small sensor that performs a reading to collect a data sample then does this kind of busy loop before doing the next read to collect the next data sample.

Setting up watchdog_set_period to max value causes reboot

I don't know much about how a watchdog timer works in an embedded environment, and I am facing an issue related to the watchdog timer.
The maximum timeout value, defined in one of the macros, is 55, and when we try to set this value via the watchdog_set_period function, our board reboots.
#define Max_time_out 55
watchdog_set_period(int period) // Set the watchdog's timeout counter
where period = 55
Is this expected, or what is the reason for the reboot?
We are writing this period value to a driver which we access through a file descriptor.
The linked article gives this description of watchdog timers.
A watchdog timer is a piece of hardware that can be used to automatically detect software anomalies and reset the processor if any occur. Generally speaking, a watchdog timer is based on a counter that counts down from some initial value to zero. The embedded software selects the counter's initial value and periodically restarts it. If the counter ever reaches zero before the software restarts it, the software is presumed to be malfunctioning and the processor's reset signal is asserted. The processor (and the embedded software it's running) will be restarted as if a human operator had cycled the power.
You haven't posted the code, so we can't judge what exactly the problem is. If you have written the code, check whether anything in it is causing the watchdog timer to reset.
A watchdog timer is a special kind of timer usually found on embedded systems that is used to detect when the running software/firmware is hung up on some task. The watchdog timer is basically a countdown timer that counts from some initial value down to zero.
When zero is reached, the watchdog timer understands that the system is hung up and resets it.
Therefore, the running software must periodically update the watchdog timer (typically in an infinite while loop) with a new value to stop it from reaching zero and causing a reset. When the running software is locked up doing a certain task and cannot update (refresh) the watchdog timer, the timer will eventually reach zero and a reset/reboot will occur.
So in summary, if you enable the watchdog timer then you need to refresh it periodically; otherwise the board reboots when the watchdog timer expires.
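Since the question mentions writing the period to a driver through a file descriptor, here is a minimal sketch of that pattern against the standard Linux watchdog device interface; the 55-second value comes from the question, while the device path and the use of the generic Linux ioctls are assumptions about the platform:
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

int main(void)
{
    int fd = open("/dev/watchdog", O_RDWR);   /* assumed device node */
    if (fd < 0) {
        perror("open /dev/watchdog");
        return 1;
    }

    int timeout = 55;                          /* seconds; the driver may clamp this */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == 0)
        printf("effective timeout: %d s\n", timeout);

    for (;;) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);         /* kick well before expiry */
        sleep(timeout / 2);
    }
}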

Long Delay using Delay Functions from C18 Libraries for PIC18

I'm using a PIC18 with Fosc = 10MHz. So if I use Delay10KTCYx(250), I get 10,000 x 250 x 4 x (1/10e6) = 1 second.
How do I use the delay functions in the C18 for very long delays, say 20 seconds? I was thinking of just using twenty lines of Delay10KTCYx(250). Is there another more efficient and elegant way?
Thanks in advance!
It is strongly recommended that you avoid using the built-in delay functions such as Delay10KTCYx().
Why you might ask?
These delay functions are very inaccurate, and they may cause your code to be compiled in unexpected ways. Here's one such example where using the Delay10KTCYx() function can cause problems.
Let's say that you have a PIC18 microprocessor that has only two hardware timer interrupts. (Usually they have more but let's just say there are only two).
Now let's say you manually set up the first hardware timer interrupt to blink once per second exactly, to drive a heartbeat monitor LED. And let's say you set up the second hardware timer interrupt to interrupt every 50 milliseconds because you want to take some sort of digital or analog reading at exactly 50 milliseconds.
Now, lastly, let's say that in your main program you want to delay 100,000 clock cycles. So you put a call to Delay10KTCYx(10) in your main program. What happens, do you suppose? How does the PIC18 magically count off 100,000 clock cycles?
One of two things will happen. It may "hijack" one of your other hardware timer interrupts to get exactly 100,000 clock cycles. This would either cause your heartbeat sensor to not clock at exactly 1 second, or, cause your digital or analog readings to happen at some time other than every 50 milliseconds.
Or, the delay function will just call a bunch of Nop() and claim that 1 Nop() = 1 clock cycle. What isn't accounted for is the "overhead" within the Delay10KTCYx(10) function itself. It has to increment a counter to keep track of things, and surely it takes more than 1 clock cycle to increment that counter. As the Delay10KTCYx(10) loops around and around, it is just not capable of giving you exactly 100,000 clock cycles. Depending on a lot of factors you may get many more, or many fewer, clock cycles than you expected.
The Delay10KTCYx(10) should only be used if you need an "approximate" amount of time. And pre-canned delay functions shouldn't be used if you are already using the hardware timer interrupts for other purposes. The compiler may not even successfully compile when using Delay10KTCYx(10) for very long delays.
I would highly recommend that you set up one of your timer interrupts to interrupt your hardware at a known interval, say 50,000 clock cycles. Then, each time the hardware interrupts, within your ISR code for that timer, increment a counter and reset the timer back to 0 cycles. When enough 50,000-cycle periods have elapsed to equal 20 seconds (in your example, 200 timer interrupts at 50,000 cycles per interrupt), reset your counter. Basically my advice is that you should always manually handle time in a PIC and not rely on pre-canned delay functions; rather, build your own delay functions that integrate with the chip's hardware timer. Yes, it's extra work ("but why can't I just use this easy and nifty built-in delay function, why would they even put it there if it's gonna muck up my program?"), but this should become second nature, just like manually configuring EVERY SINGLE REGISTER in your PIC18 on boot-up, whether you are using it or not, to prevent unexpected things from happening.
You'll get way more accurate timing - and way more predictable behavior from your PIC18. Using pre-canned Delay functions is a recipe for disaster... it may work... it may work on several projects... but sooner or later your code will go all buggy on you and you'll be left wondering why and I guarantee the culprit will be the pre-canned delay function.
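As a rough illustration of that tick-counting approach in generic C (the timer configuration and interrupt-flag handling are chip-specific and omitted; tick_count, TICKS_PER_SECOND and delay_seconds() are illustrative names, not C18 library functions):
#include <stdint.h>

/* Assume a hardware timer is configured elsewhere to interrupt every
   50,000 instruction cycles, i.e. every 20 ms at Fcy = 2.5 MHz, giving
   50 ticks per second. Adjust TICKS_PER_SECOND to your timer period. */
#define TICKS_PER_SECOND 50u

static volatile uint16_t tick_count;

void timer_tick_isr(void)              /* call from the timer ISR */
{
    tick_count++;                      /* the ISR must also clear the interrupt flag */
}

void delay_seconds(uint16_t seconds)   /* e.g. delay_seconds(20) */
{
    uint16_t target = seconds * TICKS_PER_SECOND;

    tick_count = 0;
    /* On an 8-bit core a 16-bit read is not atomic; guard it against the
       ISR (briefly disable interrupts) if exact counts matter. */
    while (tick_count < target)
        ;                              /* or idle/do other work here instead of spinning */
}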
To create a very long delay, use an internal timer. This helps you avoid blocking your application, and you can check the running time. Please refer to the PIC data sheet for how to set up a timer and its interrupt.
If you want a very high-precision 1 s period, I also suggest considering an external RTC device, or the internal RTC if the micro has one.

Software PWM without clobbering the CPU?

This is an academic question (I'm not necessarily planning on doing it) but I am curious about how it would work. I'm thinking of a userland software (rather than hardware) solution.
I want to produce PWM signals (let's say for a small number of digital GPIO pins, but more than 1). I would probably write a program which creates a pthread and then loops forever over the duty cycle in that thread, with appropriate sleep()s etc. to get the proportions right.
Would this not clobber the CPU horribly? I imagine the frequency would be somewhere around the 100 Hz mark. I've not done anything like this before but I can imagine that the constant looping, context switches etc wouldn't be great for multitasking or CPU usage.
Any advice about CPU use and multitasking in this case? FWIW I'm thinking of a single-core processor. I have a feeling answers could range from 'that will make your system unusable' to 'the numbers involved are orders of magnitude smaller than will make an impact on a modern processor'!
Assume C because it seems most appropriate.
EDIT: Assume Linux or some other general purpose POSIX operating system on a machine with access to hardware GPIO pins.
EDIT: I had assumed it would be obvious how I would implement PWM with sleep. For the avoidance of doubt, something like this:
while (TRUE)
{
    // Set all channels high
    for (int c = 0; c < NUM_CHANNELS; c++)
    {
        set_gpio_pin(c, 1);
    }
    // Loop over units within duty cycle
    for (int x = 0; x < DUTY_CYCLE_UNITS; x++)
    {
        // Set channels low when their number is up
        for (int c = 0; c < NUM_CHANNELS; c++)
        {
            if (x > CHANNELS[c])
            {
                set_gpio_pin(c, 0);
            }
        }
        sleep(DUTY_CYCLE_UNIT);
    }
}
Use a driver if you can. If your embedded device has a PWM controller, then fine, else dedicate a hardware timer to generating the PWM intervals and driving the GPIO pins.
If you have to do this at user level, raising a process/thread to a high priority and using sleep() calls is sure to generate a lot of jitter and a poor pulse-width range.
You do not state very clearly the ultimate purpose of this, but since you have tagged this embedded and pthreads, I will assume you have a dedicated chip running a Linux variant.
In this case, I would suggest the best way to create PWM output is through your main program loop, since I assume the PWM is part of a greater control application. Most simple embedded applications (no UI) can run in a single thread with periodic updates of the GPIOs in your main thread.
For example:
InitIOs();
while(1)
{
    // Do stuff
    UpdatePWM();
}
That being said, check your chip's specification: most embedded devices have dedicated PWM output pins (which can also act as GPIOs), and those can be configured purely in hardware by setting a duty cycle and updating it as required. In this case, the hardware will do the work for you.
If you can clarify your situation a bit I can likely give you a more detailed answer.
A better way is probably to use some kind of interrupt-driven approach. I suppose it depends on your system, but IIRC Arduino uses interrupts for PWM.
100 Hz seems about doable from user space. Typical OS task-scheduler timeslices are around 10 ms, so your CPU will already be multitasking at about that interval anyway. You'll probably want to use a high process priority (low niceness) to ensure the sleeps won't overrun (much), and keep track of actual wall time, adjusting your sleep values down based on that feedback to avoid drift. You'll also need to make sure the timer the kernel uses for this on your hardware has a high enough resolution!
If you're very low on RAM and swapping heavily, you could run into problems with your program being paged out to disk. Also, if the kernel is doing other CPU-intensive work, that could introduce unacceptable delays (other, lower-priority user-space tasks should be OK). If keeping the frequency constant is critical, you're better off solving this in the kernel (or even running a realtime kernel).
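As a sketch of the "track wall time to avoid drift" idea, the usual POSIX trick is to sleep until absolute deadlines on CLOCK_MONOTONIC rather than for relative intervals; the 100 Hz period and the set_gpio_pins() placeholder below are illustrative:
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define PERIOD_NS (10 * 1000 * 1000)        /* 10 ms per slice -> 100 Hz */

extern void set_gpio_pins(long slice);      /* placeholder: drive the pins */

int main(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (long slice = 0; ; slice++) {
        set_gpio_pins(slice);

        /* Advance the deadline by exactly one period; any overrun in this
           iteration is absorbed instead of accumulating as drift. */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec  += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}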
Using a thread and sleeping on an OS that is not an RTOS is not going to produce very accurate or consistent results.
A better method is to use a timer interrupt and toggle the GPIO in the ISR. Unlike a hardware PWM output tied to a hardware timer, this approach allows you to use a single timer for multiple signals and for other purposes. You will still probably see more jitter than with hardware PWM, and the practical frequency range and pulse resolution will be much lower than is achievable in hardware, but at least the jitter will be on the order of microseconds rather than milliseconds.
If you have a timer, you can set that up to kick an interrupt each time a new PWM edge is required. With some clever coding, you can queue these up so the interrupt handler knows which of many PWM channels and whether a high or low going edge is required, and then schedule itself for the next required edge.
If you have enough of these timers, then its even easier as you can allocate one per PWM channel.
On an embedded controller with a low-latency interrupt response, this can produce surprisingly good results.
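A minimal sketch of the single-timer, multi-channel idea: a fast periodic timer ISR advances a step counter and compares it against each channel's duty value. timer_isr(), gpio_write() and the channel table are illustrative names, and the timer setup itself is hardware-specific and omitted:
#include <stdint.h>

#define NUM_CHANNELS 4u
#define PWM_STEPS    100u

extern void gpio_write(unsigned pin, int level);   /* placeholder HAL call */

static const unsigned pwm_pin[NUM_CHANNELS] = { 1, 2, 3, 4 };
static volatile uint8_t pwm_duty[NUM_CHANNELS];    /* 0..PWM_STEPS */

/* Called every PWM step; with the ISR running at PWM_STEPS * pwm_frequency
   (e.g. 100 steps * 100 Hz = 10 kHz), each channel gets a resolution of
   1/PWM_STEPS of the period. */
void timer_isr(void)
{
    static uint8_t step;

    for (unsigned c = 0; c < NUM_CHANNELS; c++)
        gpio_write(pwm_pin[c], step < pwm_duty[c]);

    if (++step >= PWM_STEPS)
        step = 0;
}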
I fail to understand why you would want to do PWM in software with all of the inherent timing jitter that interrupt servicing and software interactions will introduce (e.g. the PWM interrupt hits when interrupts are disabled, the processor is servicing a long uninterruptible instruction, or another service routine is active). Most modern microcontrollers (ARM-7, ARM Cortex-M, AVR32, MSP, ...) have timers that can either be configured to produce or are dedicated as PWM generators. These will produce multiple rock steady PWM signals that, once set up, require zero processor input to keep running. These PWM outputs can be configured so that two signals do not overlap or have simultaneous edges, as required by the application.
If you are relying on the OS sleep function to set the time between the PWM edges, then the output will run slow. The sleep function sets only the minimum time between task activations; the actual time between them will be stretched by task switches, the presence of a higher-priority thread, or other kernel activity.
