FreeRTOS slow systick - timer

I am experiencing a problem with FreeRTOS where it seems the systick() rate is 1/2 the expected rate. All timer or task delay functions take about 2X the time. This was verified in versions 8.2.0 and 8.2.3 using a STM32F100 processor.
There is another posting that looks very similar. That developer is using an MSP430 and reports a tick rate of 400Hz when expecting 1000Hz.
The RCC register configuration appears to be correct. If I create a non-FreeRTOS project where the systick is correct, it has the same RCC configuration as in the FreeRTOS version.
Suggestions?

When I read:
I created a very simple task that delays for 4 seconds and reports the
actual number of elapsed ticks. The ticks are correct but the actual
delay is around 8 seconds.
When reading this my thought was, if the delay is correct in the number of ticks, but the time is different, then this is simply a case of the CPU clock running at a different frequency to that which you think it is. Perhaps configCPU_CLOCK_HZ is wrong. However, then you write:
#undef OS_USE_TRACE_SEMIHOSTING
#define OS_USE_TRACE_ITM
you mention semihosting. Are you using semihosting? If so, don't: it will mess up your timing because it stops the CPU while outputting to the host, and that might be the problem you are seeing.
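To illustrate the clock-mismatch theory above, here is a minimal sketch of the arithmetic, not the FreeRTOS port code itself. It assumes the usual Cortex-M arrangement where the tick timer reload is derived as (configured clock / tick rate) - 1. The function names and the 24 MHz / 12 MHz figures are hypothetical, chosen only because they reproduce the "4 seconds requested, about 8 seconds observed" symptom.

```c
#include <stdint.h>

/* Reload value that makes the tick timer fire tick_rate_hz times per second,
 * IF the timer really runs at assumed_clock_hz (the value the RTOS was
 * configured with, e.g. configCPU_CLOCK_HZ). */
uint32_t systick_reload(uint32_t assumed_clock_hz, uint32_t tick_rate_hz)
{
    return (assumed_clock_hz / tick_rate_hz) - 1U;
}

/* Tick rate actually produced when the timer is clocked at actual_clock_hz. */
double effective_tick_hz(uint32_t reload, uint32_t actual_clock_hz)
{
    return (double)actual_clock_hz / (double)(reload + 1U);
}

/* Wall-clock seconds that a delay of `ticks` ticks really takes. */
double real_delay_s(uint32_t ticks, uint32_t reload, uint32_t actual_clock_hz)
{
    return (double)ticks / effective_tick_hz(reload, actual_clock_hz);
}
```

With a hypothetical configured clock of 24 MHz and a real clock of 12 MHz, systick_reload(24000000, 1000) is 23999, the effective tick rate is 500 Hz, and a 4000-tick delay lasts 8 seconds while the tick count still reads "correct".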

Related

FreeRtos osDelay is exactly three times too long

Using Nucleo STM32H723 board making an alive LED toggle in the StartDefaultTask.
osDelay(10000) gives exactly 30 seconds of delay. I'm using timer7 as systick timer.
Bonus info:
FreeRTOSConfig.h
#define configCPU_CLOCK_HZ ( SystemCoreClock )
"SystemCoreClock" is initially set to 64MHz (HSI) but during start-up initialized to 216MHz (SYSClk). If I edit "FreeRTOSConfig.h" and set "configCPU_CLOCK_HZ" to 64000000 osDelay is now correct, but I would like to be able to generate the clock and rtos config with CubeIDE.
Maybe someone can tell me what I am doing wrong...
I am not familiar with that board, but I think the OS believes SysTick ticks at 1/3 of its actual (hardware) speed. You have to sync that value to the OS. Check your FreeRTOSConfig.h file for configSYSTICK_CLOCK_HZ and related parameters. As for the CubeIDE/FreeRTOS integration, check the suitable extensions/expansions of the IDE. Other than that, you have to edit the files manually...

Cortex-R5 PMU cycle counter not counting time correctly

I have the PMU configured correctly for PMCCNTR to tick on a Cortex-R5 running FreeRTOS. I will omit the configuration code since it's been repeated on many other StackOverflow questions. I believe the configuration is correct because I tried running
__asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(pmccntr))
periodically, and I was able to see that the pmccntr variable increases monotonically and rolls over after (2^32 - 1).
The CPU is running at 800 MHz, so I expected that if I were to read PMCCNTR in a 1 Hz task I would see the value increase by about 800 million between calls. However, the difference in PMCCNTR between calls to the 1 Hz task is more like 72 million. I also tried playing with the divide-by-64 clock divider to make sure my observations are sane.
Is my math correct? Or perhaps I am using the wrong number as the CPU frequency? What would be a deterministic way to figure out what frequency the PMCCNTR should be counting at?
Update: The root cause is WFI, as @Sean Houlihane pointed out.
The PMCCNTR does run at core clock, so long as the counter is not disabled and the core isn't in debug state. If you calculate 72 MHz/1.125 MHz then there is a good chance your core is running at external crystal frequency, not from the internal PLL.
The other likely explanation is that the core is in WFI state with clocks stopped for most of the time - in which case the result you measure will be influenced by the amount of work done by the OS.
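As a rough way to check the numbers discussed above, the following sketch turns two PMCCNTR readings into an estimated counting frequency. The function names are hypothetical, and the readings are assumed to come from the mrc instruction shown in the question. The unsigned subtraction stays correct across one rollover, which at 800 MHz undivided means the two readings must be less than about 5.4 seconds apart.

```c
#include <stdint.h>

/* Counts elapsed between two 32-bit counter readings. Unsigned arithmetic
 * wraps modulo 2^32, so a single rollover between the readings is handled. */
uint32_t cycles_between(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/* Estimated clock in Hz seen by the counter. `divider` is 1 normally, or 64
 * when the divide-by-64 option is enabled. `interval_s` is the time between
 * the two readings as measured by some independent reference. */
double estimate_clock_hz(uint32_t earlier, uint32_t later,
                         uint32_t divider, double interval_s)
{
    return (double)cycles_between(earlier, later) * (double)divider / interval_s;
}
```

If the estimate comes out far below the nominal core clock, as with 72 million counts per second on an 800 MHz part, that is consistent with either explanation given above: a slower clock source, or a counter that stops while the core sits in WFI. Comparing a busy-loop measurement against an idle one would be one way to tell them apart.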

How to write a time difference function to STM32F4

I am working on an STM32F4 and am pretty new at it. I know the basics of C, but after more than a day of research I still have not found a solution to this.
I simply want to make a delay function myself. The processor runs at 168MHz (HCLK), so my intuition says that it produces 168x10^6 clock cycles each second. So the method should be something like this:
1-Store current clock count to a variable
2-Time diff = ( clock value at any time - stored starting clock value ) / 168000000
This flow should give me the time difference in seconds, which I can then convert to whatever I want.
But, unfortunately, although it seems so easy, I just can't implement any such method on the MCU.
I tried time.h but it did not work properly. For example, clock() gave the same result over and over, and time() (the one that returns seconds since 1970) gave hexadecimal 0xFFFFFFFF (-1, which I guess means error).
Thanks.
Edit: While writing I assumed that some function like clock() would return the total clock count since the start of the program, but now I think it will overflow uint32_t after 4 billion / 168 million seconds. I am really confused.
The answer depends on the required precision and intervals.
For shorter intervals with sub-microsecond precision there is a cycle counter. Your suspicion is correct, it would overflow after 2^32 / (168 x 10^6), roughly 25.6 seconds.
For longer intervals there are timers that can be prescaled to support any possible subdivision of the 168 MHz clock. The most commonly used setup is the SysTick timer set to generate an interrupt at 1 kHz frequency, which increments a software counter. Reading this counter would give the number of milliseconds elapsed since startup. As it is usually a 32 bit counter, it would overflow after 49.7 days. The HAL library sets SysTick up this way, the counter can then be queried using the HAL_GetTick() function.
For even longer or more specialized timing requirements you can use the RTC peripheral which keeps calendar time, or the TIM peripherals (basic, general and advanced timers), these have their own prescalers, and they can be arranged in a master-slave setup to give almost arbitrary precision and intervals.
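The overflow figures above, and the time-difference calculation from the question, can be sketched in plain C. This is only the arithmetic: the helper names are hypothetical, and on the target the readings would come from whatever counter is actually in use (for example the value returned by HAL_GetTick() for the millisecond case).

```c
#include <stdint.h>

/* Seconds until a free-running 32-bit counter clocked at counter_hz wraps. */
double wrap_period_s(double counter_hz)
{
    return 4294967296.0 / counter_hz; /* 2^32 counts */
}

/* Elapsed milliseconds between two readings of a 32-bit millisecond tick.
 * Unsigned subtraction gives the right answer across one wrap. */
uint32_t elapsed_ms(uint32_t start_tick, uint32_t now_tick)
{
    return now_tick - start_tick;
}

/* Convert a cycle-count difference to microseconds for a core at cpu_hz.
 * The 64-bit intermediate avoids overflow in cycles * 1000000. */
uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000ULL) / cpu_hz);
}
```

wrap_period_s(168e6) is a little under 25.6 seconds, and wrap_period_s(1000.0) divided by 86400 is about 49.7 days, matching the two limits mentioned in the answer. Because of the wrap-safe subtraction, neither limit matters as long as each measured interval is shorter than one wrap period.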

Long Delay using Delay Functions from C18 Libraries for PIC18

I'm using a PIC18 with Fosc = 10MHz. So if I use Delay10KTCYx(250), I get 10,000 x 250 x 4 x (1/10e6) = 1 second.
How do I use the delay functions in the C18 for very long delays, say 20 seconds? I was thinking of just using twenty lines of Delay10KTCYx(250). Is there another more efficient and elegant way?
Thanks in advance!
It is strongly recommended that you avoid using the built-in delay functions such as Delay10KTCYx()
Why you might ask?
These delay functions are very inaccurate, and they may cause your code to be compiled in unexpected ways. Here's one such example where using the Delay10KTCYx() function can cause problems.
Let's say that you have a PIC18 microprocessor that has only two hardware timer interrupts. (Usually they have more but let's just say there are only two).
Now let's say you manually set up the first hardware timer interrupt to blink once per second exactly, to drive a heartbeat monitor LED. And let's say you set up the second hardware timer interrupt to interrupt every 50 milliseconds because you want to take some sort of digital or analog reading at exactly 50 milliseconds.
Now, lastly, let's say that in your main program you want to delay 100,000 clock cycles. So you put a call to Delay10KTCYx(10) in your main program. What happens, do you suppose? How does the PIC18 magically count off 100,000 clock cycles?
One of two things will happen. It may "hijack" one of your other hardware timer interrupts to get exactly 100,000 clock cycles. This would either cause your heartbeat sensor to not clock at exactly 1 second, or, cause your digital or analog readings to happen at some time other than every 50 milliseconds.
Or, the delay function will just call a bunch of Nop() and claim that 1 Nop() = 1 clock cycle. What isn't accounted for is "overheads" within the Delay10KTCYx(10) function itself. It has to increment a counter to keep track of things, and surely it takes more than 1 clock cycle to increment the timer. As the Delay10KTCYx(10) loops around and around it is just not capable of giving you exactly 100,000 clock cycles. Depending on a lot of factors you may get way more, or way less, clock cycles than you expected.
The Delay10KTCYx(10) should only be used if you need an "approximate" amount of time. And pre-canned delay functions shouldn't be used if you are already using the hardware timer interrupts for other purposes. The compiler may not even successfully compile when using Delay10KTCYx(10) for very long delays.
I would highly recommend that you set up one of your timer interrupts to interrupt your hardware at a known interval, say 50,000 instruction cycles. Then, each time the hardware interrupts, within your ISR code for that timer interrupt, increment a counter and reset the timer to 0 cycles. When enough 50,000-cycle intervals have expired to equal 20 seconds (in your example, with Fosc = 10 MHz and four oscillator cycles per instruction cycle, that is 1,000 timer interrupts at 50,000 instruction cycles per interrupt), reset your counter.
Basically my advice is that you should always manually handle time in a PIC and not rely on pre-canned Delay functions - rather build your own delay functions that integrate into the hardware timer of the chip. Yes, it's going to be extra work - "but why can't I just use this easy and nifty built-in delay function, why would they even put it there if it's gonna muck up my program?" - but this should become second nature. Just like you should be manually configuring EVERY SINGLE REGISTER in your PIC18 upon boot-up, whether you are using it or not, to prevent unexpected things from happening.
You'll get way more accurate timing - and way more predictable behavior from your PIC18. Using pre-canned Delay functions is a recipe for disaster... it may work... it may work on several projects... but sooner or later your code will go all buggy on you and you'll be left wondering why and I guarantee the culprit will be the pre-canned delay function.
To create a very long delay, use an internal timer. This helps you avoid blocking your application, and you can check the running time as you go. Please refer to the PIC data sheet for how to set up a timer and its interrupt.
If you want a very high precision 1 s time base, I suggest also considering an external RTC device, or an internal RTC if the micro has one.
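The timer-interrupt approach recommended in the answers above might look roughly like this. It is a sketch under stated assumptions, not C18-specific code: Fosc = 10 MHz comes from the question, one instruction cycle is taken as four oscillator cycles (as in the question's own formula), and the interrupt interval is 50,000 instruction cycles. All function names are hypothetical, timer register setup is left to the data sheet, and the C99 headers are used only so the logic can be checked on a host (an older compiler may need plain unsigned types instead).

```c
#include <stdint.h>
#include <stdbool.h>

#define FOSC_HZ          10000000UL                 /* from the question */
#define TCY_HZ           (FOSC_HZ / 4UL)            /* instruction cycles per second */
#define CYCLES_PER_IRQ   50000UL                    /* assumed timer interval */
#define IRQS_PER_SECOND  (TCY_HZ / CYCLES_PER_IRQ)  /* 50 interrupts per second */
#define DELAY_SECONDS    20UL
#define IRQS_FOR_DELAY   (DELAY_SECONDS * IRQS_PER_SECOND)  /* 1000 interrupts */

static volatile uint16_t irq_count;
static volatile bool delay_elapsed;

/* Hypothetical hook: call this from the real timer ISR after clearing the
 * interrupt flag and reloading the timer for the next interval. */
void on_timer_irq(void)
{
    if (++irq_count >= IRQS_FOR_DELAY) {
        irq_count = 0;
        delay_elapsed = true;
    }
}

/* Arm a new 20 second delay. */
void start_long_delay(void)
{
    irq_count = 0;
    delay_elapsed = false;
}

/* Poll from the main loop; the rest of the program keeps running meanwhile. */
bool long_delay_done(void)
{
    return delay_elapsed;
}
```

The main loop calls start_long_delay() once and then checks long_delay_done() on each pass, so nothing blocks while the 20 seconds elapse.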

1ms resolution timer under linux recommended way

I need a timer tick with 1ms resolution under linux. It is used to increment a timer value that in turn is used to see if various Events should be triggered. The POSIX timerfd_create is not an option because of the glibc requirement. I tried timer_create and timer_settime, but the best I get from them is a 10ms resolution; smaller values seem to default to 10ms resolution. getitimer and setitimer have a 10 ms resolution according to the manpage.
The only way to do this timer I can currently think of is to use clock_gettime with CLOCK_MONOTONIC in my main loop and test if a ms has passed, and if so to increase the counter (and then check if the various Events should fire).
Is there a better way to do this than to constantly query in the main loop? What is the recommended solution to this?
The language I am using is plain old C.
Update
I am using a 2.6.26 Kernel. I know you can have it interrupt at 1kHz, and the POSIX timer_* functions then can be programmed to up to 1ms but that seems not to be reliable and I don't want to use that, because it may need a new kernel on some Systems. Some stock Kernel seem to still have the 100Hz configured. And I would need to detect that. The application may be run on something else than my System :)
I can not sleep for 1ms because there may be network events I have to react to.
How I resolved it
Since it is not that important I simply declared that the global timer has a 100ms resolution. All events using their own timer have to set at least 100ms for timer expiration. I was more or less wondering if there would be a better way, hence the question.
Why I accepted the answer
I think the answer from freespace best described why it is not really possible without a realtime Linux System.
Polling in the main loop isn't an answer either - your process might not get much CPU time, so more than 10ms will elapse before your code gets to run, rendering it moot.
10ms is about the standard timer resolution for most non-realtime operating systems (non-RTOS). But it is moot in a non-RTOS - the behaviour of the scheduler and dispatcher is going to greatly influence how quickly you can respond to a timer expiring. For example, even supposing you had a sub-10ms resolution timer, you can't respond to the timer expiring if your code isn't running. Since you can't predict when your code is going to run, you can't respond to timer expiration accurately.
There are of course realtime linux kernels, see http://www.linuxdevices.com/articles/AT8073314981.html for a list. An RTOS offers facilities whereby you can get soft or hard guarantees about when your code is going to run. This is about the only way to reliably and accurately respond to timers expiring etc.
To get 1ms resolution timers do what libevent does.
Organize your timers into a min-heap, that is, the top of the heap is the timer with the earliest expiry (absolute) time (a rb-tree would also work but with more overhead). Before calling select() or epoll() in your main event loop calculate the delta in milliseconds between the expiry time of the earliest timer and now. Use this delta as the timeout to select(). select() and epoll() timeouts have 1ms resolution.
I've got a timer resolution test that uses the mechanism explained above (but not libevent). The test measures the difference between the desired timer expiry time and its actual expiry of 1ms, 5ms and 10ms timers:
1000 deviation samples of 1msec timer: min= -246115nsec max= 1143471nsec median= -70775nsec avg= 901nsec stddev= 45570nsec
1000 deviation samples of 5msec timer: min= -265280nsec max= 256260nsec median= -252363nsec avg= -195nsec stddev= 30933nsec
1000 deviation samples of 10msec timer: min= -273119nsec max= 274045nsec median= 103471nsec avg= -179nsec stddev= 31228nsec
1000 deviation samples of 1msec timer: min= -144930nsec max= 1052379nsec median= -109322nsec avg= 1000nsec stddev= 43545nsec
1000 deviation samples of 5msec timer: min= -1229446nsec max= 1230399nsec median= 1222761nsec avg= 724nsec stddev= 254466nsec
1000 deviation samples of 10msec timer: min= -1227580nsec max= 1227734nsec median= 47328nsec avg= 745nsec stddev= 173834nsec
1000 deviation samples of 1msec timer: min= -222672nsec max= 228907nsec median= 63635nsec avg= 22nsec stddev= 29410nsec
1000 deviation samples of 5msec timer: min= -1302808nsec max= 1270006nsec median= 1251949nsec avg= -222nsec stddev= 345944nsec
1000 deviation samples of 10msec timer: min= -1297724nsec max= 1298269nsec median= 1254351nsec avg= -225nsec stddev= 374717nsec
The test ran as a real-time process on Fedora 13 kernel 2.6.34, the best achieved precision of 1ms timer was avg=22nsec stddev=29410nsec.
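The min-heap plus select() timeout mechanism described above can be sketched as follows. This is not libevent's code and not the test program that produced the numbers above; every name here (timer_heap, heap_push, run_once and so on) is hypothetical, the heap has a fixed capacity, and timers carry no callback, only an absolute expiry time in milliseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stddef.h>
#include <time.h>
#include <sys/select.h>

#define MAX_TIMERS 64

/* Binary min-heap of absolute expiry times in milliseconds. */
struct timer_heap {
    int64_t expiry_ms[MAX_TIMERS];
    size_t  count;
};

int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Insert an expiry time, sifting it up to keep the earliest at index 0. */
int heap_push(struct timer_heap *h, int64_t expiry)
{
    if (h->count == MAX_TIMERS) return -1;
    size_t i = h->count++;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->expiry_ms[parent] <= expiry) break;
        h->expiry_ms[i] = h->expiry_ms[parent];
        i = parent;
    }
    h->expiry_ms[i] = expiry;
    return 0;
}

/* Remove and return the earliest expiry. Caller must ensure count > 0. */
int64_t heap_pop(struct timer_heap *h)
{
    int64_t top = h->expiry_ms[0];
    int64_t last = h->expiry_ms[--h->count];
    size_t i = 0;
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->count) break;
        if (child + 1 < h->count && h->expiry_ms[child + 1] < h->expiry_ms[child]) child++;
        if (last <= h->expiry_ms[child]) break;
        h->expiry_ms[i] = h->expiry_ms[child];
        i = child;
    }
    h->expiry_ms[i] = last;
    return top;
}

/* Milliseconds until the earliest timer expires, clamped at 0.
 * Returns -1 when no timers are pending (wait indefinitely). */
int64_t ms_until_next(const struct timer_heap *h, int64_t now)
{
    if (h->count == 0) return -1;
    int64_t delta = h->expiry_ms[0] - now;
    return delta > 0 ? delta : 0;
}

/* One pass of the event loop: wait for fd activity or the next expiry,
 * then pop every timer that is due. Returns the number of expired timers. */
int run_once(struct timer_heap *h, int maxfd, fd_set *readfds)
{
    int64_t wait = ms_until_next(h, now_ms());
    struct timeval tv, *ptv = NULL;
    if (wait >= 0) {
        tv.tv_sec = (time_t)(wait / 1000);
        tv.tv_usec = (suseconds_t)((wait % 1000) * 1000);
        ptv = &tv;
    }
    select(maxfd + 1, readfds, NULL, NULL, ptv);

    int expired = 0;
    int64_t now = now_ms();
    while (h->count > 0 && h->expiry_ms[0] <= now) {
        heap_pop(h);
        expired++;
    }
    return expired;
}
```

In a real loop, readfds would hold the network sockets, so the same select() call wakes up either for I/O or for the next timer, whichever comes first. That avoids both the busy polling and the fixed 1 ms sleep that the question rules out.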
I'm not sure it's the best solution, but you might consider writing a small kernel module that uses the kernel high-res timers to do timing. Basically, you'd create a device file for which reads would only return on 1ms intervals.
An example of this type of approach is used in the Asterisk PBX, via the ztdummy module. If you google for ztdummy you can find the code that does this.
I think you'll have trouble achieving 1 ms precision with standard Linux even with constant querying in the main loop, because the kernel does not ensure your application will get CPU all the time. For example, you can be put to sleep for dozens of milliseconds because of preemptive multitasking and there's little you can do about it.
You might want to look into Real-Time Linux.
If you are targeting the x86 platform you should check HPET timers. This is a hardware timer with high precision. It must be supported by your motherboard (right now all of them support it) and your kernel should contain a driver for it as well. I have used it a few times without any problems and was able to achieve much better resolution than 1ms.
Here is some documentation and examples:
http://www.kernel.org/doc/Documentation/timers/hpet.txt
http://www.kernel.org/doc/Documentation/timers/hpet_example.c
http://fpmurphy.blogspot.com/2009/07/linux-hpet-support.html
I seem to recall getting ok results with gettimeofday/usleep based polling -- I wasn't needing 1000 timers a second or anything, but I was needing good accuracy with the timing for ticks I did need -- my app was a MIDI drum machine controller, and I seem to remember getting sub-millisecond accuracy, which you need for a drum machine if you don't want it to sound like a very bad drummer (esp. counting MIDI's built-in latencies) -- iirc (it was 2005 so my memory is a bit fuzzy) I was getting within 200 microseconds of target times with usleep.
However, I was not running much else on the system. If you have a controlled environment you might be able to get away with a solution like that. If there's more going on the system (watch cron firing up updatedb, etc.) then things may fall apart.
Are you running on a Linux 2.4 kernel?
From VMware KB article #1420 (http://kb.vmware.com/kb/1420).
Linux guest operating systems keep
time by counting timer interrupts.
Unpatched 2.4 and earlier kernels
program the virtual system timer to
request clock interrupts at 100Hz (100
interrupts per second). 2.6 kernels,
on the other hand, request interrupts
at 1000Hz - ten times as often. Some
2.4 kernels modified by distribution vendors to contain 2.6 features also
request 1000Hz interrupts, or in some
cases, interrupts at other rates, such
as 512Hz.
There is ktimer patch for linux kernel:
http://lwn.net/Articles/167897/
http://www.kernel.org/pub/linux/kernel/projects/rt/
HTH
First, get the kernel source and compile it with an adjusted HZ parameter.
If HZ=1000, timer interrupts 1000 times per seconds. It is ok to use HZ=1000 for an i386 machine.
On an embedded machine, HZ might be limited to 100 or 200.
For good operation, the PREEMPT_KERNEL option should be on. There are
kernels which do not support this option properly. You can check them out by
searching.
Recent kernels, e.g. 2.6.35.10, support the NO_HZ option, which turns
on dynamic ticks. This means that there will be no timer ticks when idle,
but a timer tick will be generated at the specified moment.
There is an RT patch to the kernel, but hardware support is very limited.
Generally RTAI is a cure-all solution to your problem, but its
hardware support is very limited. However, good CNC controllers, like
emc2, use RTAI for their clocking, maybe at 5000 Hz, but it can be
hard work to install it.
If you can, you could add hardware to generate pulses. That would make
a system which can be adapted to any OS version.
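Since the question mentions needing to detect what a given stock kernel provides, one possible run-time check is to ask for the resolution of CLOCK_MONOTONIC. This is a sketch with hypothetical helper names. It rests on the assumption, to be verified on the target kernel, that Linux reports roughly one tick period (for example 10 ms at HZ=100) when high-resolution timers are not active and a much smaller value when they are.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Resolution the kernel reports for CLOCK_MONOTONIC, in nanoseconds,
 * or -1 if the query fails. */
long monotonic_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0) return -1;
    return (long)res.tv_sec * 1000000000L + res.tv_nsec;
}

/* True when the reported resolution is 1 ms or finer. */
int can_tick_every_ms(long resolution_ns)
{
    return resolution_ns > 0 && resolution_ns <= 1000000L;
}
```

An application could call can_tick_every_ms(monotonic_resolution_ns()) at start-up and fall back to a coarser global timer when it returns 0. A fine reported resolution still says nothing about scheduling latency, which is the limitation the other answers describe.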
You don't need an RTOS for a simple real time application. All modern processors have General Purpose timers. Get a datasheet for whatever target CPU you are working on. Look in the kernel source, under the arch directory you will find processor specific source how to handle these timers.
There are two approaches you can take with this:
1) Your application is ONLY running your state machine, and nothing else. Linux is simply your "boot loader." Create a kernel object which installs a character device. On insertion into the kernel, set up your GP Timer to run continuously. You know the frequency it's operating at. Now, in the kernel, explicitly disable your watchdog. Now disable interrupts (hardware AND software). On a single-CPU Linux kernel, calling spin_lock() will accomplish this (never let go of it). The CPU is YOURS. Busy loop, checking the value of the GPT until the required # of ticks have passed; when they have, set a value for the next timeout and enter your processing loop. Just make sure that the burst time for your code is under 1ms.
2) A 2nd option. This assumes you are running a preemptive Linux kernel. Set up an unused GPT alongside your running OS. Now, set up an interrupt to fire some configurable margin BEFORE your 1ms timeout happens (say 50-75 uSec). When the interrupt fires, you immediately disable interrupts and spin waiting for the 1ms window to occur, then enter your state machine and subsequently enable interrupts on your way OUT. This accounts for the fact that you are cooperating with OTHER things in the kernel which disable interrupts. This ASSUMES that there is no other kernel activity which locks out interrupts for a long time (more than 100us). Now, you can MEASURE the accuracy of your firing event and make the window larger until it meets your need.
If instead you are trying to learn how RTOS's work...or if you are trying to solve a control problem with more than one real-time responsibility...then use an RTOS.
Can you at least use nanosleep in your loop to sleep for 1ms? Or is that a glibc thing?
Update: Never mind, I see from the man page "it can take up to 10 ms longer than specified until the process becomes runnable again"
What about using the "/dev/rtc0" (or "/dev/rtc") device and its related ioctl() interface? I think it offers an accurate timer counter. It is not possible to set the rate to exactly 1 ms, but you can set it to a close value of 1/1024 sec (1024Hz), or to a higher frequency, like 8192Hz.
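A minimal sketch of the /dev/rtc idea, using the periodic-interrupt ioctls from linux/rtc.h. The wrapper names are hypothetical. It assumes the RTC driver supports periodic interrupts and that the process is allowed to request the chosen rate; rates above a system limit typically need elevated privileges.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

/* Open an RTC device and enable periodic interrupts at rate_hz, which must
 * be a power of two (for example 1024). Returns a file descriptor, or -1. */
int rtc_open_periodic(const char *path, unsigned long rate_hz)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (ioctl(fd, RTC_IRQP_SET, rate_hz) < 0 || ioctl(fd, RTC_PIE_ON, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until at least one periodic interrupt has occurred. Returns the
 * number of interrupts since the previous read, or -1 on error. The low
 * byte of the value read holds the interrupt type, the rest the count. */
long rtc_wait_tick(int fd)
{
    unsigned long data;
    if (read(fd, &data, sizeof data) != (ssize_t)sizeof data) return -1;
    return (long)(data >> 8);
}

/* Turn periodic interrupts off again and close the device. */
void rtc_close_periodic(int fd)
{
    ioctl(fd, RTC_PIE_OFF, 0);
    close(fd);
}
```

The descriptor can also be added to a select() read set, so a main loop that already waits on network sockets can wake on RTC ticks without a separate blocking read.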

Resources