I have the following code from the large code base of an embedded application. I am trying to understand it and have the following questions.
old_rate = sysAuxClkRateGet();
sysAuxClkRateSet(50);
sysAuxClkConnect ((FUNCPTR) scanDispatcher, 0);
/* Enable dispatcher */
sysAuxClkEnable ();
My questions are:
Is scanDispatcher called on every tick, or after 50 ticks?
Does sysAuxClkRateSet(50); mean that we have 50 ticks per second? Is my understanding right?
The auxiliary clock ISR calls scanDispatcher (with argument 0) every time it is invoked to handle the auxiliary clock interrupt, that is, once per tick.
sysAuxClkRateSet(50) sets the frequency of the auxiliary clock interrupt to 50 ticks per second. Since the auxiliary clock driver ISR does nothing other than manage the timer device and call the scanDispatcher routine, you can change the frequency.
There are two kinds of limits on the frequency values you can use:
The auxiliary clock driver (part of the BSP you're using) defines absolute minimum and maximum values that the driver is able to manage
The real maximum limit is defined by the system load introduced by scanDispatcher and its execution time; remember, in any case, that scanDispatcher is executed at interrupt time, so its execution time should always be very short.
A last caveat: auxiliary clock isn't a mandatory device in VxWorks: most of the BSPs support an auxiliary clock device, but (in principle) you could find a BSP that doesn't support it.
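To make the relationship between the rate and the callback concrete, here is a small host-side simulation in C. It does not use the VxWorks API: the `sim_*` functions and `scan_dispatcher_stub` are hypothetical stand-ins that only mimic the behaviour described above (one handler call per tick, with the tick rate given in ticks per second).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the BSP's auxiliary clock driver (not VxWorks code). */
typedef void (*aux_handler_t)(int arg);

static int           sim_rate_hz = 60;   /* assumed default rate, for illustration */
static aux_handler_t sim_handler;
static int           sim_handler_arg;
static uint32_t      dispatch_count;     /* how often the handler has run */

void scan_dispatcher_stub(int arg) { (void)arg; dispatch_count++; }

void sim_aux_clk_rate_set(int ticks_per_second) { sim_rate_hz = ticks_per_second; }

void sim_aux_clk_connect(aux_handler_t handler, int arg)
{
    sim_handler = handler;
    sim_handler_arg = arg;
}

/* Simulate `seconds` of run time: one interrupt, hence one handler call, per tick. */
void sim_run(int seconds)
{
    for (int i = 0; i < seconds * sim_rate_hz; i++) {
        if (sim_handler) {
            sim_handler(sim_handler_arg);
        }
    }
}

/* Time between two consecutive handler calls, in microseconds. */
uint32_t sim_tick_period_us(void) { return 1000000u / (uint32_t)sim_rate_hz; }
```

With a rate of 50, the stub runs 100 times in two simulated seconds, and consecutive calls are 20,000 microseconds (20 ms) apart.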
I have a segment of code below as a FreeRTOS task running on an STM32F411RE microcontroller:
static void TaskADCPWM(void *argument)
{
    /* Variables used by FreeRTOS to set delays of 50ms periodically */
    const TickType_t DelayFrequency = pdMS_TO_TICKS(50);
    TickType_t LastActiveTime;
    /* Update the variable RawAdcValue through DMA */
    HAL_ADC_Start_DMA(&hadc1, (uint32_t*)&RawAdcValue, 1);
#if PWM_DMA_ON
    /* Initialize PWM CHANNEL2 with DMA, to automatically change TIMx->CCR by updating a variable */
    HAL_TIM_PWM_Start_DMA(&htim3, TIM_CHANNEL_2, (uint32_t*)&RawPWMThresh, 1);
#else
    /* If DMA is not used, user must update TIMx->CCRy manually to alter duty cycle */
    HAL_TIM_PWM_Start(&htim3, TIM_CHANNEL_2);
#endif
    while(1)
    {
        /* Record last wakeup time and use it to perform blocking delay the next 50ms */
        LastActiveTime = xTaskGetTickCount();
        vTaskDelayUntil(&LastActiveTime, DelayFrequency);
        /* Perform scaling conversion based on ADC input, and feed value into PWM CCR register */
#if PWM_DMA_ON
        RawPWMThresh = (uint16_t)((RawAdcValue * MAX_TIM3_PWM_VALUE)/MAX_ADC_12BIT_VALUE);
#else
        TIM3->CCR2 = (uint16_t)((RawAdcValue * MAX_TIM3_PWM_VALUE)/MAX_ADC_12BIT_VALUE);
#endif
    }
}
The task above uses RawAdcValue value to update a TIM3->CCR2 register either through DMA or manually. The RawAdcValue gets updated periodically through DMA, and the value stored in this variable is 12-bits wide.
I understand how using DMA could benefit reading the ADC samples above as the CPU will not need to poll/wait for the ADC samples, or using the DMA to transfer long streams of data through I2C or SPI. But, is there a significant performance advantage to using DMA to update the TIM3->CCR2 register instead of manually modifying the TIM3->CCR2 register through:
TIM3->CCR2 &= ~0xFFFF;
TIM3->CCR2 |= SomeValue;
What would be the main differences between updating the CCR register through DMA or non-DMA?
Let's start by assuming you need to achieve "N samples per second". E.g. for audio this might be 44100 samples per second.
For PWM, you need to change the state of the output multiple times per sample. For example; for audio this might mean writing to the CCR around four times per sample, or "4*44100 = 176400" times per second.
Now look at what vTaskDelayUntil() does - most likely it sets up a timer and does a task switch, then (when the timer expires) you get an IRQ followed by a second task switch. It might add up to a total overhead of 500 CPU cycles each time you change the CCR. You can convert this into a percentage. E.g. (continuing the audio example), "176400 CCR updates per second * 500 cycles per update = about 88.2 million cycles per second of overhead", then, for 100 MHz CPU, you can do "88.2 million / 100 million = 88.2% of all CPU time wasted because you didn't use DMA".
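The percentage calculation above can be sketched as a small C helper. The function name is made up for illustration, and the 500-cycle figure is the rough estimate from the paragraph above, not a measured value.

```c
#include <stdint.h>

/*
 * Share of CPU time (in percent) consumed by per-update overhead.
 * All three inputs are estimates supplied by the caller.
 */
double cpu_overhead_percent(uint32_t updates_per_second,
                            uint32_t cycles_per_update,
                            uint32_t cpu_hz)
{
    /* 64-bit product so that large update rates cannot overflow. */
    uint64_t overhead_cycles = (uint64_t)updates_per_second * cycles_per_update;
    return 100.0 * (double)overhead_cycles / (double)cpu_hz;
}
```

For the audio example, `cpu_overhead_percent(4u * 44100u, 500u, 100000000u)` gives about 88.2. With one update every 50 ms (20 updates per second), as in the task shown in the question, the same estimate gives about 0.01.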
The next step is to figure out where the CPU time comes from. There's 2 possibilities:
a) If your task is the highest priority task in the system (including being higher priority than all IRQs, etc); then every other task will become victims of your time consumption. In this case you've single-handedly ruined any point of bothering with a real time OS (probably better to just use a faster/more efficient non-real-time OS that optimizes "average case" instead of optimizing "worst case", and using DMA, and using a less powerful/cheaper CPU, to get a much better end result at a reduced "cost in $").
b) If your task isn't the highest priority task in the system, then the code shown above is broken. Specifically, an IRQ (and possibly a task switch/preemption) can occur immediately after the vTaskDelayUntil(&LastActiveTime, DelayFrequency);, causing the TIM3->CCR2 = (uint16_t)((RawAdcValue * MAX_TIM3_PWM_VALUE)/MAX_ADC_12BIT_VALUE); to occur at the wrong time (much later than intended). In pathological cases (e.g. where some other event like disk or network just happens to occur at a similar related frequency - e.g. at half your "CCR update frequency") this can easily become completely unusable (e.g. because turning the output on is often delayed more than intended and turning the output off is not).
However...
All of this depends on how many samples per second (or better, how many CCR updates per second) you actually need. For some purposes (e.g. controlling an electric motor's speed in a system that changes the angle of a solar panel to track the position of the sun throughout the day); maybe you only need 1 sample per minute and all the problems caused by using CPU disappear. For other purposes (e.g. AM radio transmissions) DMA probably won't be good enough either.
WARNING
Unfortunately, I can't/didn't find any documentation for HAL_ADC_Start_DMA(), HAL_TIM_PWM_Start() or HAL_TIM_PWM_Start_DMA() online, and don't know what the parameters are or how the DMA is actually being used. When I first wrote this answer I simply relied on a "likely assumption" that may have been a false assumption.
Typically, for DMA you have a block of many pieces of data (e.g. for audio, maybe you have a block of 176400 values - enough for a whole second of sound at "4 values per sample, 44100 samples per second"); and while that transfer is happening the CPU is free to do other work (and not wasted). For continuous operation, the CPU might prepare the next block of data while the DMA transfer is happening, and when the DMA transfer completes the hardware would generate an IRQ and the IRQ handler will start the next DMA transfer for the next block of values (alternatively, the DMA channel could be configured for "auto-repeat" and the block of data might be a circular buffer). In that way, the "88.2% of all CPU time wasted because you didn't use DMA" would be "almost zero CPU time used because DMA controller is doing almost everything"; and the whole thing would be immune to most timing problems (an IRQ or higher priority task preempting cannot influence the DMA controller's timing).
This is what I assumed the code is doing when it uses DMA. Specifically, I assumed that every "N nanoseconds" the DMA would take the next raw value from a large block of raw values and use that next raw value (representing the width of the pulse) to set a timer's threshold to a value from 0 to N nanoseconds.
In hindsight; it's possibly more likely that the code sets up the DMA transfer for "1 value per transfer, with continual auto-repeat". In that case the DMA controller would be continually pumping whatever value happens to be in RawPWMThresh to the timer at a (possibly high) frequency, and then the code in the while(1) loop would be changing the value in RawPWMThresh at a (possibly much lower) frequency. For example (continuing the audio example); it could be like doing "16 values per sample (via the DMA controller), with 44100 samples per second (via the while(1) loop)". In that case; if something (an unrelated IRQ, etc) causes an unexpected extra delay after the vTaskDelayUntil(); then it's not a huge catastrophe (the DMA controller simply repeats the existing value for a little longer).
If that is the case; then the real difference could be "X values per sample with 20 samples per second" (with DMA) vs. "1 value per sample with 20 samples per second" (without DMA); where the overhead is the same regardless, but the quality of the output is much better with DMA.
However; without knowing what the code actually does (e.g. without knowing the frequency of the DMA channel and how things like the timer's prescaler are configured) it's also technically possible that when using DMA the "X values per sample with 20 samples per second" is actually "1 value per sample with 20 samples per second" (with X == 1). In that case, using DMA would be almost pointless (none of the performance benefits I originally assumed; and almost none of the "output quality" benefits I'm tempted to assume in hindsight, except for the "repeat old value if there's unexpected extra delay after the vTaskDelayUntil()").
First, remember that premature optimization is the cause of uncountably many problems. The question you need to ask is "what ELSE does the processor need to do?". If the processor has nothing better to do, then just poll and save yourself some programming effort.
If the processor does have something better to do (or you are running from batteries and want to save power) then you need to time how long the processor spends waiting between each thing that it needs to do.
In your case, you are using an operating system context switch in place of "waiting". You can time the cost of the switch-write-to-pwm-switch-back cycle by measuring the performance of some other thread.
Set up a system with two threads. Perform some task that you know the performance of in one thread, eg, some fixed computation or processor benchmark. Now set up the other thread to do your timer business above. Measure the performance of the first thread.
Next set up a similar system with only the first thread plus DMA doing the PWM. Measure the performance change, and you have your answer.
Obviously this all depends very much on your exact system. There is no general answer that can be given. The closer your test is to your real system the more accurate the answer you will get.
PS: Your PWM will glitch using the above code. Replace the two writes with a single one:
TIM3->CCR2 &= ~0xFFFF;
TIM3->CCR2 |= SomeValue;
should be:
TIM3->CCR2 = ((TIM3->CCR2 & ~0xFFFF) | SomeValue);
I am working with an 8051 MCU from Silicon Labs. I want to generate an exact 1 ms delay using a timer. For this I want to know the machine cycle time of a given MCU, that is, the time the MCU takes to complete one machine instruction. Then I can calculate how many machine cycles are needed for a 1 ms delay.
Creating a time delay by counting MCU cycles is a poor method - especially if you are coding in C where you have no control over the machine instructions the compiler will generate - your loop will likely change depending on compiler options such as optimisation level.
Moreover the MCU has no means of measuring its own clock; its only concept of time passing is in clock-cycle units - asking it how long a cycle is is rather like asking a human how long a second is. The answer to the question of how long a clock-cycle is from the point of view of the MCU is always 1.
As the programmer of the system, it is your responsibility to know the clock speed. Typically the hardware defines the speed by its crystal or oscillator rate, and the MCU PLL settings determine the multiplier. Most often you will embed this speed as a constant in the start-up code; your code might access this constant.
Even then, you are better off creating delays using an on-chip timer unit rather than software-based instruction counting (and not all 8051 instructions are single cycle). In that case, you still need to know the clock speed; then the timer clock may be further divided from that.
To use the timer you need to know the frequency of the timer clock. Then you just need to compute: timer_clocks = delay * frequency;
You need to know instruction timings only if you want a blocking delay. There are two sources: the uC documentation or experiment. To find out how many loops you need, just connect an oscilloscope to a pin and loop as many times as needed to achieve the required pulse length.
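As a rough sketch of that formula in C: the helper names are invented for illustration, and the reload calculation assumes a 16-bit up-counting timer that interrupts on overflow, which is a common 8051 timer arrangement but should be checked against the data sheet of the actual part.

```c
#include <stdint.h>

/* Number of timer clocks needed for a delay: timer_clocks = delay * frequency. */
uint32_t timer_clocks_for_delay_us(uint32_t timer_clock_hz, uint32_t delay_us)
{
    /* 64-bit intermediate avoids overflow; rounds to the nearest timer clock. */
    return (uint32_t)(((uint64_t)timer_clock_hz * delay_us + 500000u) / 1000000u);
}

/*
 * Reload value for a 16-bit up-counting timer that overflows at 65536
 * (assumed timer style). Returns 0 on success, -1 if the delay does not fit.
 */
int timer16_reload(uint32_t timer_clocks, uint16_t *reload)
{
    if (timer_clocks == 0u || timer_clocks > 65536u) {
        return -1;
    }
    *reload = (uint16_t)(65536u - timer_clocks);
    return 0;
}
```

With a hypothetical 2 MHz timer clock, a 1 ms delay needs 2000 timer clocks, so the reload value would be 65536 - 2000 = 63536 (0xF830). The real timer clock depends on the oscillator and prescaler settings of the chip in use.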
I'm working on an embedded project that's running on an ARM Cortex M3 based microcontroller. Some code provided by our vendor uses a delay function that sets up a built-in hardware timer and then spins until the timer expires. Typically this is used to wait between 1 and a couple hundred microseconds. These delays are almost always there because the code is waiting on some register, chip or bus to complete an action and needs to wait at least the given number of microseconds. The hardware timer also appears to cost at least 6 microseconds in overhead to set up.
In a multithreaded environment this is a problem because there are N threads but only 1 hardware timer. I could disable interrupts while the timer is being used to prevent context switches and thus race conditions but it seems a bit ugly. I am thinking of replacing the function that uses the hardware timer with a function that uses the ARM CPU Cycle Counter (CCNT). Are there any pitfalls I am missing or other alternatives? Obviously the cycle counter function requires that it be tuned to the proper CPU frequency, which will never change for our system, but I suppose it could be detected at boot programmatically using the hardware timer.
Set up the timer once at startup and let the counter run continuously. When you want to start a delay, read the counter value and remember this start value. Then in the delay loop read the counter value again and loop until the counter value minus the start value is greater than or equal to the requested delay ticks. (If you do the subtraction correctly then rollovers will wash out and you don't need special handling to check for them.)
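A minimal sketch of that delay loop in C, runnable on a host. `sim_counter` and `read_counter` are hypothetical stand-ins for the free-running hardware counter; on the real target `read_counter` would simply return the timer's count register.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free-running 32-bit hardware counter. */
uint32_t sim_counter;

/* Each read advances the simulated counter, as elapsed time would on hardware. */
uint32_t read_counter(void)
{
    uint32_t now = sim_counter;
    sim_counter += 3u;   /* arbitrary step per poll, for the simulation only */
    return now;
}

/*
 * Spin until at least `ticks` counter ticks have elapsed.
 * Returns the elapsed tick count observed when the loop exited.
 */
uint32_t delay_ticks(uint32_t ticks)
{
    uint32_t start = read_counter();
    uint32_t elapsed;

    do {
        /* Unsigned subtraction is modulo 2^32, so a rollover cancels out. */
        elapsed = read_counter() - start;
    } while (elapsed < ticks);

    return elapsed;
}
```

Because each caller only reads the shared counter and keeps its own `start` value on its stack, several threads can use the function at the same time without reprogramming the timer. The requested delay must be shorter than one full counter period for the subtraction trick to hold.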
You could multiplex your timer such that you have a table of when each thread wants to fire off and a function pointer / vector for execution. When the timer interrupt occurs, fire off that thread's interrupt and then set the timer to the next one in the list, minus elapsed time. This is what I see many *nix operating systems do in their kernel code, so there should be code to pull from as example.
A bigger concern is the fact that you are spinning the thread while waiting for the timer. Besides CPU usage, and depending on what OS you have (or whether you have an OS at all), you could easily introduce priority inversion issues or even full lock-ups. It might be better to use thread primitives instead so that the OS can actually put your threads to sleep and wake them when needed.
Short question: How to get seconds since reset in STM32L051T6 microcontroller?
My effort and detailed issue:
I am using an STM32L051T6 series microcontroller. I need to count seconds since power on. I am also using low power mode. So I wrote code to use wakeup timer interrupt functionality of internal RTC of microcontroller. I used 1 second interval wake up timer with external LSE clock of 32768 Hz. I observed the accumulated seconds since power on (SSPO) after 3 days and found that it is falling behind by 115 seconds compared to actual time elapsed. My guess for this drift is interrupt latency in executing wakeup timer interrupt. How can I remove drift of this 115 seconds? Or is there any other better method than using wakeup interrupt to count seconds since power on?
UPDATE:
I tried to use SysTick with the HAL_GetTick() function to count seconds since power on. But even SysTick falls behind over time.
If you want to measure time with accuracy over a longer period, an RTC is the way to go. As you mentioned that you have an RTC, you can use the method below.
At startup, load the RTC with zero.
Then you can read the seconds elapsed when required without the errors above.
Edit: As per comment, the RTC can be changed by user. In that case,
If you can modify the RTC write function called by the user, then when the user calls the RTC write function, you update a global variable VarA = time set by user. The elapsed time will be Time read by RTC - VarA.
If the RTC is accurate, you should use the RTC by storing its value at boot time and later comparing to that saved value. But you said that the RTC can be reset by user so I can see two ways to cope with it:
if you have enough control over the system, replace the command or HMI that a user can use to reset the clock with a wrapper that informs your module and allows it to read the RTC before and after it has been reset
if you do not have enough control or cannot wrap the user's reset (because it uses a direct system call, etc.), use a timer to check the RTC value every second
But you should define a threshold on the RTC delta. If it is small, it is likely to be an adjustment because, unless your system uses an atomic clock, even an RTC can drift over time. In that case I would not care, because you can hardly know whether it drifted since the last reboot or not. If you want a cleverer algorithm, you can make the threshold dependent on the current time since the last reboot: the longer the system is up, the higher the probability that it has drifted since then.
Conversely, a large delta is likely to be a correction because the RTC was blatantly wrong, the backup battery is dead, or something similar. In that case you should compute the new start RTC time that gives the same duration with the new RTC value.
As a rule of thumb, I would use a threshold of about 1 or 2 seconds per day of uptime without RTC adjustment (ref) - meaning I would also store the time of the last RTC adjustment, initially the boot time.
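One possible reading of the threshold idea above, as a host-side C sketch. All names are invented for illustration, and the details are assumptions: a once-per-second check, an allowance of 2 seconds per started day since the last adjustment, and RTC readings expressed as plain second counts.

```c
#include <stdint.h>
#include <stdlib.h>

#define SECONDS_PER_DAY    86400
#define DRIFT_ALLOWANCE_S  2      /* assumed: 2 s of tolerated jump per day */

typedef struct {
    int64_t start_rtc;        /* RTC value that corresponds to uptime 0 */
    int64_t last_adjust_rtc;  /* RTC value at the last detected correction */
    int64_t last_rtc;         /* RTC value seen on the previous check */
} uptime_tracker_t;

void tracker_init(uptime_tracker_t *t, int64_t rtc_now)
{
    t->start_rtc = t->last_adjust_rtc = t->last_rtc = rtc_now;
}

/*
 * Call once per second from a timer with the current RTC reading.
 * Returns 1 if the jump was treated as a user correction, 0 otherwise.
 */
int tracker_check(uptime_tracker_t *t, int64_t rtc_now)
{
    int64_t delta = rtc_now - (t->last_rtc + 1);   /* unexpected change */
    int64_t days = (t->last_rtc - t->last_adjust_rtc) / SECONDS_PER_DAY;
    int64_t threshold = DRIFT_ALLOWANCE_S * (days + 1);
    int corrected = 0;

    if (llabs((long long)delta) > threshold) {
        /* Shift the reference so that the computed uptime is unchanged. */
        t->start_rtc += delta;
        t->last_adjust_rtc = rtc_now;
        corrected = 1;
    }
    t->last_rtc = rtc_now;
    return corrected;
}

/* Seconds since boot according to the tracker. */
int64_t tracker_uptime(const uptime_tracker_t *t)
{
    return t->last_rtc - t->start_rtc;
}
```

A one-hour jump is absorbed by moving `start_rtc`, so the reported uptime keeps counting normally, while a one-second jump is left alone and simply shows up in the uptime as drift.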
I'm using a PIC18 with Fosc = 10MHz. So if I use Delay10KTCYx(250), I get 10,000 x 250 x 4 x (1/10e6) = 1 second.
How do I use the delay functions in the C18 for very long delays, say 20 seconds? I was thinking of just using twenty lines of Delay10KTCYx(250). Is there another more efficient and elegant way?
Thanks in advance!
It is strongly recommended that you avoid using the built-in delay functions such as Delay10KTCYx().
Why you might ask?
These delay functions are very inaccurate, and they may cause your code to be compiled in unexpected ways. Here's one such example where using the Delay10KTCYx() function can cause problems.
Let's say that you have a PIC18 microprocessor that has only two hardware timer interrupts. (Usually they have more but let's just say there are only two).
Now let's say you manually set up the first hardware timer interrupt to blink once per second exactly, to drive a heartbeat monitor LED. And let's say you set up the second hardware timer interrupt to interrupt every 50 milliseconds because you want to take some sort of digital or analog reading at exactly 50 milliseconds.
Now, lastly, let's say that in your main program you want to delay 100,000 clock cycles. So you put a call to Delay10KTCYx(10) in your main program. What happens, do you suppose? How does the PIC18 magically count off 100,000 clock cycles?
One of two things will happen. It may "hijack" one of your other hardware timer interrupts to get exactly 100,000 clock cycles. This would either cause your heartbeat sensor to not clock at exactly 1 second, or, cause your digital or analog readings to happen at some time other than every 50 milliseconds.
Or, the delay function will just call a bunch of Nop() and claim that 1 Nop() = 1 clock cycle. What isn't accounted for is "overheads" within the Delay10KTCYx(10) function itself. It has to increment a counter to keep track of things, and surely it takes more than 1 clock cycle to increment that counter. As the Delay10KTCYx(10) loops around and around it is just not capable of giving you exactly 100,000 clock cycles. Depending on a lot of factors you may get way more, or way fewer, clock cycles than you expected.
The Delay10KTCYx(10) should only be used if you need an "approximate" amount of time. And pre-canned delay functions shouldn't be used if you are already using the hardware timer interrupts for other purposes. The compiler may not even successfully compile when using Delay10KTCYx(10) for very long delays.
I would highly recommend that you set up one of your timer interrupts to interrupt your hardware at a known interval, say 50,000 instruction cycles. Then, each time the hardware interrupts, within your ISR code for that timer interrupt, increment a counter and reset the timer over again to 0 cycles. When enough 50,000-cycle intervals have elapsed to equal 20 seconds (in your example, with Fosc = 10 MHz and four clocks per instruction cycle, that is 1,000 timer interrupts at 50,000 instruction cycles, or 20 ms, per interrupt), reset your counter. Basically my advice is that you should always manually handle time in a PIC and not rely on pre-canned Delay functions - rather build your own delay functions that integrate into the hardware timer of the chip. Yes, it's going to be extra work - "but why can't I just use this easy and nifty built-in delay function, why would they even put it there if it's gonna muck up my program?" - but this should become second nature. Just like you should be manually configuring EVERY SINGLE REGISTER in your PIC18 upon boot-up, whether you are using it or not, to prevent unexpected things from happening.
You'll get way more accurate timing - and way more predictable behavior from your PIC18. Using pre-canned Delay functions is a recipe for disaster... it may work... it may work on several projects... but sooner or later your code will go all buggy on you and you'll be left wondering why and I guarantee the culprit will be the pre-canned delay function.
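The timer-interrupt approach described above could look roughly like the following host-side C sketch. It is not C18 code: the register handling is left out, the names are invented, and the timing constants assume Fosc = 10 MHz (2.5 million instruction cycles per second) with one interrupt every 50,000 instruction cycles, i.e. every 20 ms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed timing: one timer interrupt every 20 ms (see the lead-in above). */
#define TICK_MS          20u
#define LONG_DELAY_MS    20000u
#define TICKS_FOR_DELAY  (LONG_DELAY_MS / TICK_MS)   /* 1000 interrupts */

static volatile uint16_t tick_count;
static volatile bool     delay_elapsed;

/*
 * Body of the timer ISR. On real hardware it would also clear the
 * interrupt flag and reload the timer registers.
 */
void timer_isr(void)
{
    if (++tick_count >= TICKS_FOR_DELAY) {
        tick_count = 0;
        delay_elapsed = true;
    }
}

/* Non-blocking check for the main loop: true once per 20 s period. */
bool long_delay_poll(void)
{
    if (delay_elapsed) {
        delay_elapsed = false;
        return true;
    }
    return false;
}
```

The main loop can keep doing other work and call `long_delay_poll()` on each pass, rather than sitting in a blocking delay for 20 seconds.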
To create a very long delay, use an internal timer. This helps avoid blocking your application, and you can check the elapsed time. Please refer to the PIC data sheet for how to set up a timer and its interrupt.
If you want a very high precision 1 s time base, I also suggest considering an external RTC device, or an internal RTC if the micro has one.