I'm a bit unsure what the best approach to this problem is, given my knowledge of the STM32. I want to measure the speed and position of a motor with an integrated Hall encoder that produces 6400 rising/falling edges per rotation, split across two channels (each channel gives 3200 rising/falling edges).
What's the best way to do it?
The thing is... I have 4 motors to measure.
I considered many options, but I would like one that only generates interrupts once the position data is already known (basically, so I don't have to increment a position variable myself on each pulse, but can instead let a timer do it for me).
From what I know, a few timers support a mode called "Encoder mode". I don't know the details of this mode, but I would like (if possible) to be able to calculate my speed at a fixed interval (say around 20 ms).
Is it possible in encoder mode, with one timer, to both read the rising/falling edge count (which I guess would be in the CNT register) and have it trigger an interrupt every 20 ms, so that I can divide the CNT register by 20 ms to get the counts/sec speed within the ISR?
The other option I have is to count with Input Capture direct mode with two channels on each timer (one for each motor), and have another timer with a fixed period of 20ms, and calculate all the speeds of the 4 motors there. But it requires 5 timers...
Failing that, is there a way DMA could help keep it to 4 timers? For example, can the counting be done with DMA?
Thanks!
The encoder interface mode on the STM32F407 is supported on timers 1 & 8 (Advanced Control timers, 16-bit) and timers 2 to 5 (General Purpose timers, 16/32-bit). Timers 9 to 14 (also General Purpose) do not support quadrature encoder input.
It is important to note that in this mode the timer operates as a counter rather than a timer. The quadrature input allows up/down counting depending on the direction, so it provides relative position.
Note that if your motor will only ever travel in one direction, you do not need the encoder mode, you can simply clock a timer from a single channel, although that will reduce the resolution significantly, so accuracy at low speeds may suffer.
To determine speed, you need to calculate change in relative position over time.
All ARM Cortex-M devices have a SYSTICK timer which will generate a periodic interrupt. You can use this to count time.
You then have two possibilities:
read the encoder counter periodically, whereby the change in count is directly proportional to speed (because the change in time is a constant), or
read the encoder aperiodically and calculate the change in position over the change in time.
The reload value for the encoder interface mode is configurable; for this application (speed rather than position) you should set it to the maximum (0xFFFF or 0xFFFFFFFF), since that makes the arithmetic simpler: with unsigned arithmetic you won't have to deal with wrap-around explicitly (so long as the counter does not wrap around twice between reads).
For the aperiodic method, and assuming you are using timers 2 to 5 in 32-bit mode, the following code will generate speed in RPM, for example:
int speedRPM_Aperiodic( int timer_id )
{
    int rpm = 0 ;

    static struct
    {
        uint32_t count ;
        uint32_t time ;
    } previous[] = {{0,0},{0,0},{0,0},{0,0}} ;

    if( timer_id < (int)(sizeof(previous) / sizeof(*previous)) )
    {
        uint32_t current_count = getEncoderCount( timer_id ) ;

        // current minus previous: the unsigned subtraction yields the
        // correct delta even if the counter wrapped once between calls.
        int32_t delta_count = current_count - previous[timer_id].count ;
        previous[timer_id].count = current_count ;

        uint32_t current_time = getTick() ;
        int32_t delta_time = current_time - previous[timer_id].time ;
        previous[timer_id].time = current_time ;

        rpm = (TICKS_PER_MINUTE * delta_count) /
              (delta_time * COUNTS_PER_REVOLUTION) ;
    }

    return rpm ;
}
The function needs to be called often enough that the count does not wrap around more than once, and not so fast that the count is too small for an accurate measurement.
This can be adapted for a periodic method where delta_time is fixed and very accurate (such as from a timer interrupt or tick handler):
int speedRPM_Periodic( int timer_id )
{
    int rpm = 0 ;

    // static so the counts persist between calls
    static uint32_t previous_count[] = {0,0,0,0} ;

    if( timer_id < (int)(sizeof(previous_count) / sizeof(*previous_count)) )
    {
        uint32_t current_count = getEncoderCount( timer_id ) ;
        int32_t delta_count = current_count - previous_count[timer_id] ;
        previous_count[timer_id] = current_count ;

        rpm = (TICKS_PER_MINUTE * delta_count) /
              (SPEED_UPDATE_TICKS * COUNTS_PER_REVOLUTION) ;
    }

    return rpm ;
}
This function must then be called exactly every SPEED_UPDATE_TICKS.
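For completeness, a minimal sketch of the helper functions assumed by both versions. getEncoderCount() and getTick() are the placeholder names from the code above; the TIM2-TIM5 mapping and the use of the HAL 1 ms tick are assumptions, not requirements:

#include "stm32f4xx_hal.h"

// Hypothetical mapping of timer_id 0..3 to the four encoder timers.
static TIM_TypeDef * const encoder_tim[4] = { TIM2, TIM3, TIM4, TIM5 } ;

uint32_t getEncoderCount( int timer_id )
{
    return encoder_tim[timer_id]->CNT ;
}

uint32_t getTick( void )
{
    // HAL tick is 1 ms, so TICKS_PER_MINUTE would be 60000.
    return HAL_GetTick() ;
}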
The aperiodic method is perhaps simpler to implement, and is good for applications where you want to know the mean speed over the elapsed period - suitable, for example, for a human-readable display that might be updated relatively slowly.
The periodic method is better suited to speed control applications where you are using a feedback loop to control the speed of the motor. You will get poor control if the feedback timing is not constant.
The aperiodic function could of course be called periodically, but it incurs unnecessary overhead when the delta time is deterministic.
A timer can count on one type of event
It can count either an external signal, like your sensors, or a clock signal, but not both at once. If you want to do something every 20 ms, you need something that counts a stable clock source.
DMA can of course count the transfers it is doing, but to make it do something every 20 ms, it has to be triggered at fixed time intervals by something.
Therefore, you need a fifth timer.
Fortunately, there are lots of timers to choose from
10 more timers
The F407 has 14 hardware timers. Since you don't want to use more than 4 of them, I'm assuming the other 10 are used elsewhere in your application. Check their usage: perhaps one of them already counts at a suitable clock frequency and can generate an interrupt at a rate suitable for sampling your encoders.
SysTick timer
Cortex-M cores have an internal timer called SysTick. Many applications use it to generate an interrupt every 1 ms for timekeeping and other periodic tasks. If that's the case, you can read the encoder values in every 20th SysTick interrupt - this has the advantage of not requiring additional interrupt entry/exit overhead. Otherwise you can set it up directly to generate an interrupt every 20 ms. Note that you won't find SysTick in the Reference Manual; it's documented in the STM32F4 Programming Manual.
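As a sketch of that approach, assuming the HAL's 1 ms timebase and the speedRPM_Periodic() function from the first answer (the variable names are illustrative):

volatile int motor_rpm[4] ;

void SysTick_Handler( void )
{
    static uint32_t subtick = 0 ;

    HAL_IncTick() ;             // keep the HAL 1 ms timebase running

    if( ++subtick >= 20 )       // every 20th 1 ms tick = 20 ms
    {
        subtick = 0 ;
        for( int m = 0 ; m < 4 ; m++ )
        {
            motor_rpm[m] = speedRPM_Periodic( m ) ;
        }
    }
}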
Real-Time clock
The RTC has a periodic auto-wakeup function that can generate an interrupt every 20 ms.
UART
Unless you are using all 6 UARTs, you can set one of them to a really slow baud rate like 1000 baud and keep transmitting dummy data (you don't have to assign a physical pin to it). Transmitting at 1000 baud with 8 data bits, one start and one stop bit gives an interrupt every 10 ms. (It won't let you go down to 500 baud unless your APB frequency is lower than the maximum allowed.)
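A sketch of that trick using the HAL, where huart2 is assumed to be initialized at 1000 baud, 8N1, and the 20 ms bookkeeping is illustrative:

extern UART_HandleTypeDef huart2 ;      // assumed: 1000 baud, 8N1, no pin assigned

static uint8_t dummy = 0x55 ;

void startUartTimebase( void )
{
    HAL_UART_Transmit_IT( &huart2, &dummy, 1 ) ;
}

void HAL_UART_TxCpltCallback( UART_HandleTypeDef *huart )
{
    static int frames = 0 ;

    if( huart == &huart2 )
    {
        if( ++frames >= 2 )                 // two 10 ms frames = 20 ms
        {
            frames = 0 ;
            // sample the encoder counters here
        }
        HAL_UART_Transmit_IT( &huart2, &dummy, 1 ) ;  // keep the chain going
    }
}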
Related
I have a rotary encoder with an STM32F4 and configured TIM4 in "Encoder Mode TI1 and TI2". I want to have an interrupt every time the value of the timer is incremented or decremented.
The counting works, but I can only configure an interrupt on every update event, not on every change of TIM4->CNT. How can I do this?
In other words: my MCU + encoder in quadrature mode can count from 0 to 99 in one revolution. I want 100 interrupts per revolution, but if I set TIM4->PSC=0 and TIM4->ARR=1, that results in 50 update events, so I should set ARR=0, but that does not work. How can I solve this?
To get 100 interrupts per revolution keep PSC=0, ARR=1, setup the two timer channels in output compare mode with compare values 0 and 1 and interrupts on both channels.
Or even use ARR=3 and set up all four channels, with compare values of 0, 1, 2 and 3. This also allows you to detect the direction.
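A register-level sketch of the four-channel variant (illustrative; it assumes TIM4 is already running in encoder mode with PSC=0 and ARR=3):

// One compare match per count value, 0..3.
TIM4->CCR1 = 0 ;
TIM4->CCR2 = 1 ;
TIM4->CCR3 = 2 ;
TIM4->CCR4 = 3 ;

// Frozen output compare mode (OCxM = 000, the reset value) is enough,
// since only the compare interrupts are needed, not output pins.
TIM4->DIER |= TIM_DIER_CC1IE | TIM_DIER_CC2IE |
              TIM_DIER_CC3IE | TIM_DIER_CC4IE ;

// In TIM4_IRQHandler the direction can be read from the DIR bit in TIM4->CR1.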
Normally, the whole point of using the quadrature encoder mode is counting the pulses while avoiding interrupts. You can simply poll the counter register periodically to determine speed and position.
Getting interrupts on every encoder pulse is extremely inefficient, especially with high resolution encoders. Yours seems to be a low resolution one. If you still think you need them for some reason, you can connect A & B into external interrupts and implement the counting logic manually. In this case, you don't need quadrature encoder mode.
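If you nevertheless go the external-interrupt route, the usual approach is a state-transition lookup table. A sketch, where the pin assignments (PA0/PA1) and the EXTI wiring are assumptions:

volatile int32_t position = 0 ;

// Index = (previous AB state << 2) | current AB state; invalid transitions
// (both signals changing at once) contribute 0.
static const int8_t quad_table[16] =
    { 0, +1, -1, 0, -1, 0, 0, +1, +1, 0, 0, -1, 0, -1, +1, 0 } ;

// Call this from the EXTI handlers of both the A and B pins.
void quadratureUpdate( void )
{
    static uint8_t prev = 0 ;

    uint8_t a = (GPIOA->IDR >> 0) & 1u ;   // assumed: A on PA0
    uint8_t b = (GPIOA->IDR >> 1) & 1u ;   // assumed: B on PA1
    uint8_t state = (uint8_t)((a << 1) | b) ;

    position += quad_table[(prev << 2) | state] ;
    prev = state ;
}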
I have a segment of code below as a FreeRTOS task running on an STM32F411RE microcontroller:
static void TaskADCPWM(void *argument)
{
    /* Variables used by FreeRTOS to set delays of 50ms periodically */
    const TickType_t DelayFrequency = pdMS_TO_TICKS(50);
    TickType_t LastActiveTime;

    /* Update the variable RawAdcValue through DMA */
    HAL_ADC_Start_DMA(&hadc1, (uint32_t*)&RawAdcValue, 1);

#if PWM_DMA_ON
    /* Initialize PWM CHANNEL2 with DMA, to automatically change TIMx->CCR by updating a variable */
    HAL_TIM_PWM_Start_DMA(&htim3, TIM_CHANNEL_2, (uint32_t*)&RawPWMThresh, 1);
#else
    /* If DMA is not used, user must update TIMx->CCRy manually to alter duty cycle */
    HAL_TIM_PWM_Start(&htim3, TIM_CHANNEL_2);
#endif

    while(1)
    {
        /* Record last wakeup time and use it to perform blocking delay the next 50ms */
        LastActiveTime = xTaskGetTickCount();
        vTaskDelayUntil(&LastActiveTime, DelayFrequency);

        /* Perform scaling conversion based on ADC input, and feed value into PWM CCR register */
#if PWM_DMA_ON
        RawPWMThresh = (uint16_t)((RawAdcValue * MAX_TIM3_PWM_VALUE) / MAX_ADC_12BIT_VALUE);
#else
        TIM3->CCR2 = (uint16_t)((RawAdcValue * MAX_TIM3_PWM_VALUE) / MAX_ADC_12BIT_VALUE);
#endif
    }
}
The task above uses RawAdcValue to update the TIM3->CCR2 register either through DMA or manually. RawAdcValue gets updated periodically through DMA, and the value stored in this variable is 12 bits wide.
I understand how using DMA benefits the ADC reads above, since the CPU does not need to poll/wait for the ADC samples, and how DMA helps when transferring long streams of data over I2C or SPI. But is there a significant performance advantage to using DMA to update the TIM3->CCR2 register instead of modifying it manually through:
TIM3->CCR2 &= ~0xFFFF;
TIM3->CCR2 |= SomeValue;
What would be the main differences between updating the CCR register through DMA or non-DMA?
Let's start by assuming you need to achieve "N samples per second". E.g. for audio this might be 44100 samples per second.
For PWM, you need to change the state of the output multiple times per sample. For example; for audio this might mean writing to the CCR around four times per sample, or "4*44100 = 176400" times per second.
Now look at what vTaskDelayUntil() does - most likely it sets up a timer and does a task switch, then (when the timer expires) you get an IRQ followed by a second task switch. It might add up to a total overhead of 500 CPU cycles each time you change the CCR. You can convert this into a percentage. E.g. (continuing the audio example), "176400 CCR updates per second * 500 cycles per update = about 88.2 million cycles per second of overhead", then, for 100 MHz CPU, you can do "88.2 million / 100 million = 88.2% of all CPU time wasted because you didn't use DMA".
The next step is to figure out where the CPU time comes from. There's 2 possibilities:
a) If your task is the highest priority task in the system (including being higher priority than all IRQs, etc); then every other task will become victims of your time consumption. In this case you've single-handedly ruined any point of bothering with a real time OS (probably better to just use a faster/more efficient non-real-time OS that optimizes "average case" instead of optimizing "worst case", and using DMA, and using a less powerful/cheaper CPU, to get a much better end result at a reduced "cost in $").
b) If your task isn't the highest priority task in the system, then the code shown above is broken. Specifically, an IRQ (and possibly a task switch/preemption) can occur immediately after the vTaskDelayUntil(&LastActiveTime, DelayFrequency); call, causing the TIM3->CCR2 = (uint16_t)((RawAdcValue * MAX_TIM3_PWM_VALUE)/MAX_ADC_12BIT_VALUE); to occur at the wrong time (much later than intended). In pathological cases (e.g. where some other event like disk or network just happens to occur at a similar related frequency - e.g. at half your "CCR update frequency") this can easily become completely unusable (e.g. because turning the output on is often delayed more than intended and turning the output off is not).
However...
All of this depends on how many samples per second (or better, how many CCR updates per second) you actually need. For some purposes (e.g. controlling an electric motor's speed in a system that changes the angle of a solar panel to track the position of the sun throughout the day); maybe you only need 1 sample per minute and all the problems caused by using CPU disappear. For other purposes (e.g. AM radio transmissions) DMA probably won't be good enough either.
WARNING
Unfortunately, I can't/didn't find any documentation for HAL_ADC_Start_DMA(), HAL_TIM_PWM_Start() or HAL_TIM_PWM_Start_DMA() online, and don't know what the parameters are or how the DMA is actually being used. When I first wrote this answer I simply relied on a "likely assumption" that may have been a false assumption.
Typically, for DMA you have a block of many pieces of data (e.g. for audio, maybe you have a block 176400 values - enough for a whole second of sound at "4 values per sample, 44100 samples per second"); and while that transfer is happening the CPU is free to do other work (and not wasted). For continuous operation, the CPU might prepare the next block of data while the DMA transfer is happening, and when the DMA transfer completes the hardware would generate an IRQ and the IRQ handler will start the next DMA transfer for the next block of values (alternatively, the DMA channel could be configured for "auto-repeat" and the block of data might be a circular buffer). In that way, the "88.2% of all CPU time wasted because you didn't use DMA" would be "almost zero CPU time used because DMA controller is doing almost everything"; and the whole thing would be immune to most timing problems (an IRQ or higher priority task preempting can not influence the DMA controller's timing).
This is what I assumed the code is doing when it uses DMA. Specifically, I assumed that the every "N nanoseconds" the DMA would take the next raw value from a large block of raw values and use that next raw value (representing the width of the pulse) to set a timer's threshold to a value from 0 to N nanoseconds.
In hindsight; it's possibly more likely that the code sets up the DMA transfer for "1 value per transfer, with continual auto-repeat". In that case the DMA controller would be continually pumping whatever value happens to be in RawPWMThresh to the timer at a (possibly high) frequency, and then the code in the while(1) loop would be changing the value in RawPWMThresh at a (possibly much lower) frequency. For example (continuing the audio example); it could be like doing "16 values per sample (via the DMA controller), with 44100 samples per second (via the while(1) loop)". In that case; if something (an unrelated IRQ, etc) causes an unexpected extra delay after the vTaskDelayUntil(); then it's not a huge catastrophe (the DMA controller simply repeats the existing value for a little longer).
If that is the case; then the real difference could be "X values per sample with 20 samples per second" (with DMA) vs. "1 value per sample with 20 samples per second" (without DMA); where the overhead is the same regardless, but the quality of the output is much better with DMA.
However; without knowing what the code actually does (e.g. without knowing the frequency of the DMA channel and how things like the timer's prescaler are configured) it's also technically possible that when using DMA the "X values per sample with 20 samples per second" is actually "1 value per sample with 20 samples per second" (with X == 1). In that case, using DMA would be almost pointless (none of the performance benefits I originally assumed; and almost none of the "output quality" benefits I'm tempted to assume in hindsight, except for the "repeat old value if there's unexpected extra delay after the vTaskDelayUntil()").
First, remember that premature optimization is the cause of uncountably many problems. The question you need to ask is "what ELSE does the processor need to do?". If the processor has nothing better to do, then just poll and save yourself some programming effort.
If the processor does have something better to do (or you are running from batteries and want to save power) then you need to time how long the processor spends waiting between each thing that it needs to do.
In your case, you are using an operating system context switch in place of "waiting". You can time the cost of the switch-write-to-pwm-switch-back cycle by measuring the performance of some other thread.
Set up a system with two threads. Perform some task that you know the performance of in one thread, eg, some fixed computation or processor benchmark. Now set up the other thread to do your timer business above. Measure the performance of the first thread.
Next set up a similar system with only the first thread plus DMA doing the PWM. Measure the performance change, and you have your answer.
Obviously this all depends very much on your exact system. There is no general answer that can be given. The closer your test is to your real system the more accurate the answer you will get.
PS: Your PWM will glitch using the above code. Replace the two writes with a single one:
TIM3->CCR2 &= ~0xFFFF;
TIM3->CCR2 |= SomeValue;
should be:
TIM3->CCR2 = ((TIM3->CCR2 & ~0xFFFF) | SomeValue);
I am working on the STM32F4 and am pretty new at it. I know the basics of C, but after more than a day of research I still have not found a solution to this.
I simply want to make a delay function myself. The processor runs at 168 MHz (HCLK), so my intuition says that it produces 168×10^6 clock cycles each second. So the method should be something like this:
1. Store the current clock count in a variable.
2. Time diff = (clock value at any time - stored starting clock value) / 168000000
This flow should give me the time difference in seconds, and then I can convert it to whatever I want.
But, unfortunately, despite it seeming so easy, I just can't get any such method working on the MCU.
I tried time.h but it did not work properly. For example, clock() gave the same result over and over, and time() (the one that returns seconds since 1970) gave hexadecimal 0xFFFFFFFF (-1, which I guess means error).
Thanks.
Edit: While writing this I assumed that some function like clock() would return the total clock count since the start of the program flow, but now I think that after 4 billion / 168 million seconds it will overflow the uint32_t size. I am really confused.
The answer depends on the required precision and intervals.
For shorter intervals with sub-microsecond precision there is a cycle counter. Your suspicion is correct: it would overflow after 2^32 / (168×10^6) ≈ 25.5 seconds.
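A minimal sketch of using the cycle counter via the Cortex-M DWT unit (the register names are standard CMSIS; the wrapper function names are mine):

#include "stm32f4xx.h"   // CMSIS device header

void cycleCounterInit( void )
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk ;  // enable trace/DWT
    DWT->CYCCNT = 0 ;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk ;             // start the cycle counter
}

void delayCycles( uint32_t cycles )
{
    uint32_t start = DWT->CYCCNT ;
    // Unsigned subtraction gives the correct elapsed count even across
    // one wrap-around of the 32-bit counter.
    while( (DWT->CYCCNT - start) < cycles ) { }
}

// Example: roughly 1 microsecond at 168 MHz
// delayCycles( 168 ) ;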
For longer intervals there are timers that can be prescaled to support any possible subdivision of the 168 MHz clock. The most commonly used setup is the SysTick timer set to generate an interrupt at 1 kHz frequency, which increments a software counter. Reading this counter would give the number of milliseconds elapsed since startup. As it is usually a 32 bit counter, it would overflow after 49.7 days. The HAL library sets SysTick up this way, the counter can then be queried using the HAL_GetTick() function.
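With that in place, a millisecond delay reduces to something like this (HAL_GetTick() is the real HAL function; the wrapper name is mine):

void delayMs( uint32_t ms )
{
    uint32_t start = HAL_GetTick() ;
    while( (HAL_GetTick() - start) < ms ) { }   // wrap-safe unsigned compare
}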
For even longer or more specialized timing requirements you can use the RTC peripheral which keeps calendar time, or the TIM peripherals (basic, general and advanced timers), these have their own prescalers, and they can be arranged in a master-slave setup to give almost arbitrary precision and intervals.
My processor is an STM32F437ZGT6 and I wish to count two different pulse trains (RPM). The range is quite wide: I may have an engine that idles at 150 rpm where we get a pulse from the cam, so 0.5 pulses per revolution, or 1.25 pulses per second. At the other extreme I may need to count 460 flywheel teeth at 3000 rpm, i.e. 23000 pulses per second. I have a prescaler available so I can divide the external event by up to 8, but even so this becomes too intense at higher speeds, because every event, or every eighth event, causes an interrupt.
One alternative I am considering would be to have one timer use the external event as the clock and it would just count events within a time window. My difficulty comes from determining how to use another timer to control the window by setting and clearing CEN or some similar action.
In RM0090, section 18.3.15 "Timer synchronization", the example shows one timer controlling another: Timer 1 controlling Timer 2. I thought that might be usable, but although I did not read otherwise, I don't see that any two timers can be paired. The signal I am interested in actually feeds two timers: TIM1 CH1 and TIM9 CH1.
Any suggestions would be appreciated as I don't want to cobble up some Rube Goldberg scheme where one timer fires off an ISR and then the ISR opens and closes the time window.
I should have noted that a lookup table is provided that provides the expected engine speed and the number of pulses per revolution.
Thanks,
jh
If you want to just count external events, you can select an external clock source for the timer (see the "Clock selection" section of the reference manual). The SPL should have an example.
Then read the count from the timer's CNT register every time you need it.
The problem here is reading the counts often enough.
Usually the auto-reload register is 2 bytes, so you have up to 2^16 counts before overflowing and losing the counted value.
Timers 2 and 5 have 4-byte auto-reload registers, so you have up to 2^32 counts.
If you need more than 2^32 counts you have at least two options:
- Timer cascade: set one timer's event as the clock for another.
You can find this in the reference manual as "Using one timer as prescaler for another timer".
Cascading gives you up to a 2^64 counter.
There is an SPL example in the "TIM_CascadeSynchro" folder.
- A less beautiful, but easier, way is to create a counter variable and increment it in the timer IRQ handler (see the sketch below).
The total number of counts is then cnt_variable * (TIMx->ARR + 1) + TIMx->CNT.
Several cascaded variables give an effectively unlimited counter.
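A sketch of the counter-variable approach, at register level and using TIM3 as an example (it assumes the update interrupt has been enabled via TIM3->DIER |= TIM_DIER_UIE):

volatile uint32_t overflow_count = 0 ;

void TIM3_IRQHandler( void )
{
    if( TIM3->SR & TIM_SR_UIF )
    {
        TIM3->SR = ~TIM_SR_UIF ;    // clear the update flag
        overflow_count++ ;
    }
}

// Total events counted so far: ARR+1 counts per overflow plus the residue.
uint64_t totalCount( void )
{
    uint32_t ovf, cnt ;
    do      // re-read in case an overflow races the CNT read
    {
        ovf = overflow_count ;
        cnt = TIM3->CNT ;
    } while( ovf != overflow_count ) ;

    return (uint64_t)ovf * (TIM3->ARR + 1u) + cnt ;
}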
Thanks for the post. I will try to add a little detail. RPM 1 is fed into TIM3 CH2 and TIM4 CH1. RPM 2 is fed into TIM1 CH1 and TIM9 CH1. Both have a range of 1.25 pulses per second up to 30000 pulses per second. I am given the number of pulses per revolution, which can range from 0.5 to 460, and the expected engine rpm, 150 - 3000 rpm, so I can scale things a bit. The reason for feeding two different timers is to be able to use different counting techniques based on speed (pulses per second). For low speed I can capture events (pulses) and grab the timer count using an ISR. But when the pulse count gets high I want to use a different method, so as not to incur more than 1000 interrupts per second per channel. So my idea there is to have one timer control another: one timer would simply count events without generating interrupts, and the second timer would control the period of time during which the first timer is allowed to collect events.
Thanks,
jh
It seems like you need timer synchronization, enabling/disabling a slave timer according to the trigger output of a master timer.
Description can be found in the following sections of RM0090:
18.3.14 Timers and external trigger synchronization in paragraph Slave mode: Gated mode
18.3.15 Timer synchronization in paragraph Using one timer to enable another timer
A good explanation can also be found in the TIMx register section, for the TIMx_SMCR register (bits TS and SMS) and TIMx_CR2 (bits MMS).
The "TIMx internal trigger connection" tables (93, 97 and 100) contain the possible connections between the trigger output of one timer and the input of another, and show which timers can be used as master.
The TIM_ExtTriggerSynchro example from the SPL library can be used as a starting point.
I think the best way is:
Set the RPM pin as the external clock source for the slave timer.
Gate the enabling/disabling of the slave timer from the output compare of the master timer, so that by changing the TIMx_CCRx register value you can change the duration of the measurement window (see the sketch after this list).
Set master timer interrupts on the update event (or on every few events, via the TIMx_RCR register).
Do all the calculations in the master timer interrupt handler.
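A register-level sketch of that arrangement, with TIM2 as master and TIM3 as slave. Note this assumes the pulse train can be routed to the slave's ETR pin, because external clock mode 2 (ETR) can be combined with gated slave mode, whereas external clock mode 1 (TI1/TI2) occupies the same SMS bits that gated mode needs. GATE_TICKS and WINDOW_PERIOD_TICKS are placeholders:

// Master (TIM2): OC1REF is high while CNT < CCR1 (PWM mode 1),
// so CCR1 sets the width of the measurement gate.
TIM2->ARR = WINDOW_PERIOD_TICKS ;                         // how often a window starts
TIM2->CCR1 = GATE_TICKS ;                                 // window (gate) length
TIM2->CCMR1 |= TIM_CCMR1_OC1M_2 | TIM_CCMR1_OC1M_1 ;      // OC1M = 110: PWM mode 1
TIM2->CR2 = (TIM2->CR2 & ~TIM_CR2_MMS) | TIM_CR2_MMS_2 ;  // MMS = 100: OC1REF -> TRGO

// Slave (TIM3): counts ETR pulses, but only while the master's gate is open.
TIM3->SMCR = TIM_SMCR_ECE                       // external clock mode 2: count ETRF
           | TIM_SMCR_TS_0                      // TS = 001: ITR1 = TIM2 (table 93)
           | TIM_SMCR_SMS_2 | TIM_SMCR_SMS_0 ;  // SMS = 101: gated mode

TIM3->CR1 |= TIM_CR1_CEN ;   // slave armed; actual counting is gated
TIM2->CR1 |= TIM_CR1_CEN ;   // start the master

// In the master's update interrupt, TIM3->CNT holds the pulses counted
// during the last gate; read it, then write 0 to restart the window.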
Also, it seems to me that you could just use a 16-bit timer as the RPM counter. Even at 30000 pulses per second you would only overflow every 2^16 / 30000 ≈ 2.18 seconds, which is an eternity at STM32F4 clock frequencies. Then use another timer, with, for example, a 2-second period interrupt, for the calculations.
Good luck!
I want to make the CPU frequency on an imx6s based board unchangeable. Even a WFI call shouldn't affect it. So far I see a significant CPU frequency drop while the processor stays in WFI mode.
According to the technical reference manual for the imx6s, the low power mode the processor is transferred to after WFI is configured by the LPM bits of the CLPCR register (page 855). I have set the LPM bits to 0x0, which is documented as RUN mode. From my understanding this should be enough not to put the processor in any kind of sleep mode (WAIT or STOP on the imx).
Is there something else behind this? Am I missing something here?
Note that I am using a non-Linux custom environment.
Appendix (how and what I measure):
The experiment looks like the following:
1. With WFI
p1 = ArmPmuGetCpuCycles()
Sleep(100 milliseconds) <- here it goes to WFI
p2 = ArmPmuGetCpuCycles()
p2 - p1 = 600 microseconds
2. Without WFI
p1 = ArmPmuGetCpuCycles()
Sleep(100 milliseconds) <- WFI is removed
p2 = ArmPmuGetCpuCycles()
p2 - p1 = 100 milliseconds
CPU sleep modes and DVFS frequency setting are normally orthogonal power saving techniques.
WFI can generally trigger clock gating (stopping the clock to logic which is known to be idle while waiting for an interrupt), but it doesn't normally change frequencies or voltages, as these can be quite slow operations. Dynamically changing voltages and frequencies is normally handled by a higher-level kernel driver such as cpufreq.
One of the main inputs into DVFS policies which drive dynamic CPU frequencies is utilization, so if your software is spending a lot of time in the OS idle loop (which will call WFI internally) then the higher level logic is going to decide you are under utilizing the CPU and start to drop frequencies to save power.
Update: Given that you are measuring frequency using the PMU, what you are seeing is the PMU ceasing to increment due to clock gating rather than a frequency change. Generally, if you want to keep some notion of system time you will need a timer peripheral which doesn't get put into low power mode; these are widely available for exactly this use case, but may have a coarser measurement granularity.