This is an academic question (I'm not necessarily planning on doing it) but I am curious about how it would work. I'm thinking of a userland software (rather than hardware) solution.
I want to produce PWM signals (let's say for a small number of digital GPIO pins, but more than 1). I would probably write a program which created a Pthread, and then infinitely looped over the duty cycle with appropriate sleep()s etc in that thread to get the proportions right.
Would this not clobber the CPU horribly? I imagine the frequency would be somewhere around the 100 Hz mark. I've not done anything like this before but I can imagine that the constant looping, context switches etc wouldn't be great for multitasking or CPU usage.
Any advice about CPU use and multitasking in this case? FWIW I'm thinking of a single-core processor. I have a feeling answers could range from 'that will make your system unusable' to 'the numbers involved are orders of magnitude smaller than will make an impact on a modern processor'!
Assume C because it seems most appropriate.
EDIT: Assume Linux or some other general purpose POSIX operating system on a machine with access to hardware GPIO pins.
EDIT: I had assumed it would be obvious how I would implement PWM with sleep. For the avoidance of doubt, something like this:
while (1)
{
    // Set all channels high at the start of the period
    for (int c = 0; c < NUM_CHANNELS; c++)
    {
        set_gpio_pin(c, 1);
    }
    // Loop over units within the duty cycle
    for (int x = 0; x < DUTY_CYCLE_UNITS; x++)
    {
        // Set channels low when their number is up
        for (int c = 0; c < NUM_CHANNELS; c++)
        {
            if (x > CHANNELS[c])
            {
                set_gpio_pin(c, 0);
            }
        }
        usleep(DUTY_CYCLE_UNIT); // microseconds; plain sleep() only has 1 s resolution
    }
}
Use a driver if you can. If your embedded device has a PWM controller, then fine; otherwise, dedicate a hardware timer to generating the PWM intervals and driving the GPIO pins.
If you have to do this at user level, raising a process/thread to a high priority and using sleep() calls is sure to generate a lot of jitter and a poor pulse-width range.
You do not state the ultimate purpose of this very clearly, but since you have tagged this embedded and pthreads I will assume you have a dedicated chip running a Linux variant.
In this case, I would suggest the best way to create PWM output is through your main program loop, since I assume the PWM is part of a greater control application. Most simple embedded applications (no UI) can run in a single thread with periodic updates of the GPIOs in your main thread.
For example:
InitIOs();
while(1)
{
// Do stuff
UpdatePWM();
}
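As a minimal sketch of what UpdatePWM() could look like in that scheme (illustrative only: GetTickCount_us(), SetGPIO() and the duty table are assumed helpers, not anything from a particular platform), the main loop calls it as often as possible and it decides from elapsed time which outputs should currently be high:

#define NUM_CHANNELS   4
#define PWM_PERIOD_US  10000U                 /* 100 Hz */

static unsigned duty_us[NUM_CHANNELS];        /* high time per channel, 0..PWM_PERIOD_US */

void UpdatePWM(void)
{
    /* Position within the current PWM period, taken from a free-running microsecond clock. */
    unsigned phase = GetTickCount_us() % PWM_PERIOD_US;

    for (unsigned c = 0; c < NUM_CHANNELS; c++)
        SetGPIO(c, phase < duty_us[c]);       /* high while inside the duty portion */
}

The resolution you get this way is limited by how often the main loop comes around, which is why the duty table approach only works if the rest of the loop is short.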
That being said, check your chip specification: most embedded devices have dedicated PWM output pins (which can also act as GPIOs), and these can be configured in hardware simply by setting a duty cycle and updating it as required. In this case, the hardware does the work for you.
If you can clarify your situation a bit I can likely give you a more detailed answer.
A better way is probably to use some kind of interrupt-driven approach. I suppose it depends on your system, but IIRC Arduino uses interrupts for PWM.
100Hz seems about doable from user space. Typical OS task scheduler timeslices are around 10ms, too, so your CPU will already be multitasking at about that interval. You'll probably want to use a high process priority (low niceness) to ensure the sleeps won't overrun (much), and keep track of actual wall time and potentially adjust your sleep values down based on that feedback to avoid drift. You'll also need to make sure the timer the kernel uses for this on your hardware has a high enough resolution!
If you're very low on RAM and swapping heavily, you could run into problems with your program being paged out to disk. Also, if the kernel is doing other CPU-intensive stuff, that would introduce unacceptable delays (other, lower-priority user-space tasks should be OK). If keeping the frequency constant is critical, you're better off solving this in the kernel (or even running a realtime kernel).
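Staying in user space, a minimal sketch of the drift-avoidance idea mentioned above, assuming Linux/POSIX clock_nanosleep() with TIMER_ABSTIME (the helper name and period handling are just illustrative):

#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Sleep until an absolute deadline so scheduling latency doesn't accumulate:
 * each wakeup targets "start + n * period" rather than "now + period". */
void wait_next_tick(struct timespec *next, long period_ns)
{
    next->tv_nsec += period_ns;
    while (next->tv_nsec >= 1000000000L) {    /* normalise the timespec */
        next->tv_nsec -= 1000000000L;
        next->tv_sec  += 1;
    }
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, next, NULL);
}

You would initialise next once with clock_gettime(CLOCK_MONOTONIC, &next) before entering the PWM loop; individual wakeups still jitter, but the average frequency stays locked to the clock.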
Using a thread and sleeping on an OS that is not an RTOS is not going to produce very accurate or consistent results.
A better method is to use a timer interrupt and toggle the GPIO in the ISR. Unlike using a hardware PWM output on a hardware timer, this approach allows you to use a single timer for multiple signals and for other purposes. You will still probably see more jitter than with hardware PWM, and the practical frequency range and pulse resolution will be much lower than is achievable in hardware, but at least the jitter will be on the order of microseconds rather than milliseconds.
If you have a timer, you can set that up to kick an interrupt each time a new PWM edge is required. With some clever coding, you can queue these up so the interrupt handler knows which of many PWM channels and whether a high or low going edge is required, and then schedule itself for the next required edge.
If you have enough of these timers, then it's even easier, as you can allocate one per PWM channel.
On an embedded controller with a low-latency interrupt response, this can produce surprisingly good results.
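To make the shared-timer idea concrete, here is a rough sketch of a single periodic timer ISR driving several software PWM channels (all names are illustrative rather than from any vendor HAL; the timer is assumed to fire once per tick, and set_gpio_pin() is the helper from the question):

#define NUM_CHANNELS      4
#define TICKS_PER_PERIOD  100U     /* e.g. a 10 kHz tick gives 100 Hz PWM */

static volatile unsigned duty[NUM_CHANNELS];   /* high time in ticks, 0..TICKS_PER_PERIOD */

void pwm_timer_isr(void)
{
    static unsigned tick = 0;

    for (unsigned c = 0; c < NUM_CHANNELS; c++) {
        if (tick == 0 && duty[c] > 0)
            set_gpio_pin(c, 1);     /* start of period: drive high (unless 0% duty) */
        else if (tick == duty[c])
            set_gpio_pin(c, 0);     /* duty count reached: drive low */
    }

    if (++tick >= TICKS_PER_PERIOD)
        tick = 0;
}

The resolution is one tick, so the interrupt rate is period frequency times the number of duty steps you want, which is the practical limit of this approach.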
I fail to understand why you would want to do PWM in software with all of the inherent timing jitter that interrupt servicing and software interactions will introduce (e.g. the PWM interrupt hits when interrupts are disabled, the processor is servicing a long uninterruptible instruction, or another service routine is active). Most modern microcontrollers (ARM-7, ARM Cortex-M, AVR32, MSP, ...) have timers that can either be configured to produce or are dedicated as PWM generators. These will produce multiple rock steady PWM signals that, once set up, require zero processor input to keep running. These PWM outputs can be configured so that two signals do not overlap or have simultaneous edges, as required by the application.
If you are relying on the OS sleep function to set the time between the PWM edges, then this will run slow. The sleep function sets only the minimum time between task activations, and the actual interval will be stretched by task switches, the presence of a higher-priority thread, or other kernel functions running.
For my application (running on an STM32L082) I need accurate (relative) timestamping of a few types of interrupts. I do this by running a timer at 1 MHz and taking its count as soon as the ISR is run. They are all given the highest priority so they pre-empt less important interrupts. The problem I'm facing is that they may still be delayed by other interrupts at the same priority and by code that disables interrupts, and there seems to be no easy way to know this happened. It is no problem that the ISR was delayed, as long as I know that the particular timestamp is not accurate because of this.
My current approach is to let each ISR and each block of code with interrupts disabled check whether interrupts are pending using NVIC->ISPR[0], and flag this for the pending ISR. Each ISR checks this flag and, if needed, flags the timestamp taken as not accurate.
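For reference, roughly what that looks like in code (a sketch with some illustrative names: TIMESTAMP_TIMER and record_timestamp() are placeholders, while NVIC->ISPR, EXTI4_15_IRQn and RTC_IRQn are the real CMSIS/device names on an STM32L0):

#include <stdbool.h>
#include "stm32l0xx.h"                       /* CMSIS device header */

static volatile bool ts_delayed;

/* Called at the end of every other ISR and every interrupts-disabled block:
 * if one of the timestamping IRQs is already pending, its timestamp will be late. */
static inline void check_delayed_timestamp(void)
{
    if (NVIC->ISPR[0U] & ((1UL << EXTI4_15_IRQn) | (1UL << RTC_IRQn)))
        ts_delayed = true;
}

void EXTI4_15_IRQHandler(void)
{
    uint32_t ts = TIMESTAMP_TIMER->CNT;      /* 1 MHz free-running timer (placeholder name) */
    record_timestamp(ts, ts_delayed);        /* mark the sample as suspect if it was delayed */
    ts_delayed = false;
    /* ... clear the EXTI pending bits, etc. ... */
}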
Although this works, it feels like it's the wrong way around. So my question is: is there another way to know whether an IRQ was served immediately?
The IRQs in question are EXTI4-15 for a GPIO pin change and RTC for the wakeup timer. Unfortunately I'm not in a position to change the PCB layout and use TIM input capture on the input pin, nor to change the MCU used.
update
The fundamental limit to accuracy in the current setup is determined by the nature of the internal RTC calibration, which periodically adds/removes 32kHz ticks, leading to ~31 µs jitter. My goal is to eliminate (or at least detect) additional timestamping inaccuracies where possible. Having interrupts blocked incidentally for, say, 50+ µs is hard to avoid and influences measurements, hence the need to at least know when this occurs.
update 2
To clarify, I think this is a software question, asking if a particular feature exists and if so, how to use it. The answer I am looking for is one of: "yes it is possible, just check bit X of register Y", or "no it is not possible, but MCU ... does have such a feature, called ..." or "no, such a feature is generally not available on any platform (but the common workaround is ...)". This information will guide me (and future readers) towards a solution in software, and/or requirements for better hardware design.
In general
The ideal solution for accurate timestamping is to use timer capture hardware (built-in to the microcontroller, or an external implementation). Aside from that, using a CPU with enough priority levels to make your ISR always the highest priority could work, or you might be able to hack something together by making the DMA engine sample the GPIO pins (specifics below).
Some microcontrollers have connections between built-in peripherals that allow one peripheral to trigger another (like a GPIO pin triggering timer capture even though it isn't a dedicated timer capture input pin). Manufacturers have different names for this type of interconnection, but a general overview can be found on Wikipedia, along with a list of the various names. Exact capabilities vary by manufacturer.
I've never come across a feature in a microcontroller for indicating if an ISR was delayed by a higher priority ISR. I don't think it would be a commonly-used feature, because your ISR can be interrupted by a higher priority ISR at any moment, even after you check the hypothetical was_delayed flag. A higher priority ISR can often check if a lower priority interrupt is pending though.
For your specific situation
A possible approach is to use a timer and DMA (similar to audio streaming, double-buffered/circular modes are preferred) to continuously sample your GPIO pins to a buffer, and then you scan the buffer to determine when the pins changed. Note that this means the CPU must scan the buffer before it is overwritten again by DMA, which means the CPU can only sleep in short intervals and must keep the timer and DMA clocks running. ST's AN4666 is a relevant document, and has example code here (account required to download example code). They're using a different microcontroller, but they claim the approach can be adapted to others in their lineup.
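To illustrate the CPU-side half of that approach (the timer/DMA configuration itself is omitted; the buffer size, pin mask and the idea of one IDR snapshot per timer tick are assumptions here, and AN4666 covers the real setup):

#include <stdint.h>

#define N_SAMPLES 512U
static uint16_t samples[N_SAMPLES];   /* filled by timer-triggered DMA from GPIOx->IDR */

/* Scan one (half-)buffer for the first level change on pin_mask.
 * Returns the sample index of the edge, or -1 if the pin never changed.
 * The timestamp is then index * (1 / f_sample), relative to the buffer start. */
int find_first_edge(const uint16_t *buf, unsigned n, uint16_t pin_mask)
{
    uint16_t prev = buf[0] & pin_mask;
    for (unsigned i = 1; i < n; i++) {
        uint16_t cur = buf[i] & pin_mask;
        if (cur != prev)
            return (int)i;
        prev = cur;
    }
    return -1;
}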
Otherwise, with your current setup, I don't think there is a better solution than the one you're using (the flag that's set when you detect a delay). The ARM Cortex-M0+ NVIC does not have a feature to indicate if an ISR was delayed.
A refinement to your current approach might be making the ISRs as short as possible, so they only do the timestamp collection and then put any other work into a queue for processing by the main application at a lower priority (only applicable if the work is more complex than the enqueue operation, and if the work isn't time-sensitive). Eliminating or making the interrupts-disabled regions short should also help.
I'm working on a project in which a series of duty cycles must be measured. A sample of the related waveforms is displayed below:
As you can see from the signal, the frequency is too high to measure by polling the pins in software. On the controller's tech website (linked here), they use the timer's input capture mode with rising/falling edge interrupts to calculate the difference between two captures of the timer. But this method is too slow and cannot meet our needs for high-frequency signals.
The other solution is to use DMA to transfer the capture data to memory quickly. But in STM32CubeMX it is not possible to assign two DMA channels to the two capture channels of one timer, as you can see below:
Could someone give me a suggestion for this issue?
Using the DMA channel is not likely to provide a good solution for fast signals, since the memory bus is shared between the DMA controller and the CPU, so predictable timing of the capture event is not guaranteed. Also, the timing relationship between the DMA transfers and the external signal will be difficult to resolve. So I'd say "no" to your question.
With 16-bit timers that can run at up to 120 MHz, the STM32's timers are your best choice. An 800 kHz signal is not too fast for these critters! The trick is how you make use of the timers. You want to use input capture mode. Capture several samples of the logic-high times and average them, do the same for the logic-low times, add the two averages to get the total timer ticks per cycle, then multiply by the timer tick period to get the external signal's period.
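For illustration, the arithmetic at the end looks something like this (a sketch only; the capture buffers, their length and f_timer are assumptions here, and filling them from the capture ISR or DMA is the part that depends on your setup):

#include <stdint.h>

#define N_AVG 16

/* high_ticks[]/low_ticks[]: captured high and low durations in timer ticks.
 * f_timer: timer clock in Hz. Returns the period in seconds; duty cycle (0..1)
 * is written to *duty_out. */
float measure_period(const uint32_t *high_ticks, const uint32_t *low_ticks,
                     uint32_t f_timer, float *duty_out)
{
    uint64_t high_sum = 0, low_sum = 0;
    for (int i = 0; i < N_AVG; i++) {
        high_sum += high_ticks[i];
        low_sum  += low_ticks[i];
    }
    float high_avg = (float)high_sum / N_AVG;   /* average high time in ticks */
    float low_avg  = (float)low_sum  / N_AVG;   /* average low time in ticks  */
    float period_ticks = high_avg + low_avg;    /* total ticks per cycle      */

    *duty_out = high_avg / period_ticks;
    return period_ticks / (float)f_timer;
}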
Overview:
I spent a while trying to think of how to formulate this question. To narrow the scope, I wanted to provide my initial HW requirements in the form of a ‘real life’ example application.
I understand that clock speed is probably relative, in the sense that it is decided on a case-by-case basis. For example, your requirement for a certain speed may be affected by the on-chip peripherals offered by the MCU. As an example, you may spend (n) cycles servicing an ISR for an encoder, or you could pick an MCU that has a QEI input to do it for you (to some degree), which in turn may loosen your requirement.
I am not an expert, and am very much still learning, so please call me out if I use an incorrect term, or completely misinterpret something. I assure you; the feedback is welcome!
Example Application:
This application is relatively simple. It can be thought of as a non-blocking state machine, where each ‘iteration’ of the machine must complete within 20ms. A single iteration of this machine has 4 main tasks:
Decode a serial payload, consisting of 32 bytes. The length is fixed at 32 bytes, payload is dynamic, baud is 115200bps (See Task #2 below)
Read 4 incremental shaft encoder signals, which are coupled with 4 DC Motors, 1 encoder for each motor (See Task #1 Below)
Determine the position of 4 limit switches. ISR driven, trigger on rising edge for each switch.
Based on the 3 categories of inputs above, the MCU will output 4 separate PWM signals @ 50 Hz (20 ms) to a motor controller for its next set of movements. (See Task #3 below)
From an IO perspective, I know that the MCU is on the hook for reading 8 digital signals (4 quadrature encoders, 4 limit switches), and decoding a serial frame of 32 bytes over UART.
Based on that data, the MCU will output 4 independent PWM signals, with a pulse width of 1000 µs to 3200 µs per motor, to the motor controller.
The Question:
After all is said and done, I am trying to think through how I can map my requirements into MCU selection, solely from a speed point of view.
It’s easy for me to look through the datasheet and say, this chip meets my requirements because it has (n) UARTS, (n) ISR input pins, (n) PWM outputs etc. But my projects are so small that I always assume the processor is ‘fast enough’. Aside from my immediate peripheral needs, I never really look into the actual MCU speed, which is an issue on my end.
To resolve that, I am trying to understand what goes into selecting a particular clock speed, based on the needs of a given application. Or, to put it another way (which is probably wrong): how do you quantify the theoretical load on the processor for that specific application?
Additional Information
Task #1: Encoder:
Each of the 4 motors has a different task within the system, but regardless, they are the same brand/model motor and have a maximum RPM of 230. My assumption is that if, at worst case, one of the motors is spinning at 230 RPM, then at full quadrature resolution (counting rising/falling edges for channels A/B) the 1000 PPR encoder would generate 4K interrupts per revolution. As such, the MCU would have to service those interrupts, potentially creating a bottleneck for the system. For example, if (n) clock cycles are required to service the ISR, and for 1 revolution of 1 motor we expect 4K interrupts, that would be … 230 (RPM) * 4K (ISRs per rev) == 920,000 interrupts per minute? Yikes! And then I guess you could just extrapolate and say, again at worst case, with each of the 4 motors spinning at 230 RPM and the encoders at full resolution, the system would have to endure 920K interrupts per minute for each encoder. So 920K * 4 motors == 3,680,000 interrupts per minute? I am 100% sure I am doing something wrong, so please, feel free to set me straight.
Task #2: Serial Decoding
The MCU will require a dedicated HW serial port to decode a packet of 32 bytes, which repeats, with different values, every 7ms. Baud rate will be set to 115200bps.
Task #3: PWM Output
Based on the information from tasks 1 and 2, the MCU will write to 4 separate PWM outputs. The pulse for each output will be between 1000-3200usec with a frequency of 50Hz.
You need to separate the real-time critical parts from the rest of the application. For example, the actual reception of a UART frame is somewhat time-critical if you do it interrupt-based. But the protocol decoding is not critical at all unless you are expected to respond within a certain time.
Decode a serial payload, consisting of 32 bytes.
You can either do this the old school way with interrupts filling up a buffer, or you could look for a part with DMA, which is fairly common nowadays. DMA means that you won't have to consider some annoying, relatively low frequency UART interrupt disrupting other tasks.
Read 4 incremental shaft encoder signals
I haven't worked with such encoders so I can't tell how time-critical they are. If you have to catch every single interrupt and your calculations are correct, then 3,680,000 interrupts per minute works out to 60 / 3,680,000 ≈ 16 µs between interrupts, or roughly 61,000 per second, in the worst case. That's busy but still manageable for a 32-bit MCU clocked at tens of MHz; a shabby 8-bitter running at 8 MHz would spend most of its time in the encoder ISRs, which is one argument for picking a part with hardware quadrature (QEI) inputs if you can.
Determine the position of 4 limit switches
You don't mention timing here but I assume this is something that could be polled cyclically by a low priority cyclic timer.
the MCU will output 4 separate PWM signals
Not a problem, just pick one with a decent PWM hardware peripheral. You should just need to update some PWM duty cycle registers now and then.
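To show how little work that is, a hedged sketch of the per-cycle update (register names are illustrative, not from any particular vendor header; it assumes the timer has already been set up for 50 Hz PWM with a 1 MHz, i.e. 1 µs per count, clock):

#include <stdint.h>

#define PWM_PERIOD_US 20000U              /* 50 Hz at 1 µs per timer count */

/* ccr points at the channel's compare register; pulse_us is the commanded
 * pulse width from the control loop, clamped to the 1000..3200 µs range. */
void set_motor_pulse(volatile uint32_t *ccr, uint32_t pulse_us)
{
    if (pulse_us < 1000U) pulse_us = 1000U;
    if (pulse_us > 3200U) pulse_us = 3200U;
    *ccr = pulse_us;                      /* hardware generates the edges from here on */
}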
Overall, this doesn't sound all that real-time critical. I've done much worse real-time projects with icky 8- and 16-bitters. However, each time I did, I regretted not picking a faster MCU, because you always come up with stuff to add as the project/product goes on.
It sounds like your average mainstream Cortex M0+ would be a good candidate for this project. Clock it at ~48MHz and you'll have plenty of CPU power. Cortex M4 or larger if you actually expect floating point math (I don't quite see why you'd need that though).
Given the current component crisis, be careful with which brand you pick though! In particular stay clear of STM32, since ST can't produce them right now and you might end up waiting over a year until you get parts.
The answer to the question is "experience". But intuitively your example is not particularly taxing - although there are plenty of ways you could mess it up. I once worked on a project that ran on a 200 MHz C5502 DSP at near 100% CPU load. The application now runs on a 72 MHz Cortex-M3 at only 60%, with additional functionality and I/O not present in the original implementation.
Your application is I/O bound; depending on data rates (and critically interrupt rates), I/O seldom constitutes the highest CPU load, and DMA, hardware FIFOs, input capture timer/counters, hardware PWM etc. can be used to minimise the I/O impact. I shan't go into it in detail; @Lundin has already done that.
Note also that raw processor speed is important for data or signal processing and number crunching - but what I/O generally requires is deterministic real-time response, and that is seldom simply a matter of MHz or MIPS - you will get more deterministic and possibly faster response from an 8bit AVR running at a few MHz than you can guarantee from a 500MHz application processor running Linux - and it won't take 30 seconds to boot!
We have an application which runs on PIC24H, we would like to port it to another MCU, preferably ARM Cortex. Application is extremely time critical, meaning that we need extremely deterministic code behaviour. In short, there are pulses which are obtained via special hardware to GPIO pins, data is analyzed right away. Processing of data is not complex(we don't need a beefy cpu/mcu to do it). After analyzing the data GPIO output pins are written to their values.
App in 3 short lines:
process input pins
determine pattern within processing of input pins
based on the received pattern write output pins
PIC24H is working at 40 MHz; we can toggle the pin in 25 ns, and we would be happy with at least 2x that speed for future upgrades. So an MCU which can run deterministic code and toggle pins at at least 80 MHz (12.5 ns) would be just fine. We don't need to toggle the pins at a constant fast rate; we need an MCU which can toggle them in less than 25 ns. We can't waste cycles while toggling: if one cycle is off, we lose synchronization. Everything must be done with one-cycle precision (or two, but a constant two cycles), so the code must be 100% deterministic.
Please let me know if I'm missing something or if what we need can be done using some other methods on Cortex-M. Just keep in mind that if one cycle is lost (due to cache or similar) we lose signal sync and the app will not do its work right, or at all.
Thanks!
Br
According to this blog post, the interrupt latency for Cortex-M ranges from 12 to 16 cycles (assuming you are not using FPU registers) with best-case memories. M0 and M0+ are slower than M3/M4/M7. On top of this, you need to add the GPIO access times (and watch out for different clock frequencies between the core and the peripherals). Cortex-M7 will support higher clock speeds than M3/M4.
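As a rough worked example (purely illustrative clock speed, using the latency figures above): at an 80 MHz core clock, 12 cycles is 12 / 80 MHz = 150 ns and 16 cycles is 200 ns of interrupt entry latency alone, before the ISR's first instruction runs or the GPIO write happens - already several times a 25 ns toggle budget, which is why response that tight normally comes from straight-line polled code or dedicated hardware rather than interrupt handlers.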
It still isn't clear how many cycles are consumed in recognising a pattern, and how an interrupt is useful in doing this - generally a low latency interface function like this would be an obvious target for dedicated hardware, but since you have an existing software solution it seems the problem is mis-specified.
Providing you avoid accessing any 'slow' peripherals which might stall the bus, the interrupt latency should be deterministic - any specific device should have documentation which covers this.
NXP have an application note which describes some of the detail of how to measure what is going on.
I am new to embedded development and a while ago I read some code for a PIC24xxxx.
void i2c_Write(char data) {
    while (I2C2STATbits.TBF) {};    // wait while the transmit buffer is still full
    IFS3bits.MI2C2IF = 0;           // clear the master I2C2 interrupt flag
    I2C2TRN = data;                 // load the byte to transmit
    while (I2C2STATbits.TRSTAT) {}; // wait until the transmission (and ACK) completes
    Nop();
    Nop();
}
What do you think about the while condition? Doesn't the microchip use a lot of CPU for that?
I asked myself this question and surprisingly saw a lot of similar code on the internet.
Is there not a better way to do it?
What about the Nop() too, why two of them?
Generally, in order to interact with hardware, there are 2 ways:
Busy wait
Interrupt-based
In your case, in order to interact with the I2C device, your software first waits for the TBF bit to be cleared, which means the I2C device is ready to accept a byte to send.
Then your software writes the byte into the device and waits for the TRSTAT bit to be cleared, meaning that the data has been correctly processed by the I2C device.
The code you are showing is written with busy-wait loops, meaning that the CPU is actively waiting on the HW. This is indeed a waste of resources, but in some cases (e.g. your I2C interrupt line is not connected or not available) it is the only way to do it.
If you used interrupts, you would ask the hardware to tell you whenever a given event happens: for instance, the TBF bit is cleared, etc.
The advantage is that, while the HW is doing its stuff, you can continue doing other things, or just sleep to save battery.
I'm not an expert in I2C, so the interrupt event I have described is most likely not accurate, but that gives you an idea of why you get the 2 while loops.
Now, regarding the pros and cons of the interrupt-based and busy-wait implementations: I would say the interrupt-based implementation is more efficient but more difficult to write, since you have to process asynchronous events coming from the HW. The busy-wait implementation is easy to write but is slower; this might still be fast enough for you, though.
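For a feel of what the interrupt-based version might look like, here is a hedged sketch in the style of the question's PIC24 code (the IEC3bits.MI2C2IE enable bit is assumed to pair with the IFS3 flag used in the question - check your device header - and START/STOP/address handling is left out, just as in the original function):

static volatile const char *i2c_buf;
static volatile unsigned     i2c_len;

void i2c_write_async(const char *data, unsigned len)
{
    i2c_buf = data;
    i2c_len = len;
    IEC3bits.MI2C2IE = 1;        // enable the master I2C2 interrupt (register assumed)
    I2C2TRN = *i2c_buf++;        // start the first byte; the rest go out from the ISR
    i2c_len--;
}

void __attribute__((interrupt, no_auto_psv)) _MI2C2Interrupt(void)
{
    IFS3bits.MI2C2IF = 0;        // clear the interrupt flag
    if (i2c_len) {
        I2C2TRN = *i2c_buf++;    // previous byte finished: queue the next one
        i2c_len--;
    } else {
        IEC3bits.MI2C2IE = 0;    // done: disable further interrupts
    }
}

The CPU is then free (or asleep) between bytes instead of spinning on TRSTAT.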
Finally, I have no idea why the 2 Nop()s are needed there. Most likely a tweak that is needed because, somehow, the CPU would otherwise still go too fast.
When doing these kinds of transactions (I2C/SPI) you find yourself in one of two situations: bit-bang, or some form of hardware assist. Bit-bang is easier to implement, read, and debug, and is often quite portable from one chip/family to the next, but it burns a lot of CPU. Then again, microcontrollers are mostly there to be custom hardware that is easier to program than a CPLD or FPGA; they are there to burn CPU cycles pretending to be hardware designs. With I2C or SPI you are trying to create a specific waveform on some number of I/O pins on the device, and at times latching the inputs. The bus has a spec and is sometimes slower than your CPU, sometimes not; sometimes, once you add the software and compiler overhead, you might end up not needing a timer for delays at all, being just slow enough as-is. But ideally you look at the waveform and you simply create it: raise pin X, delay n ms, raise pin Y, delay n ms, drop pin Y, delay 2*n ms, and so on. Those delays can come from tuned loops (count from 0 to 1341) or from polling a timer until it reaches Z ticks of some clock. A massive CPU waste, but the point is you are really just being programmable hardware, and real hardware would be burning time waiting as well.
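For a deliberately crude picture of that bit-bang-with-tuned-delay idea (pin names, the delay constant and the set_gpio_pin()-style helper are all just illustrative, and real I2C lines are open-drain, which this ignores):

static void delay_units(volatile unsigned long n)
{
    while (n--) { /* burn cycles; the constant below would be tuned on a scope */ }
}

static void clock_out_one_bit(int sda_level)
{
    set_gpio_pin(PIN_SDA, sda_level);   /* set data while the clock is low */
    delay_units(1341);
    set_gpio_pin(PIN_SCL, 1);           /* raise the clock */
    delay_units(1341);
    set_gpio_pin(PIN_SCL, 0);           /* drop the clock again */
}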
When you have a peripheral in your MCU that assists, it might do much or most of the timing for you, but maybe not all of it; perhaps you have to assert/deassert chip select and then the SPI logic does the clock and data timing in and out for you. These peripherals are generally very specific to one family from one chip vendor - perhaps common across that vendor, but never vendor to vendor - so they are not portable at all and there is a learning curve. And perhaps in your case, if the CPU is fast enough, it might be possible for you to do the next thing so quickly that it violates the bus timing, so you would have to kill more time (maybe that is why you have those Nop()s).
Think of an MCU as a software-programmable CPLD or FPGA and this waste makes a lot more sense. Unfortunately, unlike a CPLD or FPGA, you are single-threaded, so you can't do several trivial things in parallel with clock-accurate timing (exactly this many clocks and task A switches state and changes an output). Interrupts help, but it's not quite the same: change one line of code and your timing changes.
In this case, especially with the Nop()s, you should probably be using a scope anyway to see the I2C bus, and once you have it on the scope you can try with and without those calls to see how they affect the waveform. It could also be a bug or a quirk in the peripheral - maybe you can't hit some register too fast or the peripheral breaks. Or it could be a bug in a chip from 5 years ago: the code was written for that, the bug is long gone, but they just kept re-using the code. You will see that a lot in vendor libraries.
What do you think about the while condition? Doesn't the microchip use a lot of CPU for that?
No, since the transmit buffer won't stay full for very long.
I asked myself this question and surprisingly saw a lot of similar code on the internet.
What would you suggest instead?
Is there not a better way to do it? (I hate crazy loops :D)
Not that I, you, or apparently anyone else knows of. In what way do you think it could be any better? The transmit buffer won't stay full long enough to make it useful to retask the CPU.
What about the Nop() too, why two of them?
The Nop()s ensure that the signal remains stable for long enough. This makes the code safe to call under all conditions. Without them, it would only be safe to call this code if you didn't touch the I2C bus immediately after calling it. But in most cases this code would be called in a loop anyway, so it makes much more sense to make it inherently safe.