Is it possible to implement the 1-Wire protocol on a SIM800 through bit banging? The time required to change the direction of a pin (input or output) is 1.5 microseconds, and the time required to change the state of a pin (low or high) is 1.5 microseconds.
The Dallas/Maxim 1-Wire(tm) protocol is self-clocking; it is deliberately designed to make it easy to implement in this way. Using a hardware timer would be a good idea - removing a good deal of software overhead, but even a low-precision RC oscillator is likely to be accurate enough.
1-Wire(tm) is self-clocking, so I assume the timings you suggest are minimum timings, not required timings; the protocol has wide tolerances for the actual bit timing, and the inter-bit timing merely requires that the line is high for >1us, but may be any length. It only needs to be long enough to detect a definite edge - on an input-capture timer or edge-triggered interrupt, for example. You could software-poll the line, but if your application needs to get other work done at the same time, a 1us pulse may get missed.
It is not clear to me what the definition of the timings you suggest are, but if they simply refer to the duration of the edge, the 1.5us you suggest is not a software issue - that is down to the slew-rate of the I/O pin which is largely a function of the line characteristics. For short distance communication, you'd have to really mess up the hardware design to get switching that slow.
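For reference, the timing-critical part of a bit-banged 1-Wire master is quite small. Below is a sketch using Maxim's recommended slot timings; the GPIO and delay helpers are placeholders for whatever the SIM800 firmware environment provides:

    /* Sketch of bit-banged 1-Wire reset and write-bit slots, using Maxim's
       recommended timings. The helpers below are placeholders for the
       platform's own GPIO/delay primitives. */
    extern void ow_drive_low(void);   /* pin -> output, driven low           */
    extern void ow_release(void);     /* pin -> input, external pull-up high */
    extern int  ow_read(void);        /* sample the pin level (0 or 1)       */
    extern void delay_us(unsigned us);

    static int ow_reset(void)         /* returns 1 if a device answered */
    {
        ow_drive_low();
        delay_us(480);                /* reset pulse: at least 480 us low     */
        ow_release();
        delay_us(70);                 /* wait, then sample the presence pulse */
        int present = (ow_read() == 0);
        delay_us(410);                /* complete the reset slot              */
        return present;
    }

    static void ow_write_bit(int bit)
    {
        ow_drive_low();
        if (bit) {
            delay_us(6);              /* write 1: short low pulse...               */
            ow_release();
            delay_us(64);             /* ...then released for the rest of the slot */
        } else {
            delay_us(60);             /* write 0: held low for most of the slot    */
            ow_release();
            delay_us(10);             /* recovery time                             */
        }
    }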
For my application (running on an STM32L082) I need accurate (relative) timestamping of a few types of interrupts. I do this by running a timer at 1 MHz and taking its count as soon as the ISR is run. They are all given the highest priority so they pre-empt less important interrupts. The problem I'm facing is that they may still be delayed by other interrupts at the same priority and by code that disables interrupts, and there seems to be no easy way to know this happened. It is no problem that the ISR was delayed, as long as I know that the particular timestamp is not accurate because of this.
My current approach is to let each ISR and each block of code with interrupts disabled check whether interrupts are pending using NVIC->ISPR[0], and flag this for the pending ISR (sketched below). Each ISR checks this flag and, if needed, flags the timestamp taken as not accurate.
Although this works, it feels like it's the wrong way around. So my question is: is there another way to know whether an IRQ was served immediately?
The IRQs in question are EXTI4-15 for a GPIO pin change and RTC for the wakeup timer. Unfortunately I'm not in a position to change the PCB layout and use TIM input capture on the input pin, nor to change the MCU used.
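A minimal sketch of the check described above, assuming CMSIS register names on the STM32L0 (the flag handling is illustrative):

    #include "stm32l0xx.h"                 /* CMSIS device header (assumption) */

    volatile uint8_t exti_ts_suspect;      /* set when an EXTI timestamp may be late */

    /* Called at the end of every same-priority ISR and every interrupts-disabled
       block: if the EXTI4-15 interrupt became pending while we were running, its
       timestamp will have been delayed, so mark it as suspect. */
    static inline void check_delayed_exti(void)
    {
        if (NVIC->ISPR[0U] & (1UL << (uint32_t)EXTI4_15_IRQn)) {
            exti_ts_suspect = 1;
        }
    }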
update
The fundamental limit to accuracy in the current setup is determined by the nature of the internal RTC calibration, which periodically adds/removes 32kHz ticks, leading to ~31 µs jitter. My goal is to eliminate (or at least detect) additional timestamping inaccuracies where possible. Having interrupts blocked incidentally for, say, 50+ µs is hard to avoid and influences measurements, hence the need to at least know when this occurs.
update 2
To clarify, I think this is a software question, asking if a particular feature exists and if so, how to use it. The answer I am looking for is one of: "yes it is possible, just check bit X of register Y", or "no it is not possible, but MCU ... does have such a feature, called ..." or "no, such a feature is generally not available on any platform (but the common workaround is ...)". This information will guide me (and future readers) towards a solution in software, and/or requirements for better hardware design.
In general
The ideal solution for accurate timestamping is to use timer capture hardware (built-in to the microcontroller, or an external implementation). Aside from that, using a CPU with enough priority levels to make your ISR always the highest priority could work, or you might be able to hack something together by making the DMA engine sample the GPIO pins (specifics below).
Some microcontrollers have connections between built-in peripherals that allow one peripheral to trigger another (like a GPIO pin triggering timer capture even though it isn't a dedicated timer capture input pin). Manufacturers have different names for this type of interconnection, but a general overview can be found on Wikipedia, along with a list of the various names. Exact capabilities vary by manufacturer.
I've never come across a feature in a microcontroller for indicating if an ISR was delayed by a higher priority ISR. I don't think it would be a commonly-used feature, because your ISR can be interrupted by a higher priority ISR at any moment, even after you check the hypothetical was_delayed flag. A higher priority ISR can often check if a lower priority interrupt is pending though.
For your specific situation
A possible approach is to use a timer and DMA (similar to audio streaming, double-buffered/circular modes are preferred) to continuously sample your GPIO pins to a buffer, and then you scan the buffer to determine when the pins changed. Note that this means the CPU must scan the buffer before it is overwritten again by DMA, which means the CPU can only sleep in short intervals and must keep the timer and DMA clocks running. ST's AN4666 is a relevant document, and has example code here (account required to download example code). They're using a different microcontroller, but they claim the approach can be adapted to others in their lineup.
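For the scanning side, here is a sketch assuming a timer-triggered DMA channel continuously copies the GPIO input data register into a circular uint16_t buffer (the timer/DMA setup itself follows AN4666 and is omitted; buffer size and pin mask are placeholders):

    #include <stdint.h>

    #define SAMPLE_BUF_LEN  512u            /* placeholder buffer length   */
    #define PIN_MASK        (1u << 5)       /* placeholder pin of interest */

    /* Filled continuously by a timer-triggered DMA channel reading GPIOx->IDR. */
    extern volatile uint16_t gpio_samples[SAMPLE_BUF_LEN];

    /* Scan samples [start, end) of the circular buffer for a rising edge and
       return its buffer index, or -1 if none was found. The caller converts the
       index to a time using the known sample period and the DMA write position. */
    static int find_rising_edge(uint32_t start, uint32_t end)
    {
        uint16_t prev = gpio_samples[start % SAMPLE_BUF_LEN] & PIN_MASK;
        for (uint32_t i = start + 1; i < end; i++) {
            uint16_t cur = gpio_samples[i % SAMPLE_BUF_LEN] & PIN_MASK;
            if (cur && !prev) {
                return (int)(i % SAMPLE_BUF_LEN);
            }
            prev = cur;
        }
        return -1;
    }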
Otherwise, with your current setup, I don't think there is a better solution than the one you're using (the flag that's set when you detect a delay). The ARM Cortex-M0+ NVIC does not have a feature to indicate if an ISR was delayed.
A refinement to your current approach might be making the ISRs as short as possible, so they only do the timestamp collection and then put any other work into a queue for processing by the main application at a lower priority (only applicable if the work is more complex than the enqueue operation, and if the work isn't time-sensitive). Eliminating or making the interrupts-disabled regions short should also help.
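A sketch of that "timestamp only, defer the rest" pattern, assuming a free-running 1 MHz counter in TIM21 on the STM32L0 and a hypothetical EXTI line 6 (names to be adapted):

    #include "stm32l0xx.h"                  /* CMSIS device header (assumption) */

    #define TS_QUEUE_LEN 16u
    static volatile uint32_t ts_queue[TS_QUEUE_LEN];
    static volatile uint32_t ts_head;       /* written only by the ISR */

    /* The EXTI ISR does nothing but capture the counter and enqueue it; the main
       loop drains the queue and does the real work at a lower priority. */
    void EXTI4_15_IRQHandler(void)
    {
        uint32_t now = TIM21->CNT;          /* free-running 1 MHz timer (assumption)     */
        EXTI->PR = (1u << 6);               /* clear the pending flag for hypothetical pin 6 */
        ts_queue[ts_head % TS_QUEUE_LEN] = now;
        ts_head++;
    }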
Overview:
I spent a while trying to think of how to formulate this question. To narrow the scope, I wanted to provide my initial HW requirements in the form of a ‘real life’ example application.
I understand that clock speed is probably relative, in the sense that it has to be judged on a case-by-case basis. For example, your requirement for a certain speed may be affected by the on-chip peripherals offered by the MCU. As an example, you may spend (n) cycles servicing an ISR for an encoder, or you could pick an MCU that has a QEI input to do it for you (to some degree), which in turn may relax your speed requirement.
I am not an expert, and am very much still learning, so please call me out if I use an incorrect term, or completely misinterpret something. I assure you; the feedback is welcome!
Example Application:
This application is relatively simple. It can be thought of as a non-blocking state machine, where each ‘iteration’ of the machine must complete within 20ms. A single iteration of this machine has 4 main tasks:
Decode a serial payload, consisting of 32 bytes. The length is fixed at 32 bytes, payload is dynamic, baud is 115200bps (See Task #2 below)
Read 4 incremental shaft encoder signals, which are coupled with 4 DC Motors, 1 encoder for each motor (See Task #1 Below)
Determine the position of 4 limit switches. ISR driven, trigger on rising edge for each switch.
Based on the 3 categories of inputs above, the MCU will output 4 separate PWM signals at 50Hz (20ms) to a motor controller for its next set of movements. (See Task #3 below)
From an IO perspective, I know that the MCU is on the hook for reading 8 digital signals (4 quadrature encoders, 4 limit switches), and decoding a serial frame of 32 bytes over UART.
Based on that data, the MCU will output 4 independent PWM signals, with a pulse width of 1000-3200 usec per motor, to the motor controller.
The Question:
After all is said and done, I am trying to think through how I can map my requirements into MCU selection, solely from a speed point of view.
It’s easy for me to look through the datasheet and say, this chip meets my requirements because it has (n) UARTS, (n) ISR input pins, (n) PWM outputs etc. But my projects are so small that I always assume the processor is ‘fast enough’. Aside from my immediate peripheral needs, I never really look into the actual MCU speed, which is an issue on my end.
To resolve that, I am trying to understand what goes into selecting a particular clock speed, based on the needs of a given application. Or, to put it another way (which is probably wrong): how do you quantify the theoretical load on the processor for a specific application?
Additional Information
Task #1: Encoder:
Each of the 4 motors has a different task within the system, but regardless, they are the same brand/model of motor and have a maximum speed of 230 RPM. My assumption is that if, at worst case, one of the motors is spinning at 230 RPM, then at full quadrature resolution (counting rising/falling edges on channels A and B) the 1000 PPR encoder would generate 4K interrupts per revolution. As such, the MCU would have to service those interrupts, potentially creating a bottleneck for the system. For example, if (n) clock cycles are required to service the ISR, and for 1 revolution of 1 motor we expect 4K interrupts, that would be 230 (RPM) * 4K (interrupts per rev) == 920,000 interrupts per minute? Yikes! And then I guess you could extrapolate and say, again at worst case, where each of the 4 motors is spinning at 230 RPM with the encoders at full resolution, the system would have to endure 920K interrupts per minute for each encoder, so 920K * 4 motors == 3,680,000 interrupts per minute? I am 100% sure I am doing something wrong, so please feel free to set me straight.
Task #2: Serial Decoding
The MCU will require a dedicated HW serial port to decode a packet of 32 bytes, which repeats, with different values, every 7ms. Baud rate will be set to 115200bps.
Task #3: PWM Output
Based on the information from tasks 1 and 2, the MCU will write to 4 separate PWM outputs. The pulse for each output will be between 1000-3200usec with a frequency of 50Hz.
You need to separate the real-time critical parts from the rest of the application. For example, the actual reception of a UART frame is somewhat time-critical if you do it interrupt-based, but the protocol decoding is not critical at all unless you are expected to respond within a certain time.
Decode a serial payload, consisting of 32 bytes.
You can either do this the old school way with interrupts filling up a buffer, or you could look for a part with DMA, which is fairly common nowadays. DMA means that you won't have to consider some annoying, relatively low frequency UART interrupt disrupting other tasks.
Read 4 incremental shaft encoder signals
I haven't worked with such encoders so I can't tell how time-critical they are. If you have to catch every single interrupt and your calculations are correct, then 3,680,000 interrupts per minute works out to about 61,000 per second, i.e. one interrupt roughly every 16us on average across the four encoders. That is a lot for pure software decoding on a slow part, so this is exactly where the QEI/hardware quadrature peripheral you mentioned pays off: with one counter per encoder, the CPU only needs to read a count register once per 20ms cycle and the interrupt load disappears.
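If you do end up decoding in software rather than with a QEI peripheral, the per-edge work can at least be kept to a few cycles with a table-driven decoder; a generic sketch (the pin-reading helper is a placeholder):

    #include <stdint.h>

    /* Classic 2-bit state table: index = (previous AB << 2) | current AB,
       value = -1, 0 or +1 (0 also covers invalid transitions). */
    static const int8_t qdec_table[16] = {
         0, +1, -1,  0,
        -1,  0,  0, +1,
        +1,  0,  0, -1,
         0, -1, +1,  0
    };

    static volatile int32_t position;
    static uint8_t prev_ab;

    extern uint8_t read_ab(void);   /* placeholder: channels A/B packed as bits 1:0 */

    /* Called from the pin-change ISR of one encoder. */
    static inline void qdec_update(void)
    {
        uint8_t ab = read_ab();
        position += qdec_table[(uint8_t)((prev_ab << 2) | ab)];
        prev_ab = ab;
    }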
Determine the position of 4 limit switches
You don't mention timing here but I assume this is something that could be polled cyclically by a low priority cyclic timer.
the MCU will output 4 separate PWM signals
Not a problem, just pick one with a decent PWM hardware peripheral. You should just need to update some PWM duty cycle registers now and then.
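For a sense of scale, the per-cycle work is roughly the following, assuming a 16-bit timer ticking at 1 MHz with a 20000-tick (20 ms) period; the register pointer and limits are generic placeholders rather than a specific vendor API:

    #include <stdint.h>

    #define PWM_PERIOD_TICKS  20000u   /* 20 ms at a 1 MHz timer clock -> 50 Hz;
                                          loaded into the timer's period register at init */

    /* Write a new pulse width (in microseconds) to one channel's compare
       register; with a 1 MHz tick, 1 tick == 1 us, so the value maps directly. */
    static inline void pwm_set_pulse(volatile uint16_t *ccr, uint16_t pulse_us)
    {
        if (pulse_us < 1000u) pulse_us = 1000u;   /* clamp to the motor controller's range */
        if (pulse_us > 3200u) pulse_us = 3200u;
        *ccr = pulse_us;
    }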
Overall, this doesn't sound all that real-time critical. I've done much worse real-time projects with icky 8- and 16-bitters. However, each time I did, I regretted not picking a faster MCU, because you always come up with stuff to add as the project/product goes on.
It sounds like your average mainstream Cortex M0+ would be a good candidate for this project. Clock it at ~48MHz and you'll have plenty of CPU power. Cortex M4 or larger if you actually expect floating point math (I don't quite see why you'd need that though).
Given the current component crisis, be careful with which brand you pick though! In particular stay clear of STM32, since ST can't produce them right now and you might end up waiting over a year until you get parts.
The answer to the question is "experience". But intuitively your example is not particularly taxing - although there are plenty of ways you could mess it up. I once worked on a project that ran on a 200MHz C5502 DSP at near 100% CPU load. The application now runs on a 72MHz Cortex-M3 at only 60%, with additional functionality and I/O not present in the original implementation.
Your application is I/O bound; depending on data rates (and critically interrupt rates), I/O seldom constitutes the highest CPU load, and DMA, hardware FIFOs, input-capture timer/counters, hardware PWM etc. can be used to minimise the I/O impact. I shan't go into it in detail; @Lundin has already done that.
Note also that raw processor speed is important for data or signal processing and number crunching - but what I/O generally requires is deterministic real-time response, and that is seldom simply a matter of MHz or MIPS. You will get more deterministic, and possibly faster, response from an 8-bit AVR running at a few MHz than you can guarantee from a 500MHz application processor running Linux - and it won't take 30 seconds to boot!
I am designing the controller and data acquisition unit for a rocket engine test stand. This system needs to control a number of actuators on the test stand and also be able to transmit collected data back to the host computer where the team will be watching live data/camera feeds from safety.
The overall design requirements are as follows:
Acquire data from ~15 analog sensors at 1 kHz
Control the actuators on the test stand including valves and ignition switches
Transmit data back to the host computer in our shelter in real time
Accept control from the host computer for things like manual valve actuation, test sequence modification, sequence abortion, etc.
I am not exactly sure where to begin when laying out the software for this system. I am considering using an STM32 ARM Cortex-M4 processor running at 180 MHz. I am having trouble figuring out how I should approach the problem. I have considered using an RTOS, but from what I have seen they add noticeable overhead at higher tick rates, since the scheduler has to run every tick. The other idea I'm bouncing around is a state machine combined with some timer-based interrupts for reading data and then sending it back out to the PC. Any advice on how to approach this problem while minimizing code complexity would be greatly appreciated. Thanks.
EDIT:
I have been told to clarify a number of things concerning the technical specs of the system.
My actuators consist of:
6 solenoids (controlled digitally through relays/MOSFET, and switched around once a second)
2 DC motors (driven with PWM outputs in a PID loop, need to be able to ramp position controllably)
One igniter, again controlled through a relay/MOSFET
My sensors consist of:
8 pressure transducers (analog voltages)
4 thermocouples (analog voltages)
2 motor encoders (quadrature encoders)
1 light sensor (analog voltage)
1 Load cell (analog voltage)
Ideally all of the collected data (all of the above sensors) plus some additional data (timestamps, motor set positions, solenoid positions) is streamed back to the host computer in real time.
Given the motor control with PWM & PID, you need to specify a desired resolution, either in PWM timer ticks or ADC reads. This is the most critical part. It doesn't hurt if the ADC has greater resolution than your specified resolution either. The PCB has to be designed accordingly, with sufficient resolution on resistors etc.
After you've done this, find an MCU with a sufficiently accurate ADC. I would imagine that 12-bit resolution is enough for most applications, but I don't know your specific case.
Next, you need to decide how fast you want the PID to be. Should an output on the PWM result in a read on the ADC in the next cycle, or could you settle for slower response? The realtime bottleneck here will be the ADC conversion clock, not the CPU.
The rest of the system doesn't seem time critical at all - you just have to ensure that everything is read/set synchronously. The data transmission to/from the host should preferably be done over CAN since it comes with hard real-time characteristics. Doesn't seem that you need a whole lot of bandwidth.
I have designed systems very similar to this using bare metal 16-bit MCUs running at 16MHz. Processing speed is really not a big concern, but meeting real-time deadlines is. That means you can forget about using Linux toys like the Raspberry Pi; it's completely out of the question. And an RTOS is likely overkill, since it mostly adds additional complexity.
A bare metal Cortex-M with sufficient ADC resolution and CAN seems like a good choice. If you can stay away from floating point, that's nice too - it depends on how advanced the math needs to be. If you need nothing more advanced than PID, it can be implemented with fixed point just fine. (Or rather PI, since that usually works best for fast motor control systems.)
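If it helps, here is a minimal fixed-point PI sketch in C (Q16.16 gains); the scaling, limits and anti-windup are illustrative assumptions, not tuned values, and the 32-bit integral assumes errors/gains stay small enough not to overflow:

    #include <stdint.h>

    typedef struct {
        int32_t kp_q16;     /* proportional gain, Q16.16                        */
        int32_t ki_q16;     /* integral gain, Q16.16, pre-scaled by sample time */
        int32_t integ;      /* integral accumulator, Q16.16                     */
        int32_t out_min;    /* output limits in raw PWM units                   */
        int32_t out_max;
    } pi_t;

    static int32_t pi_update(pi_t *pi, int32_t setpoint, int32_t measured)
    {
        int32_t err   = setpoint - measured;
        int32_t step  = pi->ki_q16 * err;                       /* this cycle's I contribution */
        pi->integ    += step;
        int64_t out   = (int64_t)pi->kp_q16 * err + pi->integ;  /* P + I in Q16.16             */
        int32_t out32 = (int32_t)(out >> 16);                   /* back to integer output units */
        if (out32 > pi->out_max) { out32 = pi->out_max; pi->integ -= step; }  /* crude anti-windup */
        if (out32 < pi->out_min) { out32 = pi->out_min; pi->integ -= step; }
        return out32;
    }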
We have an application which runs on a PIC24H, and we would like to port it to another MCU, preferably an ARM Cortex. The application is extremely time-critical, meaning that we need extremely deterministic code behaviour. In short, there are pulses which are delivered via special hardware to GPIO pins, and the data is analyzed right away. Processing of the data is not complex (we don't need a beefy CPU/MCU to do it). After analyzing the data, GPIO output pins are written with their values.
App in 3 short lines:
process input pins
determine pattern within processing of input pins
based on the received pattern write output pins
The PIC24H is running at 40MHz and we can toggle a pin in 25ns; we would be grateful for at least 2x that speed for future upgrades. So an MCU which can run deterministic code and toggle pins at 80MHz (12.5ns) or faster would be just fine. We don't need to toggle the pins at a constant fast rate; we need an MCU which can toggle a pin in less than 25ns. We can't waste cycles while toggling: if one cycle is off, we lose synchronization. Everything must be done with one-cycle precision (or two, but a constant two cycles), so the code must be 100% deterministic.
Please let me know if I'm missing something, or if what we need can be done using some other method on a Cortex-M. Just keep in mind that if one cycle is lost (due to caching or similar) we lose signal sync and the app will not do its work correctly, or at all.
Thanks!
Br
According to this blog post, the interrupt latency for Cortex-M ranges from 12 to 16 cycles (assuming you are not using FPU registers) with best-case memories. M0 and M0+ are slower than M3/M4/M7. On top of this, you need to add the GPIO access times (and watch out for different clock frequencies between the core and the peripherals). Cortex-M7 will support higher clock speeds than M3/M4.
It still isn't clear how many cycles are consumed in recognising a pattern, and how an interrupt is useful in doing this - generally a low latency interface function like this would be an obvious target for dedicated hardware, but since you have an existing software solution it seems the problem is mis-specified.
Providing you avoid accessing any 'slow' peripherals which might stall the bus, the interrupt latency should be deterministic - any specific device should have documentation which covers this.
NXP have an application note which describes some of the detail of how to measure what is going on.
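If you want to measure it yourself on an M3/M4/M7 part, the DWT cycle counter is the usual tool; a rough sketch using standard CMSIS names (the device header and the IRQ chosen are assumptions):

    #include "stm32f4xx.h"   /* any CMSIS device header for an M3/M4/M7 part (assumption) */

    /* The DWT cycle counter (not available on M0/M0+) gives a simple way to
       measure latency: snapshot it just before pending an interrupt, then again
       as the first statement of the ISR. */
    volatile uint32_t t_trigger, t_isr;

    void latency_test_init(void)
    {
        CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   /* enable the trace block  */
        DWT->CYCCNT = 0;
        DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;             /* start the cycle counter */
    }

    void EXTI0_IRQHandler(void)      /* whichever IRQ you want to measure */
    {
        t_isr = DWT->CYCCNT;         /* capture as the very first statement */
        /* ...clear the peripheral's pending flag, etc... */
    }

    /* Elsewhere: t_trigger = DWT->CYCCNT; NVIC_SetPendingIRQ(EXTI0_IRQn);
       the latency in cycles is then t_isr - t_trigger, minus the few cycles of
       overhead from these statements themselves. */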
I am new to embedded development, and a while ago I read some code for a PIC24xxxx.
    void i2c_Write(char data) {
        while (I2C2STATbits.TBF) {};     // wait until the transmit buffer is free
        IFS3bits.MI2C2IF = 0;            // clear the master I2C2 interrupt flag
        I2C2TRN = data;                  // load the byte into the transmit register
        while (I2C2STATbits.TRSTAT) {};  // wait until transmission and ACK complete
        Nop();                           // two no-ops; see the discussion below
        Nop();
    }
What do you think about the while conditions? Doesn't the microcontroller use a lot of CPU time for that?
I asked myself this question and was surprised to see a lot of similar code on the internet.
Is there not a better way to do it?
What about the Nop() too, why two of them?
Generally, in order to interact with hardware, there are 2 ways:
Busy wait
Interrupt-based
In your case, in order to interact with the I2C peripheral, your software first waits for the TBF bit to be cleared, which means the peripheral is ready to accept a byte to send.
Then your software actually writes the byte into the peripheral and waits for the TRSTAT bit to be cleared, meaning that the data has been correctly processed by the I2C peripheral.
The code you are showing is written with busy-wait loops, meaning that the CPU is actively waiting on the hardware. This is indeed a waste of resources, but in some cases (e.g. your I2C interrupt line is not connected or not available) it is the only way to do it.
If you used interrupts, you would ask the hardware to tell you whenever a given event happens - for instance, when the TBF bit is cleared.
The advantage is that, while the hardware is doing its work, you can continue doing other things, or just sleep to save battery.
I'm not an expert in I2C, so the interrupt events I have described are most likely not accurate, but that gives you an idea of why you get two while loops.
Now, regarding the pros and cons of the two approaches: the interrupt-based implementation is more efficient but more difficult to write, since you have to process asynchronous events coming from the hardware. The busy-wait implementation is easy to write but slower; that might still be fast enough for you, though.
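For illustration, here is a rough sketch of what an interrupt-driven transmit could look like with the same PIC24 registers (XC16 toolchain assumed; start/stop conditions, error handling, and the exact IEC3bits.MI2C2IE enable bit should be checked against the datasheet):

    #include <xc.h>   /* XC16 device header (assumption) */

    /* Sketch: load the first byte, then let the master I2C2 ISR feed the rest
       while the CPU does other work. */
    static volatile const char *tx_ptr;
    static volatile unsigned    tx_len;

    void i2c_WriteBuffer(const char *data, unsigned len) {
        tx_ptr = data;
        tx_len = len;
        IFS3bits.MI2C2IF = 0;        // clear any stale interrupt flag
        IEC3bits.MI2C2IE = 1;        // enable the master I2C2 interrupt
        I2C2TRN = *tx_ptr++;         // kick off the first byte
        tx_len--;
    }

    void __attribute__((interrupt, no_auto_psv)) _MI2C2Interrupt(void) {
        IFS3bits.MI2C2IF = 0;        // acknowledge the interrupt
        if (tx_len) {
            I2C2TRN = *tx_ptr++;     // previous byte done: send the next
            tx_len--;
        } else {
            IEC3bits.MI2C2IE = 0;    // transfer finished: disable the IRQ
        }
    }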
Finally, I have no idea why the two Nop()s are needed there. Most likely a tweak that is needed because, without them, the CPU would still go too fast somewhere.
When doing these kinds of transactions (I2C/SPI) you find yourself in one of two situations: bit bang, or some form of hardware assist. Bit bang is easier to implement, read and debug, and is often quite portable from one chip/family to the next, but it burns a lot of CPU. But microcontrollers are mostly there to be custom hardware, like a CPLD or FPGA that is easier to program; they are there to burn CPU cycles pretending to be hardware designs. With I2C or SPI you are trying to create a specific waveform on some number of I/O pins on the device, and at times latching the inputs. The bus has a spec and is sometimes slower than your CPU - sometimes not; sometimes when you add the software and compiler overhead you might end up not needing a timer for delays, you might be just slow enough. But ideally you look at the waveform and you simply create it: raise pin X, delay n ms, raise pin Y, delay n ms, drop pin Y, delay 2*n ms, and so on (see the sketch below). Those delays can come from tuned loops (count from 0 to 1341) or from polling a timer until it reaches Z ticks of some clock. It is a massive waste of CPU, but the point is that you are really just being programmable hardware, and real hardware would be burning time waiting as well.
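A tiny sketch of that "create the waveform directly" idea, with the port register, pin mask and timer all as hypothetical placeholders:

    #define SCL_PIN  (1u << 3)

    extern volatile unsigned int PORT_OUT;    /* placeholder memory-mapped output register */
    extern volatile unsigned int TIMER_CNT;   /* placeholder free-running timer counter    */

    static void delay_ticks(unsigned int ticks) {
        unsigned int start = TIMER_CNT;
        while ((unsigned int)(TIMER_CNT - start) < ticks) { /* busy wait */ }
    }

    static void clock_pulse(void) {
        PORT_OUT |=  SCL_PIN;    /* raise the clock line       */
        delay_ticks(10);         /* hold high per the bus spec */
        PORT_OUT &= ~SCL_PIN;    /* drop the clock line        */
        delay_ticks(10);         /* hold low                   */
    }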
When you have a peripheral in your MCU that assists, it might do much/most of the timing for you, but maybe not all of it - perhaps you have to assert/deassert chip select yourself and then the SPI logic does the clock and data timing in and out for you. These peripherals are generally very specific to one family from one chip vendor - perhaps common across that vendor's parts, but never from vendor to vendor - so they are not at all portable, and there is a learning curve. And perhaps in your case, if the CPU is fast enough, it might be possible to do the next thing so quickly that it violates the bus timing, so you would have to kill more time (maybe that is why you have those Nop()s).
Think of an MCU as a software-programmable CPLD or FPGA and this waste makes a lot more sense. Unfortunately, unlike a CPLD or FPGA, you are single-threaded, so you can't be doing several trivial things in parallel with clock-accurate timing (exactly this many clocks after task A switches state, change the output). Interrupts help, but it is not quite the same: change one line of code and your timing changes.
In this case, especially with the Nops, you should probably be using a scope anyway to see the I2C bus, and once you have it on the scope you can try with and without those calls to see how they affect the waveform. It could also be a bug in the peripheral, or a "feature": maybe you can't hit some register too fast or the peripheral breaks. Or it could be a bug in a chip from five years ago - the code was written for that, the bug is long gone, but they just kept re-using the code. You will see that a lot in vendor libraries.
What do you think about the while conditions? Doesn't the microcontroller use a lot of CPU time for that?
No, since the transmit buffer won't stay full for very long.
I asked myself this question and was surprised to see a lot of similar code on the internet.
What would you suggest instead?
Is there not a better way to do it? (I hate crazy loops :D)
Not that I, you, or apparently anyone else knows of. In what way do you think it could be any better? The transmit buffer won't stay full long enough to make it useful to retask the CPU.
What about the Nop() too, why two of them?
The Nop()s ensure that the signal remains stable long enough. This makes the code safe to call under all conditions. Without them, it would only be safe to call this code if you didn't touch the I2C bus immediately after calling it. But in most cases this code would be called in a loop anyway, so it makes much more sense to make it inherently safe.