I'm working on a project that requires a microcontroller, so I decided to use the BeagleBone Black. I'm still new to the BeagleBone world and I'm facing some problems that I hope you guys can help me with.
In my project I will have to continuously read from all 7 analog input pins and do some processing accordingly. My question is: what is the fastest programming language for this (I must read as many samples as possible in a very short time!), and how can I increase the sampling rate from kHz to MHz?
I tried the following codes:
Javascript Code:
var b = require('bonescript'); // this refers to my BeagleBone
var time = new Date();
b.analogRead("P9_39");
console.log(new Date() - time);
This code simply performs one analog read and prints the time needed to perform the read. Surprisingly, the result was 111 ms!! That means my sampling rate is only about 9 samples per second, if I'm not wrong.
An alternative was to use Python:
import Adafruit_BBIO.ADC as ADC
import time
ADC.setup()
millis = int(round(time.time() * 1000))
ADC.read_raw("P9_39")
millis = int(round(time.time() * 1000)) - millis
print millis
This code took less time (4 ms), but still, if I wanted to read from the 7 analog input pins, I would only be able to read around 35 samples per second from each.
Using the terminal:
echo cape-bone-iio > /sys/devices/bone_capemgr.*/slots
time cat /sys/devices/ocp.3/helper.15/AIN0
############OR############
time cat /sys/devices/ocp.3/44e0d000.tscadc/tiadc/iio\:device0/in_voltage0_raw
and this took 50ms.
I want my sampling rate to be something in MHz. How can I do so? I know that the Beaglebone Black is capable of that but I could not find a clear way to do so. Any help is appreciated.
Thanks in advance.
The sampling rate of the AM335x ADC is 200 kSPS (link). This means you won't get into the MHz range with the stock BeagleBone Black ADC.
Getting something working with a latency of 5 µs in a non-real-time OS like Linux is impossible. You will be at the mercy of the OS to schedule your execution thread. Other kernel threads will take priority and will preempt your thread, even if you assign it the highest scheduling priority.
From my experience with digital IO on BeagleBone Black, I started seeing missed frames at around 1K samples per second. Now, it will depend on your level of tolerance for missing samples -- if you only need it to work semi-reliably, you can probably squeeze out 10K samples per second by switching to C/C++ and increasing the priority of your process with a nice --10 ... command (a minimal C sketch follows at the end of this answer). However, if you cannot tolerate missed frames, you have to do one of these:
Bypass OS entirely and write C program for naked AM335x processor (no OS).
Use another hardware -- an ADC with a buffer to accumulate samples while your program is preempted.
Use the PRUSS processors on the BBB. They run at 200 MHz, so with a tight loop of e.g. 20 assembly instructions you get a reliable sampling rate of 10 MHz. That is, if you had a faster ADC in the first place; of course it would handle the stock 200 kSPS ADC easily.
I personally went with option #3 and was happy to see my device perform sub-millisecond GPIO operations extremely reliably.
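For the semi-reliable userspace route mentioned above (plain C plus elevated priority), a minimal sketch might look like the following. It times repeated reads of the sysfs ADC attribute from the question; the path varies between kernel images, and this only removes interpreter overhead -- it does not escape the scheduler:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(void)
{
    /* path as in the question; adjust for your kernel image */
    int fd = open("/sys/devices/ocp.3/44e0d000.tscadc/tiadc/iio:device0/in_voltage0_raw", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[16];
    struct timespec t0, t1;
    const int n = 10000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        lseek(fd, 0, SEEK_SET);     /* rewind the sysfs attribute        */
        read(fd, buf, sizeof buf);  /* each read returns one fresh sample */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d reads in %.3f s = %.0f samples/s\n", n, s, n / s);
    close(fd);
    return 0;
}

Run it under nice --10 as suggested and compare the achieved rate against your tolerance for gaps.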
Use 127 BeagleBone Blacks plugged into 127 USB hub ports, break out Visual Basic, and write a USB program to automatically fire the 127 BeagleBones sequentially, one after the other, and read the data into a textbox... You would get around 16 MHz / MSPS of consecutive ADC reads per fast CPU with, say, Windows 10. ...lyj2021
You may have overlapping data... but you can track this with each fire of each BeagleBone Black... consecutively...
Overview:
I spent a while trying to think of how to formulate this question. To narrow the scope, I wanted to provide my initial HW requirements in the form of a ‘real life’ example application.
I understand that clock speed is probably relative, in the sense that it is decided case by case. For example, your requirement for a certain speed may be affected by the on-chip peripherals offered by the MCU. As an example, you may spend (n) cycles servicing an ISR for an encoder, or you could pick an MCU that has a QEI input to do it for you (to some degree), which, in turn, may loosen your requirement.
I am not an expert, and am very much still learning, so please call me out if I use an incorrect term, or completely misinterpret something. I assure you; the feedback is welcome!
Example Application:
This application is relatively simple. It can be thought of as a non-blocking state machine, where each ‘iteration’ of the machine must complete within 20ms. A single iteration of this machine has 4 main tasks:
Decode a serial payload, consisting of 32 bytes. The length is fixed at 32 bytes, payload is dynamic, baud is 115200bps (See Task #2 below)
Read 4 incremental shaft encoder signals, which are coupled with 4 DC Motors, 1 encoder for each motor (See Task #1 Below)
Determine the position of 4 limit switches. ISR driven, trigger on rising edge for each switch.
Based on the 3 categories of inputs above, the MCU will output 4 separate PWM signals at 50 Hz (20 ms) to a motor controller for its next set of movements. (See Task #3 below)
From an IO perspective, I know that the MCU is on the hook for reading 8 digital signals (4 quadrature encoders, 4 limit switches), and decoding a serial frame of 32 bytes over UART.
Based on that data, the MCU will output 4 independent PWM signals, with a pulse width of 1000-3200 µs, per motor, to the motor controller.
The Question:
After all is said and done, I am trying to think through how I can map my requirements into MCU selection, solely from a speed point of view.
It’s easy for me to look through the datasheet and say: this chip meets my requirements because it has (n) UARTs, (n) interrupt-capable input pins, (n) PWM outputs, etc. But my projects are so small that I always just assume the processor is ‘fast enough’. Aside from my immediate peripheral needs, I never really look into the actual MCU speed, which is a gap on my end.
To resolve that, I am trying to understand what goes into selecting a particular clock speed based on the needs of a given application. Or, to put it another way (which is probably wrong): how do you quantify the theoretical load on the processor for a specific application?
Additional Information
Task #1: Encoder:
Each of the 4 motors has a different task within the system, but regardless, they are the same brand/model motor and have a maximum speed of 230 RPM. My assumption is: if, in the worst case, one of the motors is spinning at 230 RPM, then at full quadrature resolution (counting rising/falling edges on both channels A and B) the 1000 PPR encoder would generate 4K counts per revolution. The MCU would have to service those interrupts, potentially creating a bottleneck for the system. For example, if (n) clock cycles are required to service the ISR, and for 1 revolution of 1 motor we expect 4K interrupts, that would be 230 (RPM) * 4K (interrupts per rev) == 920,000 interrupts per minute? Yikes! And then I guess you could extrapolate and say, again in the worst case where each of the 4 motors is spinning at 230 RPM, the system would have to endure 920K interrupts per minute for each encoder. So 920K * 4 motors == 3,680,000 interrupts per minute? I am 100% sure I am doing something wrong, so please, feel free to set me straight.
Task #2: Serial Decoding
The MCU will require a dedicated HW serial port to decode a packet of 32 bytes, which repeats, with different values, every 7ms. Baud rate will be set to 115200bps.
Task #3: PWM Output
Based on the information from tasks 1 and 2, the MCU will write to 4 separate PWM outputs. The pulse for each output will be between 1000-3200usec with a frequency of 50Hz.
You need to separate the real-time critical parts from the rest of the application. For example, the actual reception of a UART frame is somewhat time-critical if you do it interrupt-based. But the protocol decoding is not critical at all, unless you are expected to respond within a certain time.
Decode a serial payload, consisting of 32 bytes.
You can either do this the old school way with interrupts filling up a buffer, or you could look for a part with DMA, which is fairly common nowadays. DMA means that you won't have to consider some annoying, relatively low frequency UART interrupt disrupting other tasks.
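For the old school interrupt approach, a generic ring buffer sketch (the ISR hook and the data register are placeholders, since the real names depend on the part you pick):

#include <stdint.h>

#define UART_DATA_REG (*(volatile uint8_t *)0x40001000) /* placeholder address */
#define RX_BUF_SIZE 64                                  /* power of two        */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head, rx_tail;

void uart_rx_isr(void)              /* hook to the UART receive vector */
{
    uint8_t byte = UART_DATA_REG;
    uint8_t next = (rx_head + 1) & (RX_BUF_SIZE - 1);
    if (next != rx_tail) {          /* drop the byte if the buffer is full */
        rx_buf[rx_head] = byte;
        rx_head = next;
    }
}

int uart_rx_get(void)               /* main loop side; returns -1 when empty */
{
    if (rx_tail == rx_head) return -1;
    uint8_t b = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) & (RX_BUF_SIZE - 1);
    return b;
}

The protocol decoding then runs at main-loop priority, consuming from the buffer, which is exactly the time-critical/non-critical split described above.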
Read 4 incremental shaft encoder signals
I haven't worked with such encoders so I can't tell how time-critical they are. If you have to catch every single interrupt and your calculations are correct, then 3,680,000 interrupts per minute is about 61,000 per second, i.e. 60/3,680,000 = 16 µs between interrupts on average. That is actually demanding: at 8 MHz you would have only ~130 cycles per interrupt, so a shabby 8-bitter would struggle at full quadrature resolution. This is where a hardware quadrature (QEI) peripheral, which you already mentioned, pays off, since it removes the interrupt load entirely; see the sketch below for the software fallback.
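If there is no QEI peripheral, a table-driven decoder keeps the per-edge work down to a handful of instructions. A minimal sketch, assuming a pin-change interrupt hands you both channel levels:

#include <stdint.h>

static volatile int32_t count;
static uint8_t prev_state;

/* step[prev<<2 | curr]: +1 forward, -1 reverse, 0 for no/invalid move */
static const int8_t step[16] = {
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0
};

void encoder_edge_isr(uint8_t chan_a, uint8_t chan_b) /* call on any edge */
{
    uint8_t state = (uint8_t)((chan_a << 1) | chan_b);
    count += step[(prev_state << 2) | state];
    prev_state = state;
}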
Determine the position of 4 limit switches
You don't mention timing here but I assume this is something that could be polled cyclically by a low priority cyclic timer.
the MCU will output 4 separate PWM signals
Not a problem, just pick one with a decent PWM hardware peripheral. You should just need to update some PWM duty cycle registers now and then.
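For instance, with a 16-bit timer ticking at 1 MHz and a period register of 20000 (giving 50 Hz), the compare value is simply the pulse width in microseconds. The register names below are stand-ins:

#include <stdint.h>

static volatile uint16_t PWM_CCR[4];        /* stand-in for 4 compare registers */

void pwm_set_pulse(unsigned ch, uint16_t pulse_us)
{
    if (pulse_us < 1000) pulse_us = 1000;   /* clamp to the motor      */
    if (pulse_us > 3200) pulse_us = 3200;   /* controller's spec range */
    PWM_CCR[ch] = pulse_us;                 /* high time in 1 us ticks */
}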
Overall, this doesn't sound all that real-time critical. I've done much worse real-time projects with icky 8- and 16-bitters. However, each time I did, I regretted not picking a faster MCU, because you always come up with stuff to add as the project/product goes on.
It sounds like your average mainstream Cortex M0+ would be a good candidate for this project. Clock it at ~48MHz and you'll have plenty of CPU power. Cortex M4 or larger if you actually expect floating point math (I don't quite see why you'd need that though).
Given the current component crisis, be careful with which brand you pick though! In particular stay clear of STM32, since ST can't produce them right now and you might end up waiting over a year until you get parts.
The answer to the question is "experience". But intuitively, your example is not particularly taxing, although there are plenty of ways you could mess it up. I once worked on a project that ran on a 200 MHz C5502 DSP at near 100% CPU load. The application now runs on a 72 MHz Cortex-M3 at only 60%, with additional functionality and I/O not present in the original implementation.
Your application is I/O bound; depending on data rates (and critically, interrupt rates), I/O seldom constitutes the highest CPU load, and DMA, hardware FIFOs, input-capture timer/counters, hardware PWM etc. can be used to minimise the I/O impact. I shan't go into it in detail; @Lundin has already done that.
Note also that raw processor speed is important for data or signal processing and number crunching, but what I/O generally requires is deterministic real-time response, and that is seldom simply a matter of MHz or MIPS: you will get a more deterministic, and possibly faster, response from an 8-bit AVR running at a few MHz than you can guarantee from a 500 MHz application processor running Linux. And it won't take 30 seconds to boot!
I am designing the controller and data acquisition unit for a rocket engine test stand. This system needs to control a number of actuators on the test stand and also be able to transmit collected data back to the host computer where the team will be watching live data/camera feeds from safety.
The overall design requirements are as follows:
Acquire data from ~15 analog sensors at 1 kHz
Control the actuators on the test stand including valves and ignition switches
Transmit data back to the host computer in our shelter in real time
Accept control from the host computer for things like manual valve actuation, test sequence modification, sequence abort, etc.
I am not exactly sure where to begin when laying out the software for this system. I am considering an STM32 ARM Cortex-M4 processor running at 180 MHz. I have considered using an RTOS, but from what I have seen they generate significant overhead at high tick rates, since the scheduler has to run on every tick. The other idea I'm bouncing around is a state machine combined with some timer-based interrupts for reading sensors and then sending data back out to the PC. Any advice on how to approach this problem while minimizing code complexity would be greatly appreciated. Thanks.
EDIT:
I have been told to clarify a number of things concerning the technical specs of the system.
My actuators consist of:
6 solenoids (controlled digitally through relays/MOSFET, and switched around once a second)
2 DC motors (driven with PWM outputs in a PID loop, need to be able to ramp position controllably)
One igniter, again controlled through a relay/MOSFET
My sensors consist of:
8 pressure transducers (analog voltages)
4 thermocouples (analog voltages)
2 motor encoders (quadrature encoders)
1 light sensor (analog voltage)
1 Load cell (analog voltage)
Ideally, all of the collected data (all of the above sensors) plus some additional data (timestamps, motor set positions, solenoid positions) is streamed back to the host computer in real time.
Given the motor control with PWM & PID, you need to specify a desired resolution, either in PWM timer ticks or ADC reads. This is the most critical part. It doesn't hurt if the ADC has greater resolution than your specified resolution either. The PCB has to be designed accordingly, with sufficient resolution on resistors etc.
After you've done this, find MCU with sufficiently accurate ADC. I would imagine that 12 bit resolution is enough for most applications, but I don't know your specific case.
Next, you need to decide how fast you want the PID to be. Should an output on the PWM result in a read on the ADC in the next cycle, or could you settle for slower response? The realtime bottleneck here will be the ADC conversion clock, not the CPU.
The rest of the system doesn't seem time critical at all - you just have to ensure that everything is read/set synchronously. The data transmission to/from the host should preferably be done over CAN since it comes with hard real-time characteristics. Doesn't seem that you need a whole lot of bandwidth.
I have designed systems very similar to this using bare metal 16-bit MCUs running at 16 MHz. Processing speed is really not a big concern, but meeting real-time deadlines is. That means you can forget about Linux toys like the Raspberry Pi; it's completely out of the question. And an RTOS is likely overkill, since it mostly adds complexity.
A bare metal Cortex-M with sufficient ADC resolution and CAN seems like a good choice. If you can stay away from floating point, that's nice too; it depends on how advanced the math needs to be. If you need nothing more advanced than PID, it can be implemented with fixed point just fine. (Or rather PI, since that usually works best for fast motor control systems.)
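To illustrate the fixed-point point, here is a minimal PI update with gains in Q16.16 -- just the shape of it, not a tuned controller:

#include <stdint.h>

typedef struct {
    int32_t kp_q16, ki_q16;    /* gains scaled by 65536 */
    int32_t integ_q16;         /* integrator state      */
    int32_t out_min, out_max;  /* e.g. PWM duty limits  */
} pi_t;

int32_t pi_update(pi_t *pi, int32_t setpoint, int32_t measured)
{
    int32_t err = setpoint - measured;  /* assumed small enough not to overflow */

    pi->integ_q16 += pi->ki_q16 * err;

    int64_t out = ((int64_t)pi->kp_q16 * err + pi->integ_q16) >> 16;

    /* clamp, and back out the integration when saturated (anti-windup) */
    if (out > pi->out_max) { out = pi->out_max; pi->integ_q16 -= pi->ki_q16 * err; }
    if (out < pi->out_min) { out = pi->out_min; pi->integ_q16 -= pi->ki_q16 * err; }

    return (int32_t)out;
}

Calling it from the same timer interrupt that triggers the ADC read keeps the loop period fixed, which matters more than raw CPU speed here.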
Here is the problem I am facing. I have interfaced my ATmega328P with a 6-axis IMU (MPU6050 on the GY521 breakout board). I can read data through the TWI interface (Atmel's I2C) and send it to my PC (running Ubuntu) via the UART. I am using custom-built libraries for both of these communication protocols, but they are pretty standard and seem to work just fine. The goal of the project is to compute orientation data from the IMU readings in real time, say at 100 Hz.
The main problem is that I cannot log data from the device at 100 Hz (not even at 50 Hz). The orientation filter I am using (here) requires quite a high update rate, and 100 Hz turned out to work fine (tested offline with data acquired from another device).
Right now, I am using the 16-bit timer of the ATmega328P to sample data at 100 Hz, and this seems to work: I have added a line to the ISR to toggle the built-in LED, and it looks to me like it is blinking at 100 Hz (I can barely see it turning on and off). In the same ISR, I read the values from the inertial sensor and, just to log them, send these values through the serial port. Every 10 ms (at most), I send 9 floats (36 bytes) at a baud rate of 115200. If I use the Arduino IDE's Serial Monitor to visualize this data stream, I notice something very weird, as in the following screenshot.
https://imgur.com/zTBdkhv
As you can see from the timestamps, there is a consistent 33 ms delay every 2 or 3 sets of samples received. Moreover, I get roughly 60% of the data: for example, an acquisition of 10 seconds gets me fewer than 600 samples (per variable) instead of 1000. I also tested sending only one variable through the UART (i.e. a single float, 4 bytes), and this results in the same behavior!
By the way, I am using the following to send each byte (char) via the UART interface.
void writeCharUART(char c) {
    loop_until_bit_is_set(UCSR0A, UDRE0); // busy-wait until the transmit data register is empty
    UDR0 = c;                             // write the byte; the hardware shifts it out
}
Even though my ISR runs at 100 Hz (the LED blinking seems to confirm that), data loss might occur at the level of the TWI transmission. To test that, I modified the ISR to send just a single char ('T') instead of data from the MPU, and I got similar behavior. Something like this:
00:10:05.203 -> T
00:10:05.203 -> T
00:10:05.236 -> T
00:10:05.236 -> T
00:10:05.236 -> T
00:10:05.236 -> T
00:10:05.269 -> T
So, I guess there is something wrong with the UART library, and I actually do sample at 100 Hz, but the logging frequency is much lower (and not constant). How can I solve this issue and/or debug the UART library? Do you see any other possible causes?
EDIT 1
As pointed out in the comments, it seems to be a problem with the receiving software, which limits the frequency to ~30 Hz through some sort of buffering. To confirm that, I programmed the ATmega328P with the following code (this time using the IDE).
void loop() {
Serial.println("T");
}
At first, I thought there was no delay this time, but I found it after 208 samples. So there are ~200 samples received with the same timestamp, and then another bunch of samples after 33 ms. This may be proof that the receiving software introduces the delay.
I also tested a simple serial monitor that I had developed in C and, even though it has no timestamp functionality, I am also losing samples if I fix the duration of the acquisition while sampling at 100 Hz. My serial monitor is based on the termios.h library, but I could not find any documentation about how it buffers incoming data.
There are two issues here:
You are missing messages. You checked the sample rate just with your eyes and told us that you can still see a very fast blinking. Depending on the colour of your LED, the ambient light, your physical state, and your eyes, this could mean anything from 30 Hz to 100 Hz.
I would not trust my eyes for the estimate; rather, use an oscilloscope or a frequency counter to measure it.
You could reduce the LED blink frequency to 1 Hz or even lower by dividing in software. Such a low frequency can be measured by hand with a stopwatch: for example, count 30 blinks and check the time needed.
Add a counter to the message and increment it with each message. You will see it right away if you're losing data.
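For example, reusing the asker's serial_println helper (a minimal sketch; the format just prepends a wrap-around sequence number):

#include <stdint.h>
#include <stdio.h>

static uint16_t seq;

void send_sample(float x, float y, float z)
{
    char buffer[64];
    sprintf(buffer, "%u: %f, %f, %f", seq++, x, y, z);
    serial_println(buffer);  /* gaps in the leading number = lost messages */
}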
The timestamps seem to indicate that the messages are "clustered" at about 30 Hz.
I'm guessing that the source of the timestamps is running at 30 Hz, so it cannot give you more accurate values.
I kind of solved my issues! First of all, thanks to the comments, I checked that my ISR was correctly running at 100 Hz. That way, I could be sure that the problem was somewhere else, namely in the UART communication.
I found this very helpful: Linux, serial port, non-buffering mode
Apparently, the Serial Monitor provided by the Arduino IDE relies on the termios.h library and uses its default settings. I checked the manual and switched to the polling-read mode. Quoting from the manual:
If data is available, read(2) returns immediately, with the lesser of the number of bytes available, or the number of bytes requested. If no data is available, read(2) returns 0.
Hence, I switched back to my serial monitor code and changed the initPort() function adding the following lines of code.
struct termios options;
(...)
options.c_cc[VTIME] = 0;
options.c_cc[VMIN] = 0;
I noticed right away a much higher data rate in the terminal. I kept the 1 Hz LED blinking in the ISR and there is no period stretching. Moreover, an acquisition of 10 seconds this time gave me roughly 1000 samples per variable, consistent with a sampling rate of 100 Hz.
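For context, this is the full shape of the initialization with those two lines in place (standard termios calls; the surrounding boilerplate and the function name are assumed, not the asker's actual initPort()):

#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

int open_port_polling(const char *dev)
{
    int fd = open(dev, O_RDONLY | O_NOCTTY);
    if (fd < 0) return -1;

    struct termios options;
    tcgetattr(fd, &options);

    cfsetispeed(&options, B115200);
    cfmakeraw(&options);       /* raw mode: no line buffering or char translation */
    options.c_cc[VTIME] = 0;   /* polling read: return immediately...             */
    options.c_cc[VMIN]  = 0;   /* ...with whatever bytes are available            */

    tcsetattr(fd, TCSANOW, &options);
    return fd;
}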
On the AVR side, I also changed the way I send data through the UART. Before, I was sending 9 floats like this:
sprintf(buffer, "%f, %f, %f", value1_x, value1_y, value1_z);
serial_print(buffer); // no "\n" sent here
sprintf(buffer, "%f, %f, %f", value2_x, value2_y, value2_z);
serial_print(buffer); // again, no "\n" sent
sprintf(buffer, "%f, %f, %f", roll, pitch, yaw);
serial_println(buffer); // "\n" is sent here once the last data byte is sent
Now, I have replaced all of this with a single call to serial_println(), and I write only 6 floats to the buffer.
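Something like this (variable names carried over from the snippet above; which 6 values to keep depends on the filter):

sprintf(buffer, "%f, %f, %f, %f, %f, %f",
        value1_x, value1_y, value1_z,
        value2_x, value2_y, value2_z);
serial_println(buffer); /* one "\n"-terminated line per sample set */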
I am trying to use the SPIDEV module in Python 2.7 to interface a Raspberry Pi 3 with an ADS1256 ADC over the SPI bus available on the Raspberry Pi.
The project is to communicate with two of those ADCs and sample all the channels (8 channels each) at 250Hz sampling rate.
The functions in the SPIDEV module responsible for data transactions are xfer and xfer2. The problem with these functions is that they assert CS (bring CS low), do the transaction, and release CS (bring CS high). In order to communicate with the ADS1256, a series of commands needs to be sent to the ADC while CS is kept at logic low. This is possible by listing all the commands together and passing them to xfer/xfer2 like this:
xfer2([10, 20, 30, 40])
However, sending commands this way does not give the ADC sufficient time to process each command; in other words, the timing between instructions violates the timing specifications of the ADC. If, on the other hand, one command is sent at a time, then the CS toggle causes the ADC to forget the previous command.
Two other alternatives suggested online, which I tried, introduce too much delay: I cannot squeeze all the channel reads into the time frame I have between sampling instances. These alternatives are:
WiringPi module: this module has the wiringPiSPIDataRW function, which does only the data transaction; CS can be controlled separately by the IO functions in the module. The drawback is that the time between each call to this function, as well as the time between bringing CS low and calling it, is more than 200 microseconds, which in aggregate goes over my 4-millisecond sampling period when I sample all the channels.
Using a separate pin for CS when using SPIDEV: this option also introduces more than 100 microseconds of delay between function calls.
These are the two suggestions that I have learned through digging Raspberry Pi community and Stack Overflow.
The xfer functions in SPIDEV also provide an argument called delay; according to the documentation it should control the delay between blocks, but it only determines how long CS is kept low after the transaction is complete. For example, if I issue:
xfer2([12, 23, 34, 46], 1800000, 30)
it will keep CS low for 30 microseconds only at the very end, after sending 46 is over. It does not provide 30 microseconds between each byte, i.e. between 12, 23, 34 and 46, which is what I actually need. However, if I do
xfer2([12])
xfer2([23])
xfer2([34])
xfer2([46])
then of course, due to the nature of the Raspberry Pi, the time between each call will be more than 100 microseconds, which I cannot handle.
So something that lets me control the delay between commands would be ideal.
If that's not possible, something that lets me control CS within the xfer functions so they do not toggle it, meaning I could control the CS pin with IO functions instead. That would avoid a hardware modification on my board, which uses the Raspberry Pi GPIO header CE pin as CS. It is still a slow solution, but much faster than the functions in the wiringPi module.
In the worst case, I will have to modify my hardware and use a different GPIO pin as CS.
Thanks for reading
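A note for anyone hitting the same wall: if dropping down to C is an option, the kernel's spidev ioctl interface supports this directly. A single SPI_IOC_MESSAGE can carry several transfers; with cs_change left at 0, CS stays asserted across them, and delay_usecs inserts a pause after each transfer -- i.e. the per-command gap needed here. A sketch (device path, speed, and delay value are assumptions):

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

int send_cmds_with_gaps(int fd, const uint8_t *cmds, unsigned n)
{
    struct spi_ioc_transfer tr[4];   /* one transfer per command byte */
    memset(tr, 0, sizeof tr);

    for (unsigned i = 0; i < n && i < 4; i++) {
        tr[i].tx_buf      = (unsigned long)&cmds[i];
        tr[i].len         = 1;
        tr[i].speed_hz    = 1800000;
        tr[i].delay_usecs = 30;  /* pause after this byte, CS still low */
        tr[i].cs_change   = 0;   /* keep CS asserted between transfers  */
    }
    return ioctl(fd, SPI_IOC_MESSAGE(n), tr);
}

/* usage:
 *   uint8_t c[] = {12, 23, 34, 46};
 *   int fd = open("/dev/spidev0.0", O_RDWR);
 *   send_cmds_with_gaps(fd, c, 4);
 */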
I'm developing a system that gets the GPS signal and sends it through GSM, with information about position, speed, and temperature from some digital sensors.
Currently I'm using the EM408 GPS and the Arduino Mega plus the GSM board (the official one).
The problem is that the GPS (via the TinyGPSPlus library) gives me the same speed for a long time, or sometimes gives me 0 km/h.
The sketch works like this:
loop()
{
getGPSData() - ~1 sec to execute; takes one reading from the GPS.
getSensors() - ~1 sec to execute; takes one reading from the digital sensors.
sendData() - ~6 to 10 secs to send the data over the internet.
}
The whole process takes around 10 ~ 15 secs to be completed.
If I remove sendData(), so the system reads the GPS every second, the speed value works perfectly; but if I read the GPS every ~12 secs (because of the GSM delay), the speed doesn't work as expected.
I understand that the problem is that the TinyGPSPlus library calculates the speed between two points, and getGPSData() only takes one reading per loop, so the next point comes 15 secs later.
I've since added a "for(i=0;i<=4;i++)" around getGPSData(), forcing it to read the position at least 4 times before the GSM sends it over the internet. It now works better, but I still get wrong values, or sometimes the speed freezes at the same value for a long time.
I've also tried adding a second board and making the two communicate over I2C, turning it "dual core": one board reads the GPS every second while the other sends the data every 15 secs. But the GSM freezes sometimes when the I2C is connected :(.
Does anyone have any clue how to do this?
Adding the "for(i=0;i<=4;i++)" loop like you tried is not a good approach, and adding a duplicate device is even further from the solution. Instead, you should unbind getGPSData() from sendData(); in other words, separate these calls into different tasks.
I imagine it is a simple round-robin scheduler provided with that library, not a complete RTOS, isn't it? Nevertheless, put them into different tasks, loops, or whatever. You will probably need to arrange a buffer to collect GPS data into, and send from that buffer in a different loop.
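A rough non-blocking shape of that idea in Arduino terms (feedGPS(), getSensors() and sendData() stand in for your existing routines; the GSM call still blocks, so a serial buffer large enough to ride out the send helps):

unsigned long lastSend = 0;

void loop()
{
    feedGPS();       /* read and encode available GPS bytes every pass */
    getSensors();    /* keep this short or make it non-blocking too    */

    if (millis() - lastSend >= 15000UL) {   /* send every 15 s */
        lastSend = millis();
        sendData();  /* slow GSM transmission, but speed stays fresh   */
    }
}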
Hope it helps.