MCU to MCU Communication with Profibus/Profinet?

Is it possible to communicate between two STM32F4 devices using the Profibus/Profinet protocol? I cannot find any information on this question. I want master-to-slave communication, and I want to send messages/data between the two devices.

Profibus is not really designed for MCU to MCU communications. The only reason you would use it is if the devices are some distance apart and you have one master and multiple slaves. Profibus DP generally requires a specific Profibus IC (such as a Profichip part) to implement a slave module, as the timing is critical. Similarly, implementing a master is also a complex affair. It is meant to be used as a field bus (for use over entire factories or sites, etc.).
I don't know that much about Profinet, but I doubt it is what you want for MCU to MCU comms.
If the processors are in the same device, I2C would probably be your best bet, and it is physically supported by the STM32F4 (you would need a protocol on top of this as well).
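As a rough illustration, a minimal sketch of MCU-to-MCU I2C using ST's HAL (hi2c1 is assumed to be initialized elsewhere by your init code, and the slave address 0x42 is a placeholder):

    #include "stm32f4xx_hal.h"

    extern I2C_HandleTypeDef hi2c1;  /* assumed configured elsewhere */

    #define SLAVE_ADDR (0x42u << 1)  /* placeholder 7-bit address, shifted for HAL */

    /* Master side: send a short message to the other MCU. */
    HAL_StatusTypeDef send_message(const uint8_t *msg, uint16_t len)
    {
        return HAL_I2C_Master_Transmit(&hi2c1, SLAVE_ADDR,
                                       (uint8_t *)msg, len, 100u /* ms timeout */);
    }

    /* Slave side, on the second STM32F4: block until the master sends. */
    HAL_StatusTypeDef receive_message(uint8_t *buf, uint16_t len)
    {
        return HAL_I2C_Slave_Receive(&hi2c1, buf, len, HAL_MAX_DELAY);
    }

On top of this you would still define your own message framing (length, checksum, etc.), since raw I2C gives you only byte transport.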

How is AMBA ACE different from the AXI protocol?

What is the difference between the Advanced Microcontroller Bus Architecture (AMBA) ACE and AXI protocols?
The difference is stated in the ARM documentation:
The ACE protocol extends the AXI4 protocol and provides support for hardware-coherent caches. ...
A five state cache model to define the state of any cache line in the coherent system.
See the MOESI protocol at Wikipedia (the five-state cache protocol). It seems pretty clear:
The AXI is a bus protocol which provides a means to transfer burst data from masters to slaves over a wide data path. The ACE protocol extends the signalling to allow a MOESI type protocol to be used for cache coherency.
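To make the five-state model concrete, here is the set of MOESI line states as a C enum (a conceptual sketch; real ACE hardware encodes these in its own signal states):

    /* The five MOESI states a cache line can take. */
    enum cache_line_state {
        LINE_MODIFIED,   /* dirty; this cache holds the only copy      */
        LINE_OWNED,      /* dirty; other caches may hold shared copies */
        LINE_EXCLUSIVE,  /* clean; this cache holds the only copy      */
        LINE_SHARED,     /* clean; other caches may hold copies        */
        LINE_INVALID     /* the line holds no valid data               */
    };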
So, if you were to develop IP for a GPU (e.g. Vivante) and wanted to support cache coherency with a CPU, you would need the signals provided by ACE. Otherwise, I think it is mainly used within the CPU complex (4/8 etc. cores) on the chip. There is also a trend towards asymmetric multiprocessing, so a legacy chip could talk to an ARM CPU, or you could have an ARM-to-RISCV translation layer. It might also be useful for a machine-learning acceleration engine.
Most DMA masters do not perform work WITH a CPU, so even a crypto engine probably would not need the ACE protocol signals. Also, you could use simple AXI connections between the devices mentioned in the paragraph above and it would work, but it would be less performant.
Summarizing all that,
The AXI protocols are 'master' to 'slave' (provider/consumer, etc.), whereas ACE is a master-to-master interface. So AXI is useful for an Ethernet, video, or memory controller, but ACE will be used to co-ordinate computations between masters that process data in complex ways. ACE is a protocol for saying that another consumer is working on a portion/section of the provider's data, and that you should block or do something else.

Writing device library C/C++ for STM32 or ARM

I need to develop device libraries (u-blox, IMUs, BLE, etc.) from scratch (almost). Is there any doc or tutorial that can help me?
The question is: how do you write a device library using C/C++ (Arduino style, if you want), given a datasheet and a platform like STM32 or other ARMs?
Thanks so much
I've tried reading device libraries from the Arduino libraries and various GitHub repos, but I would like a guide/template to follow (general rules) for writing proper device libraries from a given datasheet.
I'm not asking for a full, definitive guide, just where to start: docs, methods, approach.
I've found the one below, but it is very basic and quite light for my targets.
http://blog.atollic.com/device-driver-development-the-ultimate-guide-for-embedded-system-developers
I don't think that you can actually write libraries for STM32 in Arduino style. Most Arduino libraries you can find in the wild promote ease of use rather than performance. For example, a simple library designed for a specific sensor works well if reading the sensor and reporting the results via the serial port is the only thing the firmware must do. When you work on more complex projects, where the uC has a lot to do and must satisfy real-time constraints, the general Arduino approach doesn't solve your problems.
The problem with STM32 library development is the complex connection between peripherals, DMA and interrupts. I code at register level without using the Cube framework, and I often find myself digging through the reference manual for the tables that show the connections between DMA channels, or things like timer master-slave relations. Some peripherals (timers mostly) work similarly, but each one of them has small differences. That makes developing a hardware library that fits all scenarios practically impossible.
The tasks you need to accomplish are also more complex in STM32 projects. For example, in one of my projects I fool the SPI peripheral with a dummy/fake DMA transfer triggered by a timer, so that it generates periodic 8-pulse trains from its clock pin (the data pins are unused). No library can provide this kind of flexibility.
Still, I believe not all is lost. I think it may be possible to build a hardware abstraction layer (a HAL, but not The HAL by ST). It's possible to create useful libraries if you can abstract them from the hardware. A USB library is a good example of this approach: STM32 devices have roughly three different USB peripheral hardware variations, so it makes sense to write a separate HAL for each of them, while the upper application layer stays the same.
Maybe that was the reason ST created the Cube framework. But as you know, Cube relies on external code-generation tools which are aware of the hardware of each device, so some of the work can be avoided at runtime. You can't achieve the same result when you write your own libraries unless you also design a similar external code-generation tool. And the code Cube generates is bloated in most cases: you trade runtime performance and code space for development time.
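To illustrate the abstraction idea above, a minimal sketch of what such a layer could look like (all names here are hypothetical; the point is that the device library sees only function pointers, never peripheral registers):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical bus abstraction: the device library calls these,
     * and each hardware variant supplies its own implementation. */
    struct bus_ops {
        int (*read)(uint8_t reg, uint8_t *buf, size_t len);
        int (*write)(uint8_t reg, const uint8_t *buf, size_t len);
    };

    /* The upper layer is identical regardless of the bus beneath it. */
    static int sensor_read_temperature(const struct bus_ops *ops, int16_t *out)
    {
        uint8_t raw[2];
        if (ops->read(0x05 /* hypothetical temperature register */, raw, 2) != 0)
            return -1;
        *out = (int16_t)((raw[0] << 8) | raw[1]);   /* big-endian register pair */
        return 0;
    }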
I assume you will be using a cross toolchain on some platform like Linux, and that the cross toolchain is compatible with some method to load object code on the target CPU. I also assume that you already have a working STM32 board that is documented well enough to figure out how the sensors will connect to the board or to the CPU.
First, you should define what your library is supposed to provide. This part is usually surprisingly difficult. It’s a bit hard to know what it can provide, without knowing a bit about what the hardware sensors are capable of providing. Some iteration on the requirements is expected.
You will need to have access to the documentation for the sensors, usually in the form of the manufacturer’s data sheets. Using the datasheet, and knowing how the device is connected to the target CPU/board, you will need to access the STM32 peripherals that comprise the interface to the sensors. Back to the datasheets, this time for the STM32, to see how to access its peripheral interfaces. That might be simple GPIO bits and bytes, or might be how to use built-in peripherals such as SPI or I2C.
The datasheets for the sensors will detail a bunch of registers, describing the meaning of each, including the meanings of each bit, or group of bits, in certain registers. You will write code in C that accesses the STM32 peripherals, and those peripherals will access the sensors across the electrical interface that is part of the STM32 board.
The workflow usually starts out by writing to a register or three to see if there is some identifiable effect. For example, if you are exercising a digital IO port, you might wire up an LED to see if you can turn it on or off, or a switch to see if you can correctly read its state. This establishes that your code can poke or peek at IO using register-level access. There may be existing helper functions to do this work as part of the cross toolchain, or you might have to develop your own, using pointer indirection to access memory-mapped IO, or there might be special instructions needed that can only be accessed from inline assembler code. This answer is generic, as I don't know the specifics of the STM32 processor or its typical ecosystem.
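A generic sketch of the pointer-indirection approach (the addresses and bit positions below are placeholders, not real STM32 values; on a real part they come from the reference manual or the vendor-supplied device header):

    #include <stdint.h>

    /* Hypothetical memory-mapped GPIO registers. */
    #define GPIO_OUT  (*(volatile uint32_t *)0x40020014u)
    #define GPIO_IN   (*(volatile uint32_t *)0x40020010u)
    #define LED_PIN   (1u << 5)
    #define SW_PIN    (1u << 3)

    static void led_on(void)       { GPIO_OUT |=  LED_PIN; }
    static void led_off(void)      { GPIO_OUT &= ~LED_PIN; }
    static int  switch_is_on(void) { return (GPIO_IN & SW_PIN) != 0; }

The volatile qualifier matters: it stops the compiler from optimizing away or reordering accesses to registers whose contents the hardware can change.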
Then you move on to more complex operations that might involve sequences of operations, like cycling a bit or two to effect some communication with the device. Or it might be as simple as finding the proper sequence of registers to access to operate a SPI interface. Often you will find small chunks of code that are complete enough to be re-used by your driver, like how to read or write an individual byte. You can then make that a reusable function to simplify the rest of the work, like accessing certain registers in sequence and printing the contents of the registers you read to see if they make sense. Ultimately, you will have two important pieces of information: an understanding of the low-level register accesses needed to create a formal driver, and an understanding of what components and capabilities make up the hardware (i.e., you know how the device(s) work).
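For example, once the low-level byte exchange is working, a reusable register-read helper might look like this (spi_transfer() is an assumed placeholder for whatever primitive you worked out earlier, and chip-select handling is omitted for brevity):

    #include <stdint.h>

    extern uint8_t spi_transfer(uint8_t byte);  /* assumed low-level primitive */

    /* Read one register: many SPI sensors form a read command by
     * setting the top bit of the register address (check the datasheet). */
    static uint8_t sensor_read_reg(uint8_t addr)
    {
        spi_transfer(0x80u | addr);   /* hypothetical read-command format */
        return spi_transfer(0x00u);   /* clock out the reply byte */
    }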
Now, throw away most of what you've done, and develop a formal spec. Use what you now know to include everything that can be useful, and to develop a spec that includes an appropriate interface API that your application code can use. Rewrite the driver, armed with the knowledge of how all the pieces work, and taking advantage of the blank canvas afforded you by the fresh rewrite of the spec. Only reuse code that you are completely confident is optimal and appropriate to the format dictated by the spec. Write test code for all of the modules, and use the test code to verify that the code works and that it conforms to the spec. Re-run the test code every time you modify anything it tests.

How do I increase the speed of my USB CDC device?

I am upgrading the processor in an embedded system for work. This is all in C, with no OS. Part of that upgrade includes migrating the processor-PC communications interface from IEEE-488 to USB. I finally got the USB firmware written and have been testing it. It was going great until I tried to push through lots of data, only to discover that my USB connection is slower than the old IEEE-488 connection. I have the USB device enumerating as a CDC device with a baud rate of 115200 bps, but it is clear that I am not even reaching that throughput. I thought that number was a dummy value that is a holdover from RS232 days, but I might be wrong. I control every aspect of this, from the front end on the PC to the firmware on the embedded system.
I am assuming my issue is how I write to the USB on the embedded system side. Right now my USB_Write function is run in free time, and is just a while loop that writes one char to the USB port until the write buffer is empty. Is there a more efficient way to do this?
One concern I have is that in the old system we had a board dedicated to communications. The CPU would just write data across a bus to this board, and it would handle communications, which means the CPU didn't have to waste free time handling the actual communications but could offload them to a "co-processor" (not a CPU, but functionally the same here). Even with this concern, though, I figured I should be getting faster speeds, given that full-speed USB is on the order of MB/s while IEEE-488 is on the order of kB/s.
In short, is this more likely a fundamental system constraint or a software optimization issue?
I thought that number was a dummy value that is a holdover from RS232 days, but I might be wrong.
You are correct: the baud rate is a dummy value. If you were creating a CDC-to-RS232 adapter you would use it to configure your RS232 hardware; in this case it means nothing.
Is there a more efficient way to do this?
Absolutely! You should be writing chunks of data the same size as your USB endpoint for maximum transfer speed. Depending on the device you are using, your stream of single-byte writes may be gathered into a single packet before sending, but from my experience (and your results) this is unlikely.
Depending on your latency requirements, you can stick in a circular buffer and only issue data from it to the USB_Write function when you have ENDPOINT_SZ bytes available. If this results in excessive latency, or your interface is not always communicating, you may want to implement Nagle's algorithm.
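A minimal sketch of that idea (USB_Write(), the buffer size, and the endpoint size are placeholders for your own firmware; a real implementation would also need to guard the indices against concurrent access from interrupts):

    #include <stdint.h>

    #define ENDPOINT_SZ  64u     /* full-speed bulk endpoint packet size */
    #define BUF_SZ       1024u   /* ring size; must be a power of two */

    /* Assumed low-level write from the existing firmware, not a real API: */
    extern void USB_Write(const uint8_t *data, uint32_t len);

    static uint8_t  ring[BUF_SZ];
    static uint32_t head, tail;  /* free-running indices, masked on use */

    void usb_queue_byte(uint8_t b)
    {
        ring[head++ & (BUF_SZ - 1u)] = b;

        /* Hand data to the USB stack only in endpoint-sized chunks. */
        while (head - tail >= ENDPOINT_SZ) {
            uint8_t pkt[ENDPOINT_SZ];
            for (uint32_t i = 0; i < ENDPOINT_SZ; i++)
                pkt[i] = ring[(tail + i) & (BUF_SZ - 1u)];
            USB_Write(pkt, ENDPOINT_SZ);
            tail += ENDPOINT_SZ;
        }
    }

A timer-driven flush of any partial packet left in the ring would address the latency concern, which is essentially what Nagle's algorithm does.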
One of my concerns that I have, is that in the old system we had a board in the system dedicated to communications.
The NXP part you mentioned in the comments is without a doubt fast enough to saturate a USB full speed connection.
In short is this more likely a fundamental system constraint or a software optimization issue?
I would consider this a software design issue rather than an optimisation one, but no, it is unlikely you are fundamentally stuck.
Do take care to figure out exactly what sort of USB connection you are using, though: with USB 1.1 you will be limited to 64 KB/s, and with USB 2.0 full speed to 512 KB/s. If you require higher throughput you should migrate to using a separate bulk endpoint for the data transfer.
I would recommend reading through the USB Made Simple site to get a good overview of the various USB speeds and their capabilities.
One final issue: vendor CDC libraries are not always the best, and implementations of the CDC standard can vary. You can theoretically get more data through a CDC endpoint by using larger endpoints, but I have seen this bring host-side drivers to their knees; if you go this route, create a custom driver using bulk endpoints.
Try testing your device on multiple systems; you may find you get quite different results between Windows and Linux. This will help to point the finger at the host end.
And finally, make sure you are doing big buffered reads on the host side, as USB will stop transferring data once the host-side buffers are full.

Is there a suitable two-wire interface / I2C read/write library in Contiki OS for the ATmega128 platform?

I wish to read the EUI-64 address from an AT24MAC602 memory chip interfaced to an ATmega128RFA1 MCU over the two-wire interface (TWI). I tried to modify the I2C master drivers available for other platforms to suit my needs. However, I wasn't able to carry out these modifications successfully: the program stopped responding as soon as the slave address was written to the TWI bus with the Write flag set, and I could not uncover the underlying reason.
As Contiki OS is quite popular, I thought someone might have already come up with Contiki-specific libraries for reading and writing over the TWI interface on the ATmega128RFA1. If so, please provide pointers to the TWI drivers or documentation, or suggest factors that should be considered when developing such drivers. Thank you.
If you don't have any luck finding/creating a driver for the TWI peripheral, you might consider emulating it by configuring the SDA/SCL pins as general I/O and then implementing the TWI protocol yourself. If you're just doing a one-time read of a chip ID then speed probably isn't a big concern, so this could work if you get desperate. Google should throw up a few examples of emulated TWI.
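If you do end up bit-banging it, the core of an emulated TWI master looks roughly like this (a sketch only: the pin helpers and delay are platform-specific stubs you provide, and open-drain behaviour is emulated by switching the pin between output-low and input; no clock stretching or arbitration is handled):

    #include <stdint.h>

    /* Platform-specific stubs you must implement via GPIO: */
    extern void sda_low(void), sda_release(void);
    extern void scl_low(void), scl_release(void);
    extern int  sda_read(void);   /* read SDA while released */
    extern void dly(void);        /* half-bit delay */

    static void i2c_start(void)
    {
        sda_release(); scl_release(); dly();
        sda_low(); dly();          /* SDA falls while SCL is high */
        scl_low(); dly();
    }

    static int i2c_write_byte(uint8_t b)   /* returns 0 if the slave ACKed */
    {
        for (int i = 7; i >= 0; i--) {     /* shift out MSB first */
            if (b & (1u << i)) sda_release(); else sda_low();
            dly(); scl_release(); dly(); scl_low();
        }
        sda_release();             /* let the slave drive the ACK bit */
        dly(); scl_release(); dly();
        int nack = sda_read();     /* 0 = ACK, 1 = NACK */
        scl_low();
        return nack;
    }

Watching where the slave stops ACKing can also help debug why the hardware TWI driver hangs after the address byte.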

Custom RS485 Protocols

I am writing a simple multi-drop RS485 protocol for serial communications within a distributed system. I am using an addressable model where slave devices are given a window of 20 ms to respond. The master uC polls the connected devices for updates and they respond accordingly. I've employed checksums and taken the necessary overrun precautions to ensure that connected devices will not respond to malformed messages. This method has proved effective in approximately 99% of situations, but I lose the packet if a new device is introduced during a communication session. Plugging in a new device "hot" will have negative effects on the signal being monitored by the slave devices, if only for an extremely short time. I'm on the software side of engineering, but how can I mitigate this situation without trying to recreate TCP? We use a polling model because it is fast and does the job well for our application; there's no need for RTOS functionality. I have an abundance of cycles on each CPU, so think in basic terms.
Sending packets over RS485 is not reliable communication; you will have to handle packet loss anyway. Of course, you won't have to reinvent TCP, but you will have to detect lost packets by means of timeout monitoring and sequence numbers. In simple applications this can be done at the application level, which keeps you far away from the complexity of TCP. Since your polling model already discards all packets with an invalid checksum, this should integrate with little effort.
If you want to check for collisions, which can be caused by hot plugs or misbehaving devices, there are possible improvements. Some hardware allows you to read back your own transmission: if you find a difference between the sent and received data, you can assume a collision and repeat the packet. This also requires some kind of sequence numbering.
Perhaps I've missed something in your question, but can't you just write the master so that if a response isn't seen from a device within the allowed time, it re-polls that device?
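A sketch of that master-side poll loop with a response timeout and retry (send_poll(), wait_response(), and the attempt count are assumed placeholders for your existing protocol code, not a real API):

    #include <stdint.h>
    #include <stdbool.h>

    #define RESPONSE_WINDOW_MS  20u  /* the 20 ms slave response window */
    #define MAX_ATTEMPTS        2

    /* Assumed protocol primitives from the existing firmware: */
    extern void send_poll(uint8_t addr, uint8_t seq);
    extern bool wait_response(uint8_t addr, uint8_t seq, uint32_t timeout_ms);

    /* Poll one slave, re-polling on timeout. Returns true if a valid
     * (checksummed) response arrived for this sequence number. */
    static bool poll_slave(uint8_t addr, uint8_t *seq)
    {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            send_poll(addr, *seq);
            if (wait_response(addr, *seq, RESPONSE_WINDOW_MS)) {
                (*seq)++;          /* sequence number catches duplicates */
                return true;
            }
            /* Timeout: the packet was likely lost (e.g. corrupted by a
             * hot-plug glitch); just try again. */
        }
        return false;              /* give up until the next poll cycle */
    }

Since the master already owns the schedule, a retry costs at most one extra response window and needs no TCP-style machinery.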
