Is it wrong to log inner working of an IoT device in case of failure? - c

I'm currently working on an IoT project and I want to log the execution of my software and hardware.
I want to log them then send them to some DB in case I need to have a look at my device remotely.
The wip IoT device will have to be as minimal as possible so the act of having to write very often inside a flash memory module seems weird to me.
I know that it will run the RTOS OS Nucleus on an Cortex-M4 with some modules connected through SPI.
Can someone with more expertise enlighten me ?
Thanks.

You will have to estimate your hourly/daily/whatever data volume that needs to go into the log and extrapolate to the expected lifetime of your product. Microcontroller flash usually isn't made for logging and thus it features neither enduring flash cells (some 10K-100K write cycles usually compared to 1M or more for dedicated data chips - look it up in the uC spec sheet) nor wear leveling. Wear leveling is any method which prevents software from writing to the same physical cell too frequently (which would e.g. be the directory for a simple file system).
For your log you will have to create a quite clever or complex method to circumvent any flash lifetime problems.
But the problems don't stop there: usually the MCU isn't able to read from Flash memory when writing to it where "writing" means a prolonged (several microseconds up to milliseconds depending on the chip) sequence of instructions controlling the internal Flash statemachine (programming voltage, saturation times, etc.) until the new values have reliably settled in the memory. And, maybe you guessed it, "reading" in this context also means reading instructions, that is you have to make sure that whichever code and interrupts that may occur during the Flash write are only executing code in RAM, cache or other memories and not in the normal instruction memory. It is doable but the more complex the SW system that you are running above the HW layer, the less likely it will work reliably.

Related

Output user data fast on Cortex R5

I'm trying to output some user data, character series, on a Cortex R5 to a PC.
Problem is that the uart is too slow for the amount of data and I'm looking for something faster. I hoped ITM could be used but sadly that's only available for Cortex M-series. The data contains status info about processes which I'd like to visualise for a better insight.
The Uart was running at the maximum of 921600 baud thus I'm looking for something faster than that. I'm looking for 2-5 Mbit.
I found information of DCC (debug communication channel) and ETM but I can't really figure out their speeds and how I could use them with user data instead of tracing data.
I have acces to tracers and debuggers (Green Hills SuperTrace and Realview ICE) so requiring those is no problem. I just can't figure out how to read the data. Perhaps I missed the obvious?
Edit: For now it looks like the easiest way is to bypass the CP2105 which limits my uart to 921600. I'll connect the RX/TX pins from the SoC to a RPi which should be able to get much higher bauds. Ofcourse, I'll also need a logic level shifter since the SoC is only 2.5V tolerant (74LVC245). If this setup works I'll answer my question. Thanks for the input!
The DCC is probably going to be slow, and maybe intrusive to use. You're limited to using JTAG to access this.
The ETM ought to be able to trace this information, and you should be able to configure the filtering to trace just the accesses to a specific memory address. Its a very long time since I looked in detail at the ETMv3 data trace, so I'm not sure if you need to trace the associated instruction or not. The debug tools also tend to be more focused on tracing instructions with data being an additional decoration rather than presenting a raw datastream, so processing the data might be non-trivial.
The ETM should provide several bits of data throughput per cycle, so as long as the data is in small bursts there should be enough bandwidth. Obviously this is package dependant, but small numbers of Gbps are achievable (with a substantial protocol cost depending on what information you're trying to push through the trace stream).
In some chips, an ETM can be shared between several processors (of the same type). ETCSCR[14:14] will be non-zero if this is the case, and then you're limited to selecting one core and tracing that (until the ETM is disabled/re-programmed).

How do I increase the speed of my USB cdc device?

I am upgrading the processor in an embedded system for work. This is all in C, with no OS. Part of that upgrade includes migrating the processor-PC communications interface from IEEE-488 to USB. I finally got the USB firmware written, and have been testing it. It was going great until I tried to push through lots of data only to discover my USB connection is slower than the old IEEE-488 connection. I have the USB device enumerating as a CDC device with a baud rate of 115200 bps, but it is clear that I am not even reaching that throughput, and I thought that number was a dummy value that is a holdover from RS232 days, but I might be wrong. I control every aspect of this from the front end on the PC to the firmware on the embedded system.
I am assuming my issue is how I write to the USB on the embedded system side. Right now my USB_Write function is run in free time, and is just a while loop that writes one char to the USB port until the write buffer is empty. Is there a more efficient way to do this?
One of my concerns that I have, is that in the old system we had a board in the system dedicated to communications. The CPU would just write data across a bus to this board, and it would handle communications, which means that the CPU didn't have to waste free time handling the actual communications, but could offload the communications to a "co processor" (not a CPU but functionally the same here). Even with this concern though I figured I should be getting faster speeds given that full speed USB is on the order of MB/s while IEEE-488 is on the order of kB/s.
In short is this more likely a fundamental system constraint or a software optimization issue?
I thought that number was a dummy value that is a holdover from RS232 days, but I might be wrong.
You are correct, the baud number is a dummy value. If you create a CDC/RS232 adapter you would use this to configure your RS232 hardware, in this case it means nothing.
Is there a more efficient way to do this?
Absolutely! You should be writing chunks of data the same size as your USB endpoint for maximum transfer speed. Depending on the device you are using your stream of single byte writes may be gathered into a single packet before sending but from my experience (and your results) this is unlikely.
Depending on your latency requirements you can stick in a circular buffer and only issue data from it to the USB_Write function when you have ENDPOINT_SZ number of byes. If this results in excessive latency or your interface is not always communicating you may want to implement Nagles algorithm.
One of my concerns that I have, is that in the old system we had a board in the system dedicated to communications.
The NXP part you mentioned in the comments is without a doubt fast enough to saturate a USB full speed connection.
In short is this more likely a fundamental system constraint or a software optimization issue?
I would consider this a software design issue rather than an optimisation one, but no, it is unlikely you are fundamentally stuck.
Do take care to figure out exactly what sort of USB connection you are using though, if you are using USB 1.1 you will be limited to 64KB/s, USB 2.0 full speed you will be limited to 512KB/s. If you require higher throughput you should migrate to using a separate bulk endpoint for the data transfer.
I would recommend reading through the USB made simple site to get a good overview of the various USB speeds and their capabilities.
One final issue, vendor CDC libraries are not always the best and implementations of the CDC standard can vary. You can theoretically get more data through a CDC endpoint by using larger endpoints, I have seen this bring host side drivers to their knees though - if you go this route create a custom driver using bulk endpoints.
Try testing your device on multiple systems, you may find you get quite different results between windows and linux. This will help to point the finger at the host end.
And finally, make sure you are doing big buffered reads on the host side, USB will stop transferring data once the host side buffers are full.

Can I run a program from a SD memory instead of flash on an evaluation board (embedded programming)?

I have an evaluation board (Olimex STM32-P103) which supports a SD-card connector. I want to put my program in to a SD memory instead of internal flash of the micro-controler; and run it from there.
I don't know if it is possible to do that according to boot-loader issue!
P.S my goal is running linux on this board and then port my application over it.
To run programs from SD-Card in general you should know that you can't run them "right away". This means, you have to load it in a executable memory somewhere in your address space which is done by a (more or less) simple bootloader. In the simplest instance, the bootloader is capable to read from a SD-Card a specific binary and copy it into the memory.
That being said you should think about this considering you only got 20k of RAM and 128k of Flash on your board. So where should your program go? Or better: Why not flashing the program in the 128k of Flash from the very beginning? Especially you should know that Linux is a bit "hungry" in terms of memory.
If your goal is to run a "normal" Linux on this board, I'm afraid you're screwed. This because from what I know Linux needs a MMU to run and the chip on this board does not provide one (as far as researchable without access to datasheets from ST).
If you're lucky you can go with uCLinux. I'm not sure if a finished port exists for the STM32 but it seems there are some resources based on a short google search for "STM32 uCLinux". But even if you manage to run uCLinux I'm afraid there's not much left in your system for your application, so the result might be a bit disappointing.
Depending on why you are looking for Linux running on this MCU, there are maybe other solutions like a FreeRTOS in combination with a lwIP-stack (if networking is needed) or a FAT library like FullFAT if you are looking for reading SD-Cards and stuff.
Edit: One thing i'd like to add is that booting from the SD-Card is typically something you do with "bigger" (not much but slightly) systems where you have enough RAM to keep the whole image you'd like to run in it and still have some space left for the data you want to process.
You're going to have to have some code in the STM's onboard flash (typically called a "boot loader") that implements this since the "bare metal" very likely can't boot from SD card.
You're going to have to build that code, which figures out how to use the STM's onboard peripherals to talk to the SD card, finds the file you want to run in the file system (which you also have to implement), and loads it.
I wanted to include a link to the STM standard peripheral library, but it seems to be down (being moved). :/
The data on the SD card is not memory mapped, so cannot be executed directly.
It is possible to dynamically load the data from the card into RAM for execution. WindRiver's VxWorks RTOS supports loading and linking object modules dynamically, I know of no other OS that would scale to a Cortex-M that directly supports that but it would be possible to write your own.
However, I would suggest that in the case of the microcontroller you are using the idea is ill-advised; optimal performance on Cortex-M is achieved when the code is in on-chip flash and data in RAM allowing the data and instruction to be fetch to occur simultaneously on the separate buses (Harvard architecture). If you execute the code from RAM the performance will be severely hit since then data and instructions must be fetched sequentially over the same bus.
The board is entirely unsuited to running Linux, with only 128K Bytes Program Flash, and 20K Bytes RAM is is not at all feasible. Even the smallest Linux distribution requires 600Kb RAM plus whatever is needed by application code. uClinux can just about run on higher-end STM32 with external RAM and Flash, but that would suffer from the same bus contention performance hit and Linux without an MMU is rather missing the one major benefit of using Linux at all. The part on your board lacks an external memory interface, so cannot be expanded to support Linux.
If you need an OS consider a RTOS such as uC/OS-II, FreeRTOS, or emBOS for example.
AS other says you cannot directly execute your code directly from the SD CARD.
But like those "linux board", you can load the stored kernel/programm into an external SDRAM that can be mapped and execute it from there.
You'll still need to write that "bootloader" and store it in the internal flash.
That'is a lot work to my opinion, for limited application.
If you want to write your application in a linux environnement then port it suck small target, I would rather design my application using dependency injection, or even use an emulator.

How does the in-application programming for ARM (Cortex M3) work?

I'm working on a custom Cortex-M3-based device and I need to implement in-application programming (IAP) mechanism so that it will be possible to update the device firmware without JTAG (we'll use TFTP or HTTP instead). While the IAP-related code examples available from ST Microelectronics are clear enough to me, I don't really understand how the re-flashing works.
As far as I understand, the instructions are fetched by the CPU from the Flash through the ICode bus (and the prefetch block, of course). So, here's my pretty silly question: why doesn't the running program get corrupted while it re-flashes itself (i.e. changes the Flash memory from which it is being run)?
A common solution is to have a small reserved area in the flash, where the actual flashing program is stored. When new firmware has been downloaded just make a jump to the code in this area.
Of course, this small area is not overwritten when flashing firmware, it can only be done by other means (like JTAG). So make sure this flashing-program works good to start with. :)
I'm not familiar with STM implementation, but in NXP chips the IAP routines are stored in a separate, reserved ROM area which can't be erased by user code.
If you're implementing the flash writing code yourself by using HW registers directly, you need to either make sure that it doesn't touch the sectors it's running from, or runs from the RAM.
Now a days many micro-controllers supporting IAP, that it is possible to program its flash memory while program execution in the same flash.
For IAP, program memory in the flash may be divided into 2 parts, one executable & other backup parts.
Generally we program the flash memory at a location (say, part-1) through JTAG, whose firmware version is 0.01. For IAP, i.e, program the flash in another part (part-2) while code is executing, corresponding API's should be provide in firmware version 0.01, which helps to program the flash part-2, After completion of programming successfully firmware version will be updated as 0.02. Upon processor restarts, program execution jumps to latest firmware by checking firmware version at initialization.
The part where the firmware is executing is called executable part, and other is back-up. why it is called back-up mean, suppose if there exists any firmware corruption while programming, firmware version will not update & upon restart, program control will automatically jumps back to back-up firmware after checking version number.
Another good way to do it is using custom made bootloader. However STM IAP is not stored in Flash so it can not be overwritten by it self. What generally people do is to spilt the flash in two parts , one is reserved for Custom made Bootloader and Another is for application. Bootloader makes sure that it does not write to its own assigned area. Bootloader can be programmed through JTAG and later application can utilize bootloader to program itself.
As far as I understand, the instructions are fetched by the CPU from the Flash through the ICode bus (and the prefetch block, of course). So, here's my pretty silly question: why doesn't the running program get corrupted while it re-flashes itself (i.e. changes the Flash memory from which it is being run)?
This is because, in general case writing/programming to flash memory is not allowed while you are reading from it(i.e executing code).
Have a look at this for some ideas on implementing IAP.

Any open-source ARM7 emulators suitable for linking with C?

I have an open-source Atari 2600 emulator (Z26), and I'd like to add support for cartridges containing an embedded ARM processor (NXP 21xx family). The idea would be to simulate the 6507 until it tries to read or write a byte of memory (which it will do every 841ns). If the 6507 performs a write, put the address and data on some of the ARM's I/O ports and let the ARM code run 20 cycles, confirm that the ARM is floating its data bus, and let the ARM run for another 38 cycles. If the 6507 performs a read, put the address on the ARM's I/O ports, let the ARM run 38 cycles, grab the data from the ARM's I/O port (hopefully the ARM software will have put it there), and let the ARM run another 20 cycles.
The ARM7 seems pretty straightforward to implement; I don't need to simulate a whole lot of hardware features. Any thoughts?
Edit
What I have in mind would be a routine that would take as a parameter a struct holding the machine state and pointers to a memory access routine. When called, the routine would emulate the ARM's instruction engine, generating appropriate reads, writes, and code fetches. I could then write the memory access routine to regard appropriate areas as flash (with roughly-approximated wait states), RAM, I/O ports, and timer registers. Some other areas would be marked as don't-care, and accesses to any other areas would flag an error and stop the emulator.
Perhaps QEMU uses such a thing internally. Since the ARM emulation would be integrated into an already-existing emulation engine (which I didn't write and don't fully understand--the only parts of Z26 I've patched have been the memory read/write logic) I would need something with a fairly small footprint.
Any idea how QEMU works inside? Any idea what the GPL licence would require if I just use 2% of the code in QEMU--whether I'd have to bundle the code for the whole thing, or just the part that I use, or what?
Try QEMU.
With some work, you can make my emulator do what you want. It was written for ARM920, and the Thumb instruction set isn't done yet. Neither is the MMU/cache interface. Also, it's slow because it is an interpreter. On the bright side, it's all written in C99.
http://code.google.com/p/gp2xemu/
I haven't worked on it for a while (The svn trunk is 2 years old), but if you're going to use the code, I'll be glad to help you out with the missing features. It is licensed under MIT, so it's just the same as the broad BSD license.

Resources