Microcontroller Serial Command Interpreter in C/C++: Ways to do it

I'd like to interpret a command string, received by a microcontroller (PIC16F877A, if that makes any difference) via serial.
The strings have a pretty simple and straightforward format:
$AABBCCDDEE (5 "blocks" of 2 characters + '$', for 11 characters in total), where:
$AA= the actual name of the command (could be letters, numbers, both; mandatory);
BB-EE= parameters (numbers; optional);
I'd like to write the code in C/C++.
I figure I could just grab the string via serial, hack it up into blocks, and use switch () {case} with memcmp on the command block ($AA). Then I could use a binary decision tree to make use of the BB, CC, DD and EE blocks.
I'd like to know if that's the right way to do it (it kinda seems ugly to me; surely there must be a less tedious way to do this!).

Don't over-design it! That does not mean you should go blindly coding, but once you have designed something that looks like it can do the job, you can start to implement it. Implementation will give you feedback about your architecture.
For example, when writing your switch/case, you might catch yourself rewriting code very similar to the one you just wrote for the preceding case. Actually writing down an algorithm will help you see problems you did not think of, or simplifications you did not see.
Don't aim for the best code on the first try. Aim for
easy to read
easy to debug
Take little steps. You do not have to implement the whole thing in one go.
Grab the string from the serial port. Looks easy, right? Well, let's do that first, just printing out the commands.
Separate the command from the parameters.
Extract the parameters. Will the extraction be the same for each command? Can you design a data structure valid for every command? (See the sketch below.)
Once you have done it right, you can start to think of a better solution.
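As a sketch of what such a shared data structure could look like (field names are mine, and I'm assuming each optional parameter is two ASCII digits):

/* One parsed frame of the $AABBCCDDEE protocol. */
struct frame {
    char cmd[3];    /* "AA", NUL-terminated            */
    int  param[4];  /* BB, CC, DD, EE; -1 when absent  */
};

/* Split an 11-character "$AABBCCDDEE" string into a frame.
   Returns 0 on success, -1 if the string doesn't start with '$'. */
int parse_frame(const char *s, struct frame *f)
{
    if (s[0] != '$') return -1;
    f->cmd[0] = s[1];
    f->cmd[1] = s[2];
    f->cmd[2] = '\0';
    for (int i = 0; i < 4; i++) {
        const char *p = s + 3 + 2 * i;
        if (p[0] >= '0' && p[0] <= '9' && p[1] >= '0' && p[1] <= '9')
            f->param[i] = (p[0] - '0') * 10 + (p[1] - '0');
        else
            f->param[i] = -1;   /* optional parameter not supplied */
    }
    return 0;
}

Every command handler can then take a struct frame and simply ignore the parameters it doesn't use.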

ASCII interfaces are ugly by definition. Ideally you have some sort of frame structure, which maybe you have: the $ indicates the division between frames, and you say they are 11 characters in length. If always 11, that is good; if only sometimes, that is harder. Hopefully there is a $ at the start and 0x0A and/or 0x0D/0x0A at the end (CR/LF).

Normally I have one module of code that simply extracts bytes from the serial port and puts them into a (circular) buffer. The buffering dates to the days when serial ports had very little or no buffer on board, but even today, especially with microcontrollers, that is still the case. Then another module of code monitors the buffer, searching for frames. Ideally this buffer is big enough to leave the frame there and have room for the next frame, and not require another buffer for keeping copies of the frames received. Using the circular buffer, this second module can move the head pointer (discarding as it goes, if necessary) to the beginning-of-frame marker and wait for a full frame's worth of data. Once a full frame appears to be there, it calls another function that processes that frame.

That function may be the one you are asking about. And "just code it" may be the answer; you are on a microcontroller, so you can't use lazy high-level desktop-application-on-an-operating-system solutions. You will need some sort of strcmp function, either created yourself or available through a library, or not, depending on your solution. The brute force if(strncmp(&frame[1],"bob",3)==0) then, else if(strncmp(&frame[1],"ted",3)==0) then, else if... certainly works, but you may chew up your ROM with that kind of thing, or not. And the buffering required for this kind of approach can chew up a lot of RAM. This approach is very readable, maintainable, and portable, though. It may not be fast (maintainability normally conflicts with reliability and/or performance), but that may not be a concern, so long as you can process this frame before the next one comes along, and/or before unprocessed data falls out of the circular buffer.

Depending on the task, the frame-checker routine may simply check that the frame is good. I normally put in start and end markers, a length, and some sort of arithmetic checksum, and if it is a bad frame it is discarded; this saves a lot of code checking for bad/corrupt data. When the frame-processing routine returns to the search-for-frame routine, it moves the head pointer to purge the frame, as it is no longer needed, good frame or bad. The frame checker may only validate a frame and hand it off to yet another function that does the parsing. Each Lego block in this arrangement has a very simple task and operates on the assumption that the Lego block below it has performed its task properly. Modular, object-oriented, whatever term you want to use, this makes the design, coding, maintenance, and debugging much easier (at the cost of performance and resources). This approach works well for any serial-type stream, be it a serial port on a microcontroller (with enough resources), an application on a desktop looking at serial data from a serial port, or TCP data, which is also serial and NOT frame oriented.
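A minimal sketch of those two modules (the buffer size, names, and interrupt hookup are my assumptions, not part of the original design):

#include <stdint.h>

#define BUF_SIZE  64      /* power of two, so the wrap is a cheap mask */
#define FRAME_LEN 11      /* $AABBCCDDEE */

static volatile uint8_t buf[BUF_SIZE];
static volatile uint8_t head, tail;   /* head: consumer, tail: producer */

void process_frame(void);  /* validates/parses buf[head..]; defined elsewhere */

/* Module 1: byte extractor, called from the UART receive interrupt. */
void uart_rx_isr(uint8_t byte)
{
    buf[tail] = byte;
    tail = (uint8_t)((tail + 1) & (BUF_SIZE - 1));
}

/* Module 2: frame hunter, called from the main loop. */
void poll_frames(void)
{
    /* discard bytes until head points at a start-of-frame marker */
    while (head != tail && buf[head] != '$')
        head = (uint8_t)((head + 1) & (BUF_SIZE - 1));

    /* wait until a full frame's worth of data is buffered */
    if (((uint8_t)(tail - head) & (BUF_SIZE - 1)) >= FRAME_LEN) {
        process_frame();  /* note: a frame may wrap around the buffer end */
        head = (uint8_t)((head + FRAME_LEN) & (BUF_SIZE - 1));  /* purge it */
    }
}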
If your micro doesn't have the resources for all that, then the state machine approach also works quite well. Each byte that arrives ticks the state machine one state. Start with idle, waiting for the first byte: is the first byte a $? If not, discard it and go back to idle. If the first byte is a $, go to the next state. If you were looking for, say, the commands "and", "add", "or", and "xor", then the second state would compare with "a", "o", and "x"; if none of these, go to idle. If an a, go to a state that compares for n and d; if an o, go to a state that looks for the r. If the look-for-the-r-in-or state does not see the r, go to idle; if it does, process the command and then go to idle. The code is readable in the sense that you can look at the state machine and see the words a,n,d, a,d,d, o,r, x,o,r and where they ultimately lead, but it is generally not considered readable code. This approach uses very little RAM and leans on the ROM a bit more, but overall it could use the least amount of ROM as well, compared to other parsing approaches. And here again it is very portable, beyond microcontrollers, but outside a microcontroller folks might think you are insane with this kind of code (well, not if this were Verilog or VHDL, of course). This approach is harder to maintain and harder to read, but it is very fast and reliable and uses the least amount of resources.
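A sketch of that machine for the "and"/"add"/"or"/"xor" example (the state names and handler functions are mine):

void do_and(void), do_add(void), do_or(void), do_xor(void);  /* handlers, defined elsewhere */

enum state { IDLE, GOT_DOLLAR, A_, AN_, AD_, O_, X_, XO_ };
static enum state st = IDLE;

/* Called once per received byte; every byte ticks the machine one state. */
void on_byte(char c)
{
    switch (st) {
    case IDLE:        st = (c == '$') ? GOT_DOLLAR : IDLE; break;
    case GOT_DOLLAR:  /* first command letter */
        if      (c == 'a') st = A_;
        else if (c == 'o') st = O_;
        else if (c == 'x') st = X_;
        else               st = IDLE;
        break;
    case A_:   /* saw "a": "and" or "add"? */
        st = (c == 'n') ? AN_ : (c == 'd') ? AD_ : IDLE;
        break;
    case AN_:  /* saw "an": expect the final 'd' */
        if (c == 'd') do_and();
        st = IDLE;
        break;
    case AD_:  /* saw "ad": expect the final 'd' */
        if (c == 'd') do_add();
        st = IDLE;
        break;
    case O_:   /* saw "o": expect the final 'r' */
        if (c == 'r') do_or();
        st = IDLE;
        break;
    case X_:   /* saw "x": expect 'o' */
        st = (c == 'o') ? XO_ : IDLE;
        break;
    case XO_:  /* saw "xo": expect the final 'r' */
        if (c == 'r') do_xor();
        st = IDLE;
        break;
    }
}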
No matter what approach, once the command is interpreted you have to ensure you can perform the command without losing any bytes on the serial port, either through deterministic performance of the code, or interrupts, or whatever.
Bottom line: ASCII interfaces are always ugly. No matter how many layers of libraries you use to make the job easier, the resulting instructions that get executed are ugly, and one size fits no one, by definition. Just start coding: try a state machine, try the if-then-else strncmp, and optimizations in between. You should see quickly which one performs best with your coding style, the tools/processor, and the problem being solved.

It depends on how fancy you want to get, how many different commands there are, and whether new commands are likely to be frequently added.
You could create a data structure that associates each valid command string with a corresponding function pointer - a sorted list accessed with bsearch() is probably fine, although a hash table is an alternative which may have better performance (since the set of valid commands is known beforehand, you could construct a perfect hash with a tool like gperf).
The bsearch() approach might look something like this:
#include <stdlib.h>  /* bsearch() */
#include <string.h>  /* memcmp()  */

void func_aa(char args[11]);
void func_cc(char args[11]);
void func_xy(char args[11]);

struct command {
    const char *name;
    void (*cmd_func)(char args[11]);
} command_tbl[] = {
    /* must be kept sorted by name, since bsearch() requires it */
    { "AA", func_aa },
    { "CC", func_cc },
    { "XY", func_xy }
};

#define N_CMDS (sizeof command_tbl / sizeof command_tbl[0])

static int comp_cmd(const void *c1, const void *c2)
{
    const struct command *cmd1 = c1, *cmd2 = c2;
    return memcmp(cmd1->name, cmd2->name, 2);
}

static struct command *get_cmd(const char *name)
{
    struct command target = { name, NULL };
    return bsearch(&target, command_tbl, N_CMDS, sizeof command_tbl[0], comp_cmd);
}
Then if you have command_str pointing to a string from the serial port, you'd do this to dispatch the right function:
struct command *cmd = get_cmd(command_str + 1);
if (cmd)
cmd->cmd_func(command_str);

Don't know if you're still working on this, but I'm working on a similar project and found an embedded command-line interpreter: http://sourceforge.net/projects/ecli/?source=recommended. That's right, they had embedded applications in mind.
The cli_engine function really helps in taking the inputs from your command line.
Warning: there is no documentation besides a readme file. I'm still working through some bugs integrating the framework but this definitely gave me a head start. You'll have to deal with comparing the strings (i.e. using strcmp) yourself.

Related

Can I programmatically detect changes in a sketch?

At work we have an Arduino sketch that gets changed periodically. In a nutshell, it communicates back and forth on a Serial port. For the most part our software development team controls the code; however, there are some other teams at our company that periodically make last minute changes to the sketch in order to accommodate specific client needs.
This has obviously been quite problematic because it means we might have different versions of our sketch deployed in different places without realizing it. Our software developers are very good at using source control but the other teams are not quite so disciplined.
One idea that was proposed was hard-coding a version number, so that a certain serial command would respond by reporting back the predefined version number. The trouble however is that our other teams might likewise fail to have the discipline to update the version number if they decide to make other changes.
Obviously the best solution involves cutting the other teams off from making updates, but assuming that isn't possible for office-politics reasons, I was wondering if there's any way to programmatically "reflect" on an Arduino sketch. Obviously a sketch is going to take up a certain number of bytes, and the sketch file is going to have a unique file hash. I was thinking that if there were some way to get the byte count, the file hash, or the last-modified time as a preprocessor directive that could be injected into the code, that would be ideal. Something like this:
// pseudocode
const String SKETCH_FILE_HASH = #filehash;
const int SKETCH_FILE_SIZE = #filesize;
const int SKETCH_LAST_UPDATED = #modified;
But that's about as far as my knowledge goes with this. Is there any way to write custom preprocessor directives, or macros, for Arduino code? Specifically ones that can examine the sketch file itself? Is that even possible? Or is there some way that already exists to programmatically track changes in one way or another?
Risking an answer.
SKETCH_FILE_HASH: you would have to precompute it externally and pass it in as a flag. I guess you're using the Arduino IDE, so this is not doable.
SKETCH_FILE_SIZE: same answer.
SKETCH_LAST_UPDATED: you can use __TIME__ to get a string containing the compilation time.
What I would do, taking into account the politics part:
embed a keyword linked to your version control (e.g. Subversion's $Id$ keyword expansion; almost all VCSs provide something similar)
embed the compilation time
change the official build (the one the SW team controls) to use the actual toolchain rather than the IDE and put it on a Jenkins server: you'll be able to use compilation flags!
embed code like
#ifndef BUILD_TYPE
#define BUILD_TYPE "Unsupported"
#endif
On your continuous-build process, use -DBUILD_TYPE='"HEAD"' or '"Release"' (the extra quoting keeps the string literal intact through the shell).
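The stamp can then be a single string assembled at compile time (a sketch; the name VERSION_INFO is mine, and it assumes the BUILD_TYPE guard above):

/* Adjacent string literals are concatenated by the compiler. */
const char VERSION_INFO[] = "Build " BUILD_TYPE " @ " __DATE__ " " __TIME__;

A serial command handler can then simply print VERSION_INFO.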
I'm sorry, I don't see a magic wand solving your problem. I'd invest a lot into training on why version control can save you (it seems you already have the war stories).
I was looking at this issue myself, and found this:
https://gist.github.com/jcw/1985789#file-bootcheck-ino
This is to look up the bootloader; but I'm thinking that something like this could be used for determining a signature of some sort for the code as a whole.
I did a quick experiment, where I added something like:
Serial.print("Other...");
Serial.println(CalculateChecksum(0, 2048));
in void setup(), and was able to get different values for the CRC, based on changing a tiny bit of code (a string).
This is not an explicit solution; I tried CalculateChecksum(0, 32767), and so on, and if I defined an integer like int a=101; and changed it to int a=102;, the checksum was the same. Only when I changed a string (i.e., added a space) did this value change.
I'm not crystal clear on the way memory is allocated in the Arduino; I do know there is program memory (32,256 bytes) and global variable memory (2048 bytes), so I'm sure there is some way of doing this.
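For reference, my guess at the shape of such a checksum routine over program memory (a sketch using avr-libc's CRC helper; the linked gist may do it differently):

#include <avr/pgmspace.h>  // pgm_read_byte()
#include <util/crc16.h>    // _crc16_update()

// CRC-16 over `size` bytes of flash, starting at address `start`.
uint16_t CalculateChecksum(uint16_t start, uint16_t size) {
  uint16_t crc = 0xFFFF;
  for (uint16_t i = 0; i < size; i++)
    crc = _crc16_update(crc, pgm_read_byte(start + i));
  return crc;
}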
In another experiment, I used the pgm_read_byte() function, and if I create a simple memory dump function:
// Dump `size` bytes of program memory starting at `addr` over Serial.
void MemoryDump (word addr, word size) {
  word dataval = ~0;
  // prog_uint8_t* p = (prog_uint8_t*) addr;
  uint8_t* p = (uint8_t*) addr;
  for (word i = 0; i < size; ++i)
  {
    dataval = pgm_read_byte(p++);
    Serial.print(i);
    Serial.print(" ->");
    Serial.print(dataval, HEX);
    Serial.print(" ");
    Serial.print(dataval);
    Serial.print(" ");
    if (dataval > 32)
    {
      Serial.print(char(dataval));
    }
    else
    {
      Serial.print("***");
    }
    Serial.print("\n");
  }
}
... and I put in a line like:
Serial.println(F("12345fghijklmnopqrstuvwxyz"));
because the F() puts the string in program memory, you will see it there.
Reading the SRAM is a bit of an issue, as noted here:
http://forum.arduino.cc/index.php?topic=220125.0
I'm not a compiler god, so I don't know how stuff like a=101; looks to the compiler/IDE, or why it doesn't show up as a difference in the program memory area (perhaps the unused variable is simply optimized away, but that's a guess).
One last note:
http://playground.arduino.cc/Code/AvailableMemory
Those functions access SRAM, so perhaps, with a bit of tweaking, you could do a CRC on that memory, but it would seem a bit of an issue, since you have to be doing a computation with a variable... in SRAM! But if the code was identical, even when doing a computation like that, it might be possible. Again, I'm in deep water here, so if an AVR god has an issue with this, please destroy this theory with an ugly fact!

How can I do printf style debugging over a slow CAN bus - with constant strings on the remote tool, not the embedded system

Currently, on my embedded system (coded in C), I have a lot of debug-assistive print statements which are executed when a remote tool is hooked up to the system that can display the messages to a PC. These help to understand the general system status, but as the messages are going over a slow CAN bus, I believe they may be clogging the tubes and causing other problems with trying to get any useful data logged.
The basic gist of it is this:
It's like a printf, but ultimately in a special message format that gets sent from the embedded system to the tool over the CAN bus. For this purpose, I can replace this generic print message with special debugging messages that send it a unique ID followed by only the variable parameters (i.e. the argc/argv). I am wondering if that is the right way to go about it, or if there is a magic bullet I've missed, or something else I haven't thought of.
So, I found this question which starts out well for my purposes:
printf() debugging library using string table "decoder ring"
But I do not have the constraint of making it as easy as a printf. I can maintain a table of strings on the remote tool's side (since it's a Windows executable and therefore not as code size limited). I am the sole person responsible for this code and would prefer to try and lighten up the code size as well as the CAN bus traffic while debugging.
My current thoughts are thus:
printf("[%d] User command: answer call\n", (int)job);
This becomes
debug(dbgUSER_COMMAND_ANSWER_CALL, job);
dbgUSER_COMMAND_ANSWER_CALL being part of an enum of possible debug messages
and the remote side has something like
switch (messagetype)
{
case dbgUSER_COMMAND_ANSWER_CALL:
    /* retrieve an integer from the data portion of the message and put it into a variable */
    printf("[%d] User command: answer call\n", (int)avariable);
    break;
}
That's relatively straightforward and it would be fantastic if all my messages came in that same format. Where it gets tricky, though, is where some of my statements have to print strings which are not constant (the name of the device, for example).
printf("[%d] %02X:%02X:%02X:%02X:%02X:%02X (%s)\n", /* a bunch of parameters here */);
So, should I make the contents of the debug message be 1) the message type, 2) the length of the first parameter, 3) the parameter, 4) the length of the next parameter, 5) the parameter, and so on and so forth?
Or have I overlooked something more obvious or easy?
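For concreteness, a sketch of that type/length/value framing (the helper names are hypothetical):

#include <stdint.h>
#include <string.h>

/* Payload layout: [msg type][len][bytes][len][bytes]... */
static size_t pack_param(uint8_t *out, const void *param, uint8_t len)
{
    out[0] = len;                  /* length of this parameter */
    memcpy(out + 1, param, len);   /* the parameter bytes      */
    return (size_t)len + 1;
}

static size_t build_debug_msg(uint8_t *msg, uint8_t type,
                              const void *p1, uint8_t l1)
{
    size_t n = 0;
    msg[n++] = type;                   /* 1) the message type */
    n += pack_param(msg + n, p1, l1);  /* repeat for each extra parameter */
    return n;                          /* bytes to put on the CAN bus */
}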
Thanks
I'm assuming you use CAN because that's the connection you already have to your device. You haven't really provided enough information about your diagnostic needs, but I can give an example of what we do where I work. We use a custom build tool to comb through our sources building up a string table. Our code uses something like:
log( LOGH(T0722T),                 //"Position"
     LOG_DOT_HEX_VALUE(i),
     LOG_TEXT(T0178T),             //"Id"
     LOG_DOT_VALUE(uniqueId % 10000),
     0 );
This would record some data which could be decoded into:
<timestamp> H Position.00B Id.0235
We allow use of T0000T to have the tool look up (or generate) a unique number for us. The tool builds up an enum using the TxxxxT numbers for the compiler, and a file containing the ordered list of strings. Every build generates a string table which matches the enum numbering. This system also ties into a database system used for generating internationalized strings, but that's not really relevant to your question.
Each element is a short (16 bits). We allow 12 bits for values and use the high 4 bits for type and flag info: is the encoded data a string id; is it a signal (high/low) or just an event; is it a value; is it decimal, hexadecimal, or base64url; concatenated (with the preceding item) or separate.
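A sketch of packing one such element (the field widths match the description; the type codes are invented for illustration):

#include <stdint.h>

#define LOG_TYPE_TEXT_ID 0x1   /* encoded data is a string id         */
#define LOG_TYPE_VALUE   0x2   /* a decimal value                     */
#define LOG_TYPE_HEX     0x3   /* a hexadecimal value                 */
#define LOG_FLAG_CONCAT  0x8   /* concatenate with the preceding item */

/* High 4 bits: type/flags; low 12 bits: the value or string id. */
static uint16_t log_element(uint8_t type_and_flags, uint16_t value)
{
    return (uint16_t)(((type_and_flags & 0x0Fu) << 12) | (value & 0x0FFFu));
}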
We record data in a large ring buffer for querying if needed. This allows the system to run without interference but the data can be extracted if a problem is noted. It is certainly possible to constantly send the data, and if that's desired I'd suggest limiting any one message to a single CAN payload (that's 8 bytes assuming you only use a single ID). The message I provided above would fit in one CAN message if properly encoded (assuming the receiving side created the timestamps).
We do have the ability to include arbitrary data (ASCII or Hexadecimal), but I never use it. It's usually a waste of precious space in the logs. Logging is always a balance in embedded environments.

random reading on very large files with fgets seems to bring Windows caching to its limits

I have written a C/C++ program for Windows 7 64-bit that works on very large files. In the final step it reads lines from an input file (10GB+) and writes them to an output file. The access to the input file is random; the writing is sequential.
EDIT: The main reason for this approach is to reduce RAM usage.
What I basically do in the reading part is this: (Sorry, very shortened and maybe buggy)
void seekAndGetLine(char* line, size_t lineSize, off64_t pos, FILE* filePointer){
    fseeko64(filePointer, pos, SEEK_SET);  /* SEEK_SET, not C++'s ios_base::beg */
    fgets(line, (int)lineSize, filePointer);
}
Normally this code is fine, not to say fast, but under some very special conditions it gets very slow. The behaviour doesn't seem to be deterministic, since the performance drops occur on different machines at different parts of the file, or don't occur at all. It even goes so far that the program totally stops reading, while there are no disk operations.
Another symptom seems to be the RAM used. My process keeps its RAM steady, but the RAM used by the system sometimes grows very large. After using some RAM tools I found out that the Windows Mapped File grows to several GB. This behaviour also seems to depend on the hardware, since it occurs on different machines at different parts of the process.
As far as I can tell, this problem doesn't exist on SSDs, so it definitely has something to do with the response time of the HDD.
My guess is that the Windows caching somehow gets "weird". The program is fast as long as the cache does its work. But when caching goes wrong, the behaviour turns into either "stop reading" or "grow cache size", and sometimes even both. Since I'm no expert on the Windows caching algorithms, I would be happy to hear an explanation. Also, is there any way to manipulate/stop/enforce the caching from C/C++ on Windows?
Since I've been hunting this problem for a while now, I've already tried some tricks that didn't work out:
filePointer = fopen(fileName, "rbR"); //Just fills the cache till the RAM is full
massive buffering of the reads/writes, to keep the two out of each other's way
Thanks in advance
Truly random access across a huge file is the worst possible case for any cache algorithm. It may be best to turn off as much caching as possible.
There are multiple levels of caching:
the CRT library (since you're using the f- functions)
the OS and filesystem
probably onboard the drive itself
If you replace your I/O calls via the f- functions in the CRT with the comparable ones in the Windows API (e.g., CreateFile, ReadFile, etc.) you can eliminate the CRT caching, which may be doing more harm than good. You can also warn the OS that you're going to be doing random accesses, which affects its caching strategy. See options like FILE_FLAG_RANDOM_ACCESS and possibly FILE_FLAG_NO_BUFFERING.
You'll need to experiment and measure.
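A sketch of what that might look like (error handling omitted; the file name and offset are placeholders):

#include <windows.h>
#include <stdio.h>

/* Read `len` bytes at `offset` through the Win32 API, bypassing CRT buffering. */
static DWORD read_at(HANDLE h, LONGLONG offset, void *buf, DWORD len)
{
    LARGE_INTEGER pos;
    DWORD got = 0;
    pos.QuadPart = offset;
    SetFilePointerEx(h, pos, NULL, FILE_BEGIN);
    ReadFile(h, buf, len, &got, NULL);
    return got;
}

int main(void)
{
    /* FILE_FLAG_RANDOM_ACCESS tells the cache manager not to read ahead. */
    HANDLE h = CreateFileA("input.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_RANDOM_ACCESS, NULL);
    char line[4096];
    DWORD got = read_at(h, 1234567890LL, line, sizeof line);
    printf("read %lu bytes\n", (unsigned long)got);
    CloseHandle(h);
    return 0;
}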
You might also have to reconsider how your algorithm works. Are the seeks truly random? Can you re-sequence them, perhaps in batches, so that they're in order? Can you limit access to a relatively small region of the file at a time? Can you break the huge file into smaller files and then work with one piece at a time? Have you checked the level of fragmentation on the drive and on the particular file?
Depending on the larger picture of what your application does, you could possibly take a different approach - maybe something like this:
decide which lines you need from the input file and store the line numbers in a list
sort the list of line numbers
read through the input file once, in order, and pull out the lines you need (better yet, seek to the next line and grab it, especially when there are big gaps); see the sketch after this list
if the list of lines you're grabbing is small enough, you can store them in memory for reordering before output; otherwise, stick them in a smaller temporary file and use that file as input for your current algorithm to reorder the lines for final output
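A minimal sketch of that single sequential pass (names are illustrative; `wanted` holds the sorted, 0-based line numbers to extract):

#include <stdio.h>

void extract_lines(FILE *in, FILE *out, const long *wanted, size_t n_wanted)
{
    char line[4096];
    size_t next = 0;
    long lineno = 0;

    while (next < n_wanted && fgets(line, sizeof line, in)) {
        if (lineno++ == wanted[next]) {
            fputs(line, out);   /* `out` may be the temp file mentioned above */
            next++;
        }
    }
}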
It's definitely a more complex approach, but it would be much kinder to your caching subsystem, and as a result, could potentially perform significantly better.

file and formatting alternative libs for c

I've done some searching and have not found anything that would boost the file and formatting functions in Visual Studio 2010 C (not C++).
I've been able to address the raw i/o issues to some extent by using large buffers and a SSD drive, so the more pressing issue is a replacement for the family of printf functions.
Has anyone found something worthwhile?
As I understand it, part of the glacial speed issue with the printf functions is that they have to handle a myriad of argument types. Does anyone have experience with writing a datatype-specific version of printf, e.g., one that only prints ints, or only prints doubles, etc.?
First off, you should profile the code before assuming it's printf.
But if you're sure it's printf and similar, then you can do a few things to fix the issue.
1) Print less. I.e., don't call expensive operations as much if you can avoid it. Do you need all the output, for example?
2) Replace the formatted output with hand-built routines that emit the pieces directly, without having to parse the format specifier.
EG: printf("--%s--", "really cool");
Can become:
write(1, "--", 2);
write(1, "really cool", 11);
write(1, "--", 2);
That may be faster. But again, you won't know until you profile it. Don't spend energy on a solution till you can confirm it's the solution you need and be able to measure the success of your proposed solution.
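In the same spirit, a datatype-specific writer like the one the question asks about might look like this (a sketch; write() is POSIX, so under VS2010 it would be _write() from <io.h>):

#include <unistd.h>   /* write(); on Windows use _write() from <io.h> */

/* Ints only: no format string to parse, just digits out. */
static void print_int(int v)
{
    char buf[12];                  /* enough for -2147483648 */
    char *p = buf + sizeof buf;
    unsigned u = (v < 0) ? 0u - (unsigned)v : (unsigned)v;

    do { *--p = (char)('0' + u % 10); u /= 10; } while (u);
    if (v < 0) *--p = '-';
    write(1, p, (size_t)(buf + sizeof buf - p));
}

Whether this actually beats printf is exactly the kind of thing to confirm with the profiler.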
@Wes is right: never assume you know what you need to fix until you have proof.
Where I differ is on the method of finding out.
I and others use random pausing which works for these reasons, and here's a short slide show demo & C++ code so you can see how it works, if you want.
The thing about printf (or any output) function is it spends A) a certain number of CPU cycles creating a buffer to be output, and then it spends B) a certain amount of time waiting while the system and/or auxiliary hardware actually moves the data out.
That's maybe a bit over-simplified, but if you randomly pause and examine the state, that's what you see.
What you've done by using large buffers and an SSD drive is reduce B, and that's good.
That means of the time remaining, A is a larger fraction.
You know that.
Now of the samples you find in A, you might get a hint of what's happening if you see what subordinate routines inside printf are showing up.
Usually printf calls something like vprintf to get rid of the variable argument list, which then cycles over the format string to figure out what to do, including things like parsing precision specifiers.
If it looks like that's what it's doing, then you know about how much time goes into parsing the format.
On the other hand, if you see it inside a routine that is copying a string, or formatting an integer (along with dealing with leading/trailing characters, etc.) then you know to concentrate on that.
On yet another hand, if you see it inside a routine that looks like it's formatting a floating point number (which is actually quite complicated), you know to concentrate on that.
Given all that, you want to know what I do?
First, I ask who is going to read this anyway?
If nobody really needs to read all this text, why not pump it out in binary? Or failing that, in hex?
If you simply write binary, A shrinks to nothing, and when you read it back in with another program, guess what?
No Lost Bits!
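As a sketch of that binary dump (assuming the data is an array of doubles; the reader uses a matching fread):

#include <stdio.h>

/* Dump raw values: no formatting cost, no lost bits. */
void dump_binary(FILE *out, const double *values, size_t count)
{
    fwrite(values, sizeof values[0], count, out);
}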

Parsing: load into memory or use stream

I'm writing a little parser and I would like to know the advantages and disadvantages of the different ways to load the data to be parsed. The two ways that I thought of are:
Load the file's contents into a string then parse the string (access the character at an array position)
Parse as reading the file stream (fgetc)
The former would let me have two functions, parse_from_file and parse_from_string; however, I believe this mode will take up more memory. The latter does not have that disadvantage.
Does anyone have any advice on the matter?
Reading the entire file in, or memory-mapping it, will be faster, but may cause issues if you want your language to be able to #include other files, as these would be memory-mapped or read into memory as well.
The stdio functions would work well because they usually try to buffer up data for you, but they are also general purpose so they also try to look out for usage patterns which differ from reading a file from start to finish, but that shouldn't be too much overhead.
A good balance is to have a large circular buffer (x * 2 * 4096 is a good size) which you load with file data and then have your tokenizer read from. Whenever a block's worth of data has been passed to your tokenizer (and you know that it is not going to be pushed back) you can refill that block with new data from the file and update some buffer location info.
Another thing to consider is if there is any chance that the tokenizer would ever need to be able to be used to read from a pipe or from a person typing directly in some text. In these cases your reads may return less data than you asked for without it being at the end of the file, and the buffering method I mentioned above gets more complicated. The stdio buffering is good for this as it can easily be switched to/from line or block buffering (or no buffering).
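For illustration, switching those stdio buffering modes looks like this (a sketch; setvbuf must be called before any other I/O on the stream):

#include <stdio.h>

int main(void)
{
    static char big[2 * 4096];
    FILE *in = fopen("input.txt", "r");
    if (!in) return 1;

    setvbuf(in, big, _IOFBF, sizeof big);  /* full (block) buffering */
    setvbuf(stdout, NULL, _IOLBF, 0);      /* line buffering         */
    setvbuf(stderr, NULL, _IONBF, 0);      /* no buffering           */

    /* ... tokenize from `in` ... */
    fclose(in);
    return 0;
}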
Using GNU flex (the fast lexical analyzer generator, not the Adobe Flash thing) or something similar can greatly ease the trouble with all of this. You should look into using it to generate the C code for your tokenizer (lexical analysis).
Whatever you do you should try to make it so that your code can easily be changed to use a different form of next character peek and consume functions so that if you change your mind you won't have to start over.
Consider using lex (and perhaps yacc, if the language of your grammar matches its capabilities). Lex will handle all the fiddly details of lexical analysis for you and produce efficient code. You can probably beat its memory footprint by a few bytes, but how much effort do you want to expend into that?
The most efficient approach on a POSIX system would probably be neither of the two (or a variant of the first, if you like): just map the file read-only with mmap and parse it then. Modern systems are quite efficient with that, in that they prefetch data when they detect streaming access, multiple instances of your program that parse the same file will get the same physical pages of memory, and so on. And the interfaces are relatively simple to handle, I think.
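A sketch of that variant (POSIX; error handling kept minimal):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int parse_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) return -1;

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { close(fd); return -1; }

    /* parse data[0 .. st.st_size-1] exactly like an in-memory string */

    munmap((void *)data, (size_t)st.st_size);
    close(fd);
    return 0;
}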
