Parse Bytestream in C (embedded programming) [closed] - c

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
So I have a communication system between two microcontrollers and I'm sending data between them, i.e. sensor data from one µC to the other one, and commands back.
The sensor data is taken from a struct and put into a frame, which looks like this (the identifier "SENSORFRAME" is not constant; it depends on what's in the frame):
sprintf(message, "\:\\SENSORFRAME\:%.2f;%.2f;%.2f;%d;%d;%u\r\r",
input.temperature, input.current, input.voltage,
input.dutycycle,input.lightsensor, input.message);
Resulting in a Frame like this:
:\SENSORFRAME:-12.40;1.42;0.53;500;1200;8\r\r
Or for the commandframes:
sprintf(message, "\:\\COMMANDFRAME\:%u;%.5f\r\r", input.command, input.data);
Resulting in a Frame like this:
:\COMMANDFRAME:3;1.42\r\r
When the bytestream arrives at one microcontroller it is written into a simple ring buffer until it is processed.
Now my two questions are, firstly, what is the best way to identify a frame in a bytestream, i.e. everything between ":\" and "\r\r", and secondly how to parse it efficiently back into the struct — some combination of strtok (";" and ":"), and atoi/atof?

If you do not process the type of frame until you've read the double carriage return (CR) at the end and you then insert a null byte after the double CR, then you can extract the frame type information using sscanf() or other parsing techniques.
char frame_type[32];
char colon[2];
int offset;
if (sscanf(ring_buffer, ":\\%31[^:]%[:]%n", frame_type, colon, &offset) != 2)
…deal with malformatted frame…
Now you have the frame type in frame_type as a string. The slightly odd %[:] only matches a colon; what's crucial, though, is that it gets counted as a successful conversion, and the %n conversion will be valid. If it was changed to:
if (sscanf(":\\%31[^:]:%n", frame_type, &offset) != 1)
…deal with malformatted frame…
you would not know whether the trailing colon was matched or not, and you wouldn't know whether offset held a valid value (the offset into the string where the preceding character, the colon, was found).
You have at least two frame types; how many are there in total (at the moment, and could there be in the future)? At one level, it doesn't matter. You could use a series of string comparisons against the possible frame types, or you could use a hash of the frame type string and compare that against the set of valid hash values, or you could devise another mechanism.
Once you know which frame type you have, you know what format string to use to read the rest of the data. Since you read up to a double CR, you know that the ending contains that — you don't need to validate it again.
For example, for the sensor frame, you might use:
if (sscanf(ring_buffer + offset, "%.2f;%.2f;%.2f;%d;%d;%u",
&input.temperature, &input.current, &input.voltage,
&input.dutycycle, &input.lightsensor, &input.message) != 6)
…deal with malformatted sensor frame…
Or, for the command frame, you might use:
if (sscanf(ring_buffer + offset, "%u;%.5f\r\r", &input.command, &input.data) != 2)
…deal with malformatted command frame…
The only complicating factor is that you're using a ring buffer. That could mean that your sensor frame is split so that the first 5 bytes are at the end of the ring buffer and the remainder at the start of the buffer. Frankly, if you can afford the space and copying, converting the ring buffer to a regular (linear?) buffer will be easiest. If that is absolutely not an option, then you are probably stuck with not using sscanf() at all; you will either need to write your own variant of sscanf() that can be told about the shape of the ring buffer and work with that, or you will have to work character at a time.
Perhaps your custom function is:
int rbscanf(const char *rb_base, int rb_len, int rb_off, const char *format, ...);
The ring buffer starts at rb_base and is rb_len bytes long in total; the data starts at &rb_base[rb_off]. You might need to specify the buffer length and the data length (rb_len and rb_nbytes). You might already have a structure describing the ring buffer, in which case, you could pass (a pointer to) that to the function.
Alternatively, if you process the data before you've read the entire frame, then you can validate as you read the bytes. You'll still need to accumulate strings and numbers for conversion. You'll probably use strtol() and strtod() rather than atoi() and atof(); you will need to know about errors, including trailing unconverted characters, which the atoi() and atof() functions cannot tell you about. Care is required with the strtoX() functions, but they are effective.

Related

How to code ASCII Text Based protocol over RS-232 in C

I have to implement a relatively simple communication protocol on top of RS-232.
It's an ASCII based text protocol with a couple of frame types.
Each frame looks something like this:
* ___________________________________
* | | | | |
* | SOH | Data | CRC-16 | EOT |
* |_____|_________|_________|________|
* 1B nBytes 2B 1B
Start Of Header (1 Byte)
Data (n-Bytes)
CRC-16 (2 Bytes)
EOT (End Of Transmission)
Each data-field needs to be separated by semicolon ";":
for example, for HEADER type data (contains code,ver,time,date,src,id1,id2 values):
{code};{ver};{time};{date};{src};{id1};{id2}
what is the most elegant way of implementing this in C is my question?
I have tried defining multiple structs for each type of frame, for example:
typedef struct {
uint8_t soh;
char code;
char ver;
Time_t time;
Date_t date;
char src; // Unsigned char
char id1[20]; // STRING_20
char id2[20]; // STRING_20
char crlf;
uint16_t crc;
uint8_t eot;
} stdHeader_t;
I have declared a global buffer:
uint8_t DATA_BUFF[BUFF_SIZE];
I then have a function sendHeader() in which I want to use RS-232 send function to send everything byte by byte by casting the dataBuffer to header struct and filling out the struct:
static enum_status sendHeader(handle_t *handle)
{
uint16_t len;
enum_RETURN_VALUE rs232_err = OK;
enum_status err = STATUS_OK;
stdHeader_t *header = (stdHeader_t *)DATA_BUFF;
memset(DATA_BUFF, 0, size);
header ->soh= SOH,
header ->code= HEADER,
header ->ver= 10, // TODO
header ->time= handle->time,
header ->date= handle->date,
header ->src= handle->config->source,
memset(header ->id1,handle->config->id1, strlen(handle->config->id1));
memset(header ->id2,handle->config->id2, strlen(handle->config->id1));
header ->crlf = '\r\n',
header ->crc = calcCRC();
header ->eot = EOT;
len = sizeof(stdHeader_t );
do
{
for (uint16_t i = 0; i < len; i++)
{
rs232_err= rs232_tx_send(DATA_BUFF[i], 1); // Send one byte
if (rs232_err!= OK)
{
err = STATUS_ERR;
break;
}
}
// Break do-while loop if there is an error
if (err == STATUS_ERR)
{
break;
}
} while (conditions);
return err;
}
My problem is that I do not know how to approach the problem of handling ascii text based protocol,
the above principle would work very well for byte based protocols.
Also, I do not know how to implement semicolon ";" seperation of data in the above snippet, as everything is sent byte by byte, I would need aditional logic to know when it is needed to send ";" and with current implementation, that would not look very good.
For fields id1 and id2, I am receiveing string values as a part of handle->config, they can be of any lenght, but max is 20. Because of that, with current implementation, I would be sending more than needed in case actual lenght is less than 20, but I cannot use pointers to char inside the struct, because in that case, only the pointer value would get sent.
So to sumarize, the main question is:
How to implement the above described text based protocol for rs-232 in a nice and proper way?
what is the most elegant way of implementing this (ASCII Text Based protocol) in C is my question?
Since this is ASCII, avoid endian issues of trying to map a multi-byte integer. Simply send an integer (including char) as decimal text. Likewise for floating point, use exponential notation and sufficient precision. E.g. sprintf(buf, "%.*e", DBL_DECIMAL_DIG-1, some_double);. Allow "%a" notation.
Do not use the same code for SOH and EOT. Different values reduce receiver confusion.
Send date and time using ISO 8601 as your guide. E.g. "2022-11-10", "23:38:42".
Send string with a leading/trailing ". Escape non-printable ASCII characters, and ", \, ;. Example for 10 long string 123\\;\"\xFF456 --> "123\\\;\"\xFF456".
Error check, like crazy, the received data. Reject packets of data for all sorts of reasons: field count wrong, string too long, value outside field range, bad CRC, timeout, any non-ASCII character received.
Use ASCII hex characters for CRC: 4 hex characters instead of 2 bytes.
Consider a CRC 32 or 64.
Any out-of-band input, (bytes before receiving a SOF) are silently dropped. This nicely allows an optional LF after each command.
Consider the only characters between SOH/EOT should be printable ASCII: 32-126. Escape others as needed.
Since "it's an ASCII based text protocol with a couple of frame types.", I'd expect a type field.
See What type of framing to use in serial communication for more ideas.
First of all, structs are really not good for representing data protocols. The struct in your example will be filled to the brim with padding bytes everywhere, so it is not a proper nor portable representation of the protocol. In particular, forget all about casting a struct to/from a raw uint8_t array - that's problematic for even more reasons: the first address alignment and pointer aliasing.
In case you insist on using a struct, you must write serialization/deserialization routines that manually copy to/from each member into the raw uint8_t buffer, which is the one that must be used for the actual transmission.
(De)serialization routines might not be such a bad idea anyway, because of another issue not addressed by your post: network endianess. RS-232 protocols are by tradition almost always Big Endian, but don't count on it - endianess must be documented explicitly.
My problem is that I do not know how to approach the problem of handling ascii text based protocol, the above principle would work very well for byte based protocols.
That is a minor problem compared to the above. Often it is acceptable to have a mix of raw data (essentially everything but the data payload) and ASCII text. If you want a pure ASCII protocol you could consider something like "AT commands", but they don't have much in the way of error handling. You really should have a CRC16 as well as sync bytes. Hint: preferably pick the first sync byte as something that don't match 7 bit ASCII. That is something with MSB set. 0xAA is popular.
Once you've sorted out data serialization, endianess and protocol structure, you can start to worry about details such as string handling in the payload part.
And finally, RS232 is dinosaur stuff. There's not many reasons why one shouldn't use RS422/RS485. The last argument for using RS232, "computers come with RS232 COM ports", went obsolete some 15-20 years back.
One thing your struct implementation is missing is packing. For efficiency reasons, depending on which processor your code is running on, the compiler will add padding to the structure to align on certain byte boundaries. Normally this doesn't effect you code that much, but if you are sending this data across a serial stream where every byte matters, then you will be sending random zeros across as well.
This article explains padding well, and how to pack your structures for use cases like yours
Structure Padding

Dynamic memory allocation for input? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I am having a lot of trouble starting my project. Here are the directions:
"Complete counts.c as follows:
Read characters from standard input until EOF (the end-of-file mark) is read. Do not prompt the user to enter text - just read data as soon as the program starts.
Keep a running count of each different character encountered in the input, and keep count of the total number of characters input (excluding EOF)."
The format my professor gave me to start is: `
#include <stdio.h>
int main(int argc, char *argv[]) {
return 0;
}
In addition to how to start the problem, I'm also confused as to why the two parameter's are given in the main function when nothing is going to be passed to it. Help would be much appretiated! Thank you!
`
Slightly tricky to see what you're having trouble with here. The title doesn't form a complete question, nor is there one in the body; and they seem to be hinting at entirely different questions.
The assignment tells you to read characters - not store them. You could have a loop that only reads them one at a time if you wish (for instance, using getchar). You're also asked to report counts of each character, which would make sense to store in an array. Given that this is of "each different character", the simplest way would be to size the array for all possible characters (limits.h defines UCHAR_MAX, which would help with this). Remember to initialize the array if it's automatically allocated (the default for function local variables).
Regarding the arguments to main, this program does not need them, and the C standard does allow you to leave them out. They're likely included as this is a template of a basic C program, to make it usable if command line arguments will be used also.
For more reference code you might want to compare the word count utility (wc); the character counting you want is the basis of a frequency analysis or histogram.
This should give you a start to investigate what you need to learn to complete your task,
Initially declare a character input buffer of sufficient size to read chars as,
char input[SIZE];
Use fgets() to read the characters from stdin as,
if (fgets(input, sizeof input, stdin) == NULL) {
; // handle EOF
}
Now input array has your string of characters which you to find occurrence of characters. I did not understand When you say different characters to count, however you have an array to traverse it completely to count the characters you need.
Firstly, luckily for you we will not need dynamic memory allocation at all here as we are not asked to store the input strings, instead we simply need to record how many of each ascii code is input during program run, as there a constant and finite number of those we can simply store them in a fixed size array.
The functions we are looking at here (assuming we are using standard libs) are as follows:
getchar, to read chars from standard input
printf, to print the outputs back to stdout
The constructs we will need are:
do {} while, to loop around until a condition is false
The rest just needs simple mathematical operators, here is a short example which basically shows a sample solution:
#include <stdio.h>
int main(int argc, char *argv[])
{
/* Create an array with entries for each char,
* then init it to zeros */
int AsciiCounts[256] = {0};
int ReadChar;
int TotalChars = 0;
int Iterator = 0;
do
{
/* Read a char from stdin */
ReadChar = getchar();
/* Increment the entry for its code in the array */
AsciiCounts[ReadChar]++;
TotalChars++;
} while (ReadChar != EOF);
/* Stop if we read an EOF */
do
{
/* Print each char code and how many times it occurred */
printf("Char code %#x occurred %d times\n", Iterator, AsciiCounts[Iterator]);
Iterator++;
} while (Iterator <= 255);
/* Print the total length read in */
printf("Total chars read (excluding EOF): %d", --TotalChars);
return 0;
}
Which should achieve the basic goal, however a couple of extension exercises which would likely benefit your understanding of C. First you could try to convert the second do while loop to a for loop, which is more appropriate for the situation but I did not use for simplicity's sake. Second you could add a condition so the output phase skips codes which never occurred. Finally it could be interesting to check which chars are printable and print their value instead of their hex code.
On the second part of the question, the reason those arguments are passed to main even though they are ignored is due to the standard calling convention of c programs under most OSes, they pass the number of command line arguments and values of each command line argument respectively in case the program wishes to check them. However if you really will not use them you can in most compilers just use main() instead however this makes things more difficult later if you choose to add command line options and has no performance benefit.

Why would one add 1 or 2 to the second argument of snprintf?

What is the role of 1 and 2 in these snprintf functions? Could anyone please explain it
snprintf(argv[arg++], strlen(pbase) + 2 + strlen("ivlpp"), "%s%ccivlpp", pbase, sep);
snprintf(argv[arg++], strlen(defines_path) + 1, "-F\"%s\"", defines_path);
The role of the +2 is to allow for a terminal null and the embedded character from the %c format, so there is exactly the right amount of space for formatting the first string. but (as 6502 points out), the actual string provided is one space shorter than needed because the strlen("ivlpp") doesn't match the civlpp in the format itself. This means that the last character (the second 'p') will be truncated in the output.
The role of the +1 is also to cause snprintf() to truncate the formatted data. The format string contains 4 literal characters, and you need to allow for the terminal null, so the code should allocate strlen(defines)+5. As it is, the snprintf() truncates the data, leaving off 4 characters.
I'm dubious about whether the code really works reliably...the memory allocation is not shown, but will have to be quite complex - or it will have to over-allocate to ensure that there is no danger of buffer overflow.
Since a comment from the OP says:
I don't know the use of snprintf()
int snprintf(char *restrict s, size_t n, const char *restrict format, ...);
The snprintf() function formats data like printf(), but it writes it to a string (the s in the name) instead of to a file. The first n in the name indicates that the function is told exactly how long the string is, and snprintf() therefore ensures that the output data is null terminated (unless the length is 0). It reports how long the string should have been; if the reported value is longer than the value provided, you know the data got truncated.
So, overall, snprintf() is a relatively safe way of formatting strings, provided you use it correctly. The examples in the question do not demonstrate 'using it correctly'.
One gotcha: if you work on MS Windows, be aware that the MSVC implementation of snprintf() does not exactly follow the C99 standard (and it looks a bit as though MS no longer provides snprintf() at all; only various alternatives such as _snprintf()). I forget the exact deviation, but I think it means that the string is not properly null-terminated in all circumstances when it should be longer than the space provided.
With locally defined arrays, you normally use:
nbytes = snprintf(buffer, sizeof(buffer), "format...", ...);
With dynamically allocated memory, you normally use:
nbytes = snprintf(dynbuffer, dynbuffsize, "format...", ...);
In both cases, you check whether nbytes contains a non-negative value less than the size argument; if it does, your data is OK; if the value is equal to or larger, then your data got chopped (and you know how much space you needed to allocate).
The C99 standard says:
The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
The programmer whose code you are reading doesn't know how to use snprintf properly. The second argument is the buffer size, so it should almost always look like this:
snprintf(buf, sizeof buf, "..." ...);
The above is for situations where buf is an array, not a pointer. In the latter case you have to pass the buffer size along:
snprintf(buf, bufsize, "...", ...);
Computing the buffer size is unneeded.
By the way, since you tagged the question as qt-related. There is a very nice QString class that you should use instead.
At a first look both seem incorrect.
In the first case the correct computation would be path + sep + name + NUL so 2 would seem ok, but for the name the strlen call is using ilvpp while the formatting code is using instead cilvpp that is one char longer.
In the second case the number of chars added is 4 (-L"") so the number to add should be 5 because of the ending NUL.

K&R 3-2 overflow problem

I'm going through K&R and 3-2 looks like it would be easy to get into a buffer overflow
Write a function escape(s,t) that converts characters like newline and tab into visible escape sequences like \n and \t as it copies the string t to s. Use a switch
If I replace the byte '\n' with '\' and 'n', the size of s could potentially be quite a bit bigger than the source string.
I could just write this program and ignore the overflow but I would rather not.
I'm having issue wrapping my head around how to handle this?
I'm thinking having a fixed buffer size, perhaps something out of limit.h, and flushing the buffer to stdio when it gets full?
I believe the entire point of the exercise is to teach you that when you're dealing with something like this you either need to:
Shoot too high (make a buffer double the size of the original)
Take extra time (an extra pass) and pre-compute the required size of the buffer.
s will never be longer than twice the length of t. Since this is an exercise apparently meant to help you learn to use switch, I think it'd be fine to assume that the caller passes a string in s that's of sufficient length. Or, if s is of type char** (or similar), then you're meant to allocate the string, in which case you can allocate a string of the proper size.
In a real-world function, you'd probably have another parameter that indicates the maximum length for the destination string.
try adding a size parameter, so you know the size of the target buffer. If you pass a pointer to that parameter, you can return some kind of error value if the buffer is too small and pass the needed size back through the size param. Something like:
int escape(size_t *size, char *out, const char *in);

Resuming [vf]?nprintf after reaching the limit

I have an application which prints strings to a buffer using snprintf and vsnprintf. Currently, if it detects an overflow, it appends a > to the end of the string as a sign that the string was chopped and prints a warning to stderr. I'm trying to find a way to have it resume the string [from where it left off] in another buffer.
If this was using strncpy, it would be easy; I know how many bytes were written, and so I can start the next print from *(p+bytes_written); However, with printf, I have two problems; first, the formatting specifiers may take up more or less space in the final string as in the format string, and secondly, my valist may be partially parsed.
Does anyone have an easy-ish solution to this?
EDIT: I should probably clarify that I'm working on an embedded system with limited memory + no dynamic allocation [i.e., I don't want to use dynamic allocation]. I can print messages of 255 bytes, but no more, although I can print as many of those as I want. I don't, however, have the memory to allocate lots of memory on the stack, and my print function needs to be thread-safe, so I can't allocate just one global / static array.
I don't think you can do what you're looking for (other than by the straightforward way of reallocating the buffer to the necessary size and performing the entire operation again).
The reasons you listed are a couple contributors to this, but the real killer is that the formatter might have been in the middle of formatting an argument when it ran out of space, and there's no reasonable way to restart that.
For example, say there's 3 bytes left in the buffer, and the formatter starts working on a "%d" conversion for the value -1234567. It ll put "-1\0" into the buffer then do whatever else it needs to do to return the size of buffer you really need.
In addition to you being able to determine which specifier the formatter was working on, you'd need to be able to figure out that instead of passing in -1234567 on the second round you need to pass in 234567. I defy you to come up with a reasonable way to do that.
Now if there's a real reason you don't want to restart the operation from the top, you probably could wrap the snprintf()/vsnprintf() call with something that breaks down the format string, sending only a single conversion specifier at a time and concatenating that result to the output buffer. You'd have to come up with some way for the wrapper to keep some state across retries so it knows which conversion spec to pick up from.
So maybe it's doable in a sense, but it sure seems like it would be an awful lot of work to avoid the much simpler 'full retry' scheme. I could see maybe (maybe) trying this on a system where you don't have the luxury of dynamically allocating a larger buffer (an embedded system, maybe). In that case, I'd probably argue that what's needed is a much simpler/restricted scope formatter that doesn't have all the flexibility of printf() formatters and can handle retrying (because their scope is more limited).
But, man, I would try very hard to talk some sense into whoever said it was a requirement.
Edit:
Actually, I take some of that back. If you're willing to use a customized version of snprintf() (let's call it snprintf_ex()) I could see this being a relatively simple operation:
int snprintf_ex( char* s, size_t n, size_t skipChars, const char* fmt, ...);
snprintf_ex() (and its companion functions such as vsnprintf()) will format the string into the provided buffer (as usual) but will skip outputting the first skipChars characters.
You could probably rig this up pretty easy using the source from your compiler's library (or using something like Holger Weiss' snprintf()) as a starting point. Using this might look something like:
int bufSize = sizeof(buf);
char* fmt = "some complex format string...";
int needed = snprintf_ex( buf, bufSize, 0, fmt, arg1, arg2, etc, etc2);
if (needed >= bufSize) {
// dang truncation...
// do whatever you want with the truncated bits (send to a logger or whatever)
// format the rest of the string, skipping the bits we already got
needed = snprintf_ex( buf, bufSize, bufSize - 1, fmt, arg1, arg2, etc, etc2);
// now the buffer contains the part that was truncated before. Note that
// you'd still need to deal with the possibility that this is truncated yet
// again - that's an exercise for the reader, and it's probably trickier to
// deal with properly than it might sound...
}
One drawback (that might or might not be acceptable) is that the formatter will do all the formatting work over again from the start - it'll just throw away the first skipChars characters that it comes up with. If I had to use something like this, I'd think that would almost certainly be an acceptable thing (it what happens when someone deals with truncation using the standard snprintf() family of functions).
The C99 functions snprintf() and vsnprintf() both return the number of characters needed to print the whole format string with all the arguments.
If your implementation conforms to C99, you can create an array large enough for your output strings then deal with them as needed.
int chars_needed = snprintf(NULL, 0, fmt_string, v1, v2, v3, ...);
char *buf = malloc(chars_needed + 1);
if (buf) {
snprintf(buf, chars_needed + 1, fmt_string, v1, v2, v3, ...);
/* use buf */
free(buf);
} else {
/* no memory */
}
If you're on a POSIX-ish system (which I'm guessing you may be since you mentioned threads), one nice solution would be:
First try printing the string to a single buffer with snprintf. If it doesn't overflow, you've saved yourself a lot of work.
If that doesn't work, create a new thread and a pipe (with the pipe() function), fdopen the writing end of the pipe, and use vfprintf to write the string. Have the new thread read from the reading end of the pipe and break the output string into 255-byte messages. Close the pipe and join with the thread after vfprintf returns.

Resources