How to code ASCII Text Based protocol over RS-232 in C - c

I have to implement a relatively simple communication protocol on top of RS-232.
It's an ASCII based text protocol with a couple of frame types.
Each frame looks something like this:
 _____________________________________
|     |         |         |           |
| SOH |  Data   | CRC-16  |    EOT    |
|_____|_________|_________|___________|
  1B    nBytes      2B         1B
Start Of Header (1 Byte)
Data (n-Bytes)
CRC-16 (2 Bytes)
EOT (End Of Transmission)
Each data-field needs to be separated by semicolon ";":
for example, for HEADER type data (contains code,ver,time,date,src,id1,id2 values):
{code};{ver};{time};{date};{src};{id1};{id2}
what is the most elegant way of implementing this in C is my question?
I have tried defining multiple structs for each type of frame, for example:
typedef struct {
uint8_t soh;
char code;
char ver;
Time_t time;
Date_t date;
char src; // Unsigned char
char id1[20]; // STRING_20
char id2[20]; // STRING_20
char crlf;
uint16_t crc;
uint8_t eot;
} stdHeader_t;
I have declared a global buffer:
uint8_t DATA_BUFF[BUFF_SIZE];
I then have a function sendHeader() in which I want to use RS-232 send function to send everything byte by byte by casting the dataBuffer to header struct and filling out the struct:
static enum_status sendHeader(handle_t *handle)
{
uint16_t len;
enum_RETURN_VALUE rs232_err = OK;
enum_status err = STATUS_OK;
stdHeader_t *header = (stdHeader_t *)DATA_BUFF;
memset(DATA_BUFF, 0, BUFF_SIZE);
header->soh = SOH;
header->code = HEADER;
header->ver = 10; // TODO
header->time = handle->time;
header->date = handle->date;
header->src = handle->config->source;
// Copy the strings (memcpy, not memset); each is at most 20 characters long
memcpy(header->id1, handle->config->id1, strlen(handle->config->id1));
memcpy(header->id2, handle->config->id2, strlen(handle->config->id2));
header->crlf = '\n'; // a single char cannot hold the two characters '\r' and '\n'
header->crc = calcCRC();
header->eot = EOT;
len = sizeof(stdHeader_t );
do
{
for (uint16_t i = 0; i < len; i++)
{
rs232_err= rs232_tx_send(DATA_BUFF[i], 1); // Send one byte
if (rs232_err!= OK)
{
err = STATUS_ERR;
break;
}
}
// Break do-while loop if there is an error
if (err == STATUS_ERR)
{
break;
}
} while (conditions);
return err;
}
My problem is that I do not know how to approach the problem of handling ascii text based protocol,
the above principle would work very well for byte based protocols.
Also, I do not know how to implement the semicolon ";" separation of data in the above snippet. As everything is sent byte by byte, I would need additional logic to know when to send ";", and with the current implementation that would not look very good.
For fields id1 and id2, I am receiving string values as a part of handle->config. They can be of any length, but the maximum is 20. Because of that, with the current implementation I would be sending more than needed whenever the actual length is less than 20, but I cannot use pointers to char inside the struct, because in that case only the pointer value would get sent.
So to summarize, the main question is:
How to implement the above described text based protocol for rs-232 in a nice and proper way?

what is the most elegant way of implementing this (ASCII Text Based protocol) in C is my question?
Since this is ASCII, avoid endian issues of trying to map a multi-byte integer. Simply send an integer (including char) as decimal text. Likewise for floating point, use exponential notation and sufficient precision. E.g. sprintf(buf, "%.*e", DBL_DECIMAL_DIG-1, some_double);. Allow "%a" notation.
Do not use the same code for SOH and EOT. Different values reduce receiver confusion.
Send date and time using ISO 8601 as your guide. E.g. "2022-11-10", "23:38:42".
Send string with a leading/trailing ". Escape non-printable ASCII characters, and ", \, ;. Example for 10 long string 123\\;\"\xFF456 --> "123\\\;\"\xFF456".
Error check, like crazy, the received data. Reject packets of data for all sorts of reasons: field count wrong, string too long, value outside field range, bad CRC, timeout, any non-ASCII character received.
Use ASCII hex characters for CRC: 4 hex characters instead of 2 bytes.
Consider a CRC 32 or 64.
Any out-of-band input (bytes received before an SOH) is silently dropped. This nicely allows an optional LF after each command.
Consider the only characters between SOH/EOT should be printable ASCII: 32-126. Escape others as needed.
Since "it's an ASCII based text protocol with a couple of frame types.", I'd expect a type field.
See What type of framing to use in serial communication for more ideas.
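As a rough sketch of how several of the points above could fit together, the following builds one HEADER frame as pure text: fields joined by ";", the CRC appended as 4 hex characters, and distinct SOH/EOT codes. The function names, the CRC variant (CRC-16/CCITT-FALSE) and the choice to compute the CRC over the data field only are assumptions for illustration, not something the question's protocol specifies.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SOH 0x01  /* ASCII Start Of Heading */
#define EOT 0x04  /* ASCII End Of Transmission */

/* CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). This variant is an
 * assumption; use whatever the protocol specification mandates. */
static uint16_t crc16_ccitt(const char *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical helper: writes SOH, the ';'-separated fields, 4 hex CRC
 * digits and EOT into out. Returns the frame length, or 0 if out is
 * too small. Numbers are sent as decimal text, so endianness and
 * struct padding never reach the wire. */
static size_t build_header_frame(char *out, size_t cap,
                                 unsigned code, unsigned ver,
                                 const char *time_s, const char *date_s,
                                 unsigned src,
                                 const char *id1, const char *id2)
{
    int n = snprintf(out, cap, "%c%u;%u;%s;%s;%u;%s;%s",
                     SOH, code, ver, time_s, date_s, src, id1, id2);
    if (n < 0 || (size_t)n + 5 >= cap)   /* need 4 CRC digits + EOT + NUL */
        return 0;

    /* The CRC covers the data field only (everything after SOH). */
    uint16_t crc = crc16_ccitt(out + 1, (size_t)n - 1);
    int m = snprintf(out + n, cap - (size_t)n, "%04X%c", (unsigned)crc, EOT);
    return (m == 5) ? (size_t)(n + m) : 0;
}
```

The finished buffer can then be handed to the RS-232 send routine in one go. Variable-length strings such as id1 and id2 take only as many bytes as they contain, which removes the fixed 20-byte problem from the question.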

First of all, structs are really not good for representing data protocols. The struct in your example will be filled to the brim with padding bytes everywhere, so it is not a proper nor portable representation of the protocol. In particular, forget all about casting a struct to/from a raw uint8_t array - that's problematic for even more reasons: the first address alignment and pointer aliasing.
In case you insist on using a struct, you must write serialization/deserialization routines that manually copy to/from each member into the raw uint8_t buffer, which is the one that must be used for the actual transmission.
(De)serialization routines might not be such a bad idea anyway, because of another issue not addressed by your post: network endianness. RS-232 protocols are by tradition almost always big endian, but don't count on it; endianness must be documented explicitly.
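To make the serialization idea concrete, here is a minimal sketch for a binary header. The struct, its fields and the function names are hypothetical; the point is only that each member is copied byte by byte in a documented (here big-endian) order, so neither padding nor host endianness leaks into the raw buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header, used only for this illustration. */
typedef struct {
    uint8_t  type;
    uint16_t length;
    uint32_t timestamp;
} demo_header_t;

#define DEMO_HEADER_WIRE_SIZE 7u   /* 1 + 2 + 4 bytes, no padding */

/* Writes hdr into buf in big-endian order; buf must hold at least
 * DEMO_HEADER_WIRE_SIZE bytes. Returns the number of bytes written. */
static size_t demo_header_serialize(const demo_header_t *hdr, uint8_t *buf)
{
    size_t i = 0;
    buf[i++] = hdr->type;
    buf[i++] = (uint8_t)(hdr->length >> 8);
    buf[i++] = (uint8_t)(hdr->length & 0xFFu);
    buf[i++] = (uint8_t)(hdr->timestamp >> 24);
    buf[i++] = (uint8_t)((hdr->timestamp >> 16) & 0xFFu);
    buf[i++] = (uint8_t)((hdr->timestamp >> 8) & 0xFFu);
    buf[i++] = (uint8_t)(hdr->timestamp & 0xFFu);
    return i;
}

/* Reads the same layout back into a struct. */
static void demo_header_deserialize(const uint8_t *buf, demo_header_t *hdr)
{
    hdr->type      = buf[0];
    hdr->length    = (uint16_t)(((uint16_t)buf[1] << 8) | buf[2]);
    hdr->timestamp = ((uint32_t)buf[3] << 24) | ((uint32_t)buf[4] << 16) |
                     ((uint32_t)buf[5] << 8)  |  (uint32_t)buf[6];
}
```

Because only shifts and masks are used, the same code produces the same bytes on little-endian and big-endian hosts.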
My problem is that I do not know how to approach the problem of handling ascii text based protocol, the above principle would work very well for byte based protocols.
That is a minor problem compared to the above. Often it is acceptable to have a mix of raw data (essentially everything but the data payload) and ASCII text. If you want a pure ASCII protocol you could consider something like "AT commands", but they don't have much in the way of error handling. You really should have a CRC16 as well as sync bytes. Hint: preferably pick the first sync byte as something that doesn't match 7-bit ASCII, that is, something with the MSB set. 0xAA is popular.
Once you've sorted out data serialization, endianness and protocol structure, you can start to worry about details such as string handling in the payload part.
And finally, RS-232 is dinosaur stuff. There aren't many reasons not to use RS-422/RS-485 instead. The last argument for using RS-232, "computers come with RS-232 COM ports", went obsolete some 15-20 years back.

One thing your struct implementation is missing is packing. For efficiency reasons, depending on which processor your code is running on, the compiler will add padding to the structure to align members on certain byte boundaries. Normally this doesn't affect your code that much, but if you are sending this data across a serial stream where every byte matters, then you will be sending the padding bytes across as well.
This article explains padding well, and how to pack your structures for use cases like yours
Structure Padding
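A quick way to see the padding for yourself is to compare sizeof with the sum of the member sizes and to print the member offsets. The struct below is a made-up, shortened stand-in for the one in the question, and the exact numbers depend on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, shortened version of a frame struct. */
typedef struct {
    uint8_t  soh;
    uint16_t crc;
    uint8_t  eot;
} demo_frame_t;

/* What a byte-exact wire format would occupy: the sum of the members. */
#define DEMO_WIRE_SIZE (sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint8_t))

/* Prints the in-memory layout so it can be compared with the wire size. */
static void print_layout(void)
{
    printf("sizeof(demo_frame_t) = %zu, wire size = %zu\n",
           sizeof(demo_frame_t), (size_t)DEMO_WIRE_SIZE);
    printf("offsets: soh=%zu crc=%zu eot=%zu\n",
           offsetof(demo_frame_t, soh),
           offsetof(demo_frame_t, crc),
           offsetof(demo_frame_t, eot));
}
```

On many common ABIs the struct is larger than the 4 bytes of its members, because crc is aligned to a 2-byte boundary and the struct size is rounded up. Any difference between the two sizes is padding that would be transmitted if the struct were sent as raw memory.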

Related

Custom regional language

I am writing code in C for an 8051-family microcontroller (AT89C51) to display a regional language on a 16x2 LCD.
Because the LCD doesn't support regional languages by default, I created custom characters and converted each letter into hex. What I don't understand is where to put the converted hex values in my code so that they display as I want.
void main()
{
...
str_lcd("HELLO & WELCOME");
delay_ms(3000);
cmd_lcd(0x80);
cmd_lcd(0x01);
...
}
for "HELLO & WELCOME" the hex value is...
{0x40,0x60,0x30,0x1c,0x14,0x14,0x14,0x14},
{0x78,0x08,0x10,0x20,0x18,0x08,0x08,0x08},
{0x20,0x40,0x7c,0x24,0x24,0x04,0x0a,0x11},
{0x78,0x08,0x10,0x20,0x18,0x08,0x08,0x08},
{0x38,0x28,0x38,0x10,0x38,0x28,0x28,0x28},
{0x44,0x44,0x64,0x24,0x24,0x24,0x24,0x3c},
{0x3c,0x40,0x40,0x20,0x18,0x08,0x08,0x08},
{0x00,0x7f,0x55,0x55,0x55,0x55,0x77,0x00},
{0x7c,0x54,0x54,0x54,0x04,0x04,0x04,0x04},
{0x7c,0x10,0x1c,0x04,0x1f,0x04,0x04,0x04},
{0x48,0x48,0x48,0x4e,0x48,0x48,0x48,0x78},
};
So can anyone help me with where to put these hex values so that they display on the LCD?
Assuming each 8-byte array corresponds to a specific character, you could have a table of 128 such 8-byte arrays anywhere in the code, for example by having a static array of arrays of constant bytes, like
static const unsigned char character_data[128][8] = {
// Your data here, one entry per character
};
Most of the data in the table above would simply be zero.
Now where you put this table doesn't really matter, the compiler and linker will make sure it ends up in the correct segment (most likely in the text segment with the code). But since I declared it as static it should be placed in the source file which does the translation between the characters and the data sent to the LCD panel.
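For what it's worth, here is a sketch of how such a table might be pushed to the display, assuming the module uses an HD44780-compatible controller (common for 16x2 LCDs, but not stated in the question). cmd_lcd() is the name from the question; data_lcd() is a hypothetical counterpart that writes one data byte. Both are replaced by stand-ins that only record the bytes, so the logic can be tried without hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real LCD driver: they only record what would be
 * sent. data_lcd() is a hypothetical name, not from the question. */
static uint8_t lcd_log[64];
static size_t  lcd_log_len;

static void cmd_lcd(uint8_t c)
{
    if (lcd_log_len < sizeof lcd_log) lcd_log[lcd_log_len++] = c;
}

static void data_lcd(uint8_t d)
{
    if (lcd_log_len < sizeof lcd_log) lcd_log[lcd_log_len++] = d;
}

/* The first two glyphs from the question's table. */
static const uint8_t glyphs[2][8] = {
    {0x40, 0x60, 0x30, 0x1c, 0x14, 0x14, 0x14, 0x14},
    {0x78, 0x08, 0x10, 0x20, 0x18, 0x08, 0x08, 0x08},
};

/* Copies one 8-row glyph into CGRAM slot 0..7. On an HD44780 the
 * "set CGRAM address" command is 0x40 | address, 8 bytes per slot. */
static void lcd_load_glyph(uint8_t slot, const uint8_t rows[8])
{
    cmd_lcd((uint8_t)(0x40u | ((slot & 0x07u) << 3)));
    for (size_t i = 0; i < 8; i++)
        data_lcd(rows[i]);
}

/* Loads two glyphs, then prints them at the start of line 1. */
static void lcd_show_demo(void)
{
    lcd_load_glyph(0, glyphs[0]);
    lcd_load_glyph(1, glyphs[1]);
    cmd_lcd(0x80);   /* back to DDRAM: line 1, column 0 */
    data_lcd(0);     /* character code 0 selects CGRAM slot 0 */
    data_lcd(1);     /* character code 1 selects CGRAM slot 1 */
}
```

Note that an HD44780 holds only 8 custom characters at a time, so a message with more distinct glyphs than that would need the slots reloaded as it goes.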

Sending † character instead of Space character in Char array

I've migrated my project from XE5 to 10 Seattle. I'm still using ANSI codes to communicate with devices. With my new build, the Seattle IDE is sending a † character instead of the space character (which is #32 in ANSI) in a Char array. I need to send space character data to a text file but I can't.
I tried #32 (like before I used), #032 and #127 but it doesn't work. Any idea?
Here is how I use:
fillChar(X,50,#32);
Method signature (var X; count:Integer; Value:Ordinal)
Despite its name, FillChar() fills bytes, not characters.
Char is an alias for WideChar (2 bytes) in Delphi 2009+, in prior versions it is an alias for AnsiChar (1 byte) instead.
So, if you have a 50-element array of WideChar elements, the array is 100 bytes in size. When you call fillChar(X,50,#32), it fills in the first 50 bytes with a value of $20 each. Thus the first 25 WideChar elements will have a value of $2020 (aka Unicode codepoint U+2020 DAGGER, †) and the second 25 elements will not have any meaningful value.
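The same effect is easy to reproduce outside Delphi. The following C analogue is only an illustration (it is not Delphi code, and the function names are made up): memset(), like FillChar(), counts bytes, so filling 50 bytes of a 16-bit array with $20 yields 25 elements of $2020.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills `count` BYTES, the way FillChar does. With 2-byte elements
 * this touches only the first count / 2 elements. */
static void fill_bytes(uint16_t *buf, size_t count, uint8_t value)
{
    memset(buf, value, count);
}

/* Fills `count` ELEMENTS with a full 16-bit value. */
static void fill_elements(uint16_t *buf, size_t count, uint16_t value)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = value;
}
```

With a zero-initialized 50-element array, fill_bytes(x, 50, 0x20) leaves x[0] through x[24] equal to 0x2020 and the remaining elements untouched, whereas fill_elements(x, 50, 0x0020) produces 50 real spaces.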
This issue is explained in the FillChar() documentation:
Fills contiguous bytes with a specified value.
In Delphi, FillChar fills Count contiguous bytes (referenced by X) with the value specified by Value (Value can be of type Byte or AnsiChar)
Note that if X is a UnicodeString, this may not work as expected, because FillChar expects a byte count, which is not the same as the character count.
In addition, the filling character is a single-byte character. Therefore, when Buf is a UnicodeString, the code FillChar(Buf, Length(Buf), #9); fills Buf with the code point $0909, not $09. In such cases, you should use the StringOfChar routine.
This is also explained in Embarcadero's Unicode Migration Resources white papers, for instance on page 28 of Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines by Cary Jensen:
Actually, the complexity of this type of code is not related to pointers and buffers per se. The problem is due to Chars being used as pointers. So, now that the size of Strings and Chars in bytes has changed, one of the fundamental assumptions that much of this code embraces is no longer valid: That individual Chars are one byte in length.
Since this type of code is so problematic for Unicode conversion (and maintenance in general), and will require detailed examination, a good argument can be made for refactoring this code where possible. In short, remove the Char types from these operations, and switch to another, more appropriate data type. For example, Olaf Monien wrote, "I wouldn't recommend using byte oriented operations on Char (or String) types. If you need a byte-buffer, then use ‘Byte’ as [the] data type: buffer: array[0..255] of Byte;."
For example, in the past you might have done something like this:
var
Buffer: array[0..255] of AnsiChar;
begin
FillChar(Buffer, Length(Buffer), 0);
If you merely want to convert to Unicode, you might make the following changes:
var
Buffer: array[0..255] of Char;
begin
FillChar(Buffer, Length(buffer) * SizeOf(Char), 0);
On the other hand, a good argument could be made for dropping the use of an array of Char as your buffer, and switch to an array of Byte, as Olaf suggests. This may look like this (which is similar to the first segment, but not identical to the second, due to the size of the buffer):
var
Buffer: array[0..255] of Byte;
begin
FillChar(Buffer, Length(buffer), 0);
Better yet, use this second argument to FillChar which works regardless of the data type of the array:
var
Buffer: array[0..255] of Byte;
begin
FillChar(Buffer, Length(buffer) * SizeOf(Buffer[0]), 0);
The advantage of these last two examples is that you have what you really wanted in the first place, a buffer that can hold byte-sized values. (And Delphi will not try to apply any form of implicit string conversion since it's working with bytes and not code units.) And, if you want to do pointer math, you can use PByte. PByte is a pointer to a Byte.
The one place where changes like may not be possible is when you are interfacing with an external library that expects a pointer to a character or character array. In those cases, they really are asking for a buffer of characters, and these are normally AnsiChar types.
So, to address your issue, since you are interacting with an external device that expects Ansi data, you need to declare your array as using AnsiChar or Byte elements instead of (Wide)Char elements. Then your original FillChar() call will work correctly again.
If you want to use ANSI for communication with devices, you would define the array as
x: array[1..50] of AnsiChar;
In this case to fill it with space characters you use
FillChar(x, 50, #32);
Using an array of AnsiChar as communication buffer may become troublesome in a Unicode environment, so therefore I would suggest to use a byte array as communication buffer
x: array[1..50] of byte;
and initialize it with
FillChar(x, 50, 32);

Parse Bytestream in C (embedded programming) [closed]

So I have a communication system between two microcontrollers and I'm sending data between them, i.e. sensor data from one µC to the other one, and commands back.
The sensor data is taken from a struct and put into a frame, which looks like this (the identifier "SENSORFRAME" is not constant; it depends on what's in the frame):
sprintf(message, ":\\SENSORFRAME:%.2f;%.2f;%.2f;%d;%d;%u\r\r",
input.temperature, input.current, input.voltage,
input.dutycycle,input.lightsensor, input.message);
Resulting in a Frame like this:
:\SENSORFRAME:-12.40;1.42;0.53;500;1200;8\r\r
Or for the commandframes:
sprintf(message, ":\\COMMANDFRAME:%u;%.5f\r\r", input.command, input.data);
Resulting in a Frame like this:
:\COMMANDFRAME:3;1.42000\r\r
When the bytestream arrives at one microcontroller it is written into a simple ring buffer until it is processed.
Now my two questions are, firstly, what is the best way to identify a frame in a bytestream, i.e. everything between ":\" and "\r\r", and secondly how to parse it efficiently back into the struct — some combination of strtok (";" and ":"), and atoi/atof?
If you do not process the type of frame until you've read the double carriage return (CR) at the end and you then insert a null byte after the double CR, then you can extract the frame type information using sscanf() or other parsing techniques.
char frame_type[32];
char colon[2];
int offset;
if (sscanf(ring_buffer, ":\\%31[^:]%[:]%n", frame_type, colon, &offset) != 2)
…deal with malformatted frame…
Now you have the frame type in frame_type as a string. The slightly odd %[:] only matches a colon; what's crucial, though, is that it gets counted as a successful conversion, and the %n conversion will be valid. If it was changed to:
if (sscanf(ring_buffer, ":\\%31[^:]:%n", frame_type, &offset) != 1)
…deal with malformatted frame…
you would not know whether the trailing colon was matched or not, and you wouldn't know whether offset held a valid value (the offset into the string where the preceding character, the colon, was found).
You have at least two frame types; how many are there in total (at the moment, and could there be in the future)? At one level, it doesn't matter. You could use a series of string comparisons against the possible frame types, or you could use a hash of the frame type string and compare that against the set of valid hash values, or you could devise another mechanism.
Once you know which frame type you have, you know what format string to use to read the rest of the data. Since you read up to a double CR, you know that the ending contains that — you don't need to validate it again.
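For example, for the sensor frame, you might use the following (scanf() conversions take no precision, so the format is %f rather than %.2f; use %lf if the members are double):
if (sscanf(ring_buffer + offset, "%f;%f;%f;%d;%d;%u",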
&input.temperature, &input.current, &input.voltage,
&input.dutycycle, &input.lightsensor, &input.message) != 6)
…deal with malformatted sensor frame…
Or, for the command frame, you might use (again with %f instead of %.5f, since scanf() accepts no precision):
if (sscanf(ring_buffer + offset, "%u;%f\r\r", &input.command, &input.data) != 2)
…deal with malformatted command frame…
The only complicating factor is that you're using a ring buffer. That could mean that your sensor frame is split so that the first 5 bytes are at the end of the ring buffer and the remainder at the start of the buffer. Frankly, if you can afford the space and copying, converting the ring buffer to a regular (linear?) buffer will be easiest. If that is absolutely not an option, then you are probably stuck with not using sscanf() at all; you will either need to write your own variant of sscanf() that can be told about the shape of the ring buffer and work with that, or you will have to work character at a time.
Perhaps your custom function is:
int rbscanf(const char *rb_base, int rb_len, int rb_off, const char *format, ...);
The ring buffer starts at rb_base and is rb_len bytes long in total; the data starts at &rb_base[rb_off]. You might need to specify the buffer length and the data length (rb_len and rb_nbytes). You might already have a structure describing the ring buffer, in which case, you could pass (a pointer to) that to the function.
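If copying is affordable, the "convert to a linear buffer" option might look roughly like this. The function name and parameter list are hypothetical and merely follow the rb_base/rb_len/rb_off naming used above; the caller is assumed to know how many bytes the frame occupies.

```c
#include <stddef.h>

/* Copies n bytes starting at offset rb_off out of a ring buffer of
 * rb_len bytes into dst, wrapping around at the end, and
 * NUL-terminates dst so that sscanf() can be used on it.
 * Returns 0 on success, -1 if dst is too small or rb_len is 0. */
static int rb_linearize(const char *rb_base, size_t rb_len, size_t rb_off,
                        size_t n, char *dst, size_t dst_cap)
{
    if (rb_len == 0 || n + 1 > dst_cap)
        return -1;
    for (size_t i = 0; i < n; i++)
        dst[i] = rb_base[(rb_off + i) % rb_len];
    dst[n] = '\0';
    return 0;
}
```

After this call the frame sits in one contiguous, terminated string, so the sscanf() calls shown earlier work unchanged on dst.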
Alternatively, if you process the data before you've read the entire frame, then you can validate as you read the bytes. You'll still need to accumulate strings and numbers for conversion. You'll probably use strtol() and strtod() rather than atoi() and atof(); you will need to know about errors, including trailing unconverted characters, which the atoi() and atof() functions cannot tell you about. Care is required with the strtoX() functions, but they are effective.
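As an illustration of the care needed with the strtoX() functions, here is a sketch that parses the payload of a command frame ("3;1.42000" followed by the carriage returns). The function name is made up, and the policy of rejecting signs and leading spaces on the command field is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses "<unsigned>;<double>" optionally followed by '\r' characters.
 * Returns 0 on success, -1 on any malformed input. */
static int parse_command_payload(const char *s,
                                 unsigned long *command, double *data)
{
    char *end;

    /* strtoul() would accept leading spaces and a sign; reject them. */
    if (*s < '0' || *s > '9')
        return -1;
    errno = 0;
    *command = strtoul(s, &end, 10);
    if (end == s || errno == ERANGE || *end != ';')
        return -1;

    s = end + 1;
    errno = 0;
    *data = strtod(s, &end);
    if (end == s || errno == ERANGE)
        return -1;

    /* Only the frame's trailing CRs may follow the number. */
    while (*end == '\r')
        end++;
    return (*end == '\0') ? 0 : -1;
}
```

Checking the end pointer is what catches trailing unconverted characters such as "1.5junk", which atof() would silently accept as 1.5.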

Understanding second argument in the function "xdrmem_create"

I was debugging ganglia and came through sending data using XDR through UDP channel.
I found the second argument of the function xdrmem_create ( xdrs, addr, size, op) strange.
Where the syntax of the arguments are given as:
XDR *xdrs;
char *addr;
u_int size;
enum xdr_op op;
The reference of this function is here.
As you can see, the second argument (addr) of this function is a character array. It is declared similarly in one of ganglia's functions as char msgbuf[GANGLIA_MAX_MESSAGE_LEN];.
After calling above function as xdrmem_create(&x, msgbuf, GANGLIA_MAX_MESSAGE_LEN, XDR_ENCODE); in ganglia , appropriate data in the ganglia's specific structure (cb->msg) is encoded to XDR format by calling the function xdr_Ganglia_value_msg(&x, &(cb->msg)) where x is an XDR type of variable.
Later, to send the encoded data through UDP channel, the function Ganglia_udp_send_message( udp_send_channels, msgbuf, len); is called.
To understand how this XDR data is sent, I tried to print the content of msgbuf using fprintf, but it always prints nothing even though it is a character array. It is also evident that the encoded data is sent successfully.
So, my question is, How data encoded into XDR format is being sent through UDP channel here?
I have pasted a part of code from ganglia here. You can see from the line 131 to 136 for your reference.
The messages are encoded with XDR, a binary format. The XDR libraries used are old, and a modern version of the API probably would have used uint8_t instead of char. Despite it using char, it is binary data - i.e. you cannot print the data as a string.
If you want to print this data, use a loop where you print each byte as hex, e.g. by doing printf("%02X ", (unsigned char)msgbuf[i]); (the cast avoids sign extension on platforms where char is signed).
Read RFC4506 to learn about the XDR encoding. The actual messages are described in an XDR language specification, and run through a code generation tool (e.g. rpcgen for C code) to generate code for encoding/decoding. See https://github.com/fastly/ganglia/blob/master/lib/gm_protocol.x for the message definitions that Ganglia defines.
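A small helper along those lines might look like the sketch below; the name hex_format is made up. It writes the dump into a string rather than printing directly, which makes it easy to log or test. XDR encodes integers as 4-byte big-endian values, so a buffer will often begin with a zero byte, which is one plausible reason a string-style print shows nothing.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats len bytes of buf as "XX XX ..." into out (NUL-terminated).
 * Stops early if out is too small. Returns the characters written. */
static size_t hex_format(const char *buf, size_t len, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';
    /* Each byte needs 3 characters plus room for the terminator. */
    for (size_t i = 0; i < len && used + 4 <= cap; i++) {
        /* The cast avoids sign extension where char is signed. */
        int n = snprintf(out + used, cap - used, "%02X ",
                         (unsigned)(unsigned char)buf[i]);
        used += (size_t)n;
    }
    return used;
}
```

The result can be printed with fputs(out, stderr) or similar, passing the encoded length rather than strlen(msgbuf), since the data may contain zero bytes.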

How do I send an array of integers over TCP in C?

I'm led to believe that write() can only send data buffers of bytes (i.e. signed char), so how do I send an array of long integers using the C write() function in the sys/socket.h library?
Obviously I can't just cast or convert long to char, as any numbers over 127 would be malformed.
I took a look at the question, how to decompose integer array to a byte array (pixel codings), but couldn't understand it - please could someone dumb it down a little if this is what I'm looking for?
Follow up question:
Why do I get weird results when reading an array of integers from a TCP socket?
the prototype for write is:
ssize_t write(int fd, const void *buf, size_t count);
so while it writes in units of bytes, it can take a pointer of any type. Passing an int* will be no problem at all.
EDIT:
I would, however, recommend that you also send the number of integers you plan to send first, so the receiver knows how much to read. Something like this (error checking omitted for brevity):
int x[10] = { ... };
int count = 10;
write(sock, &count, sizeof(count));
write(sock, x, sizeof(x));
NOTE: if the array is from dynamic memory (like you malloced it), you cannot use sizeof on it. In that case the byte count passed to write() would be: sizeof(int) * element_count
EDIT:
As Brian Mitchell noted, you will likely need to be careful of endian issues as well. This is the case when sending any multibyte value (as in the count I recommended as well as each element of the array). This is done with the: htons/htonl and ntohs/ntohl functions.
Write can do what you want it to, but there's some things to be aware of:
1: You may get a partial write that's not on an int boundary, so you have to be prepared to handle that situation
2: If the code needs to be portable, you should convert your array to a specific endianness, or encode the endianness in the message.
The simplest way to send a single int (assuming 4-byte ints) is :
int tmp = htonl(myInt);
write(socket, &tmp, 4);
where htonl is a function that converts the int to network byte order. (Similarly, when you read from the socket, the function ntohl can be used to convert back to host byte order.)
For an array of ints, you would first want to send the count of array members as an int (in network byte order), then send the int values.
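Putting the count prefix, the byte-order conversion and the partial-write handling together could look like this sketch. It assumes a POSIX system and uses fixed-width int32_t elements instead of long, since the size of long differs between platforms; the function names are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Calls write() until all len bytes are sent. Returns 0, or -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                /* partial write: advance and go on */
        len -= (size_t)n;
    }
    return 0;
}

/* Sends a 32-bit element count followed by the elements, all in
 * network byte order. Returns 0, or -1 on error. */
static int send_int32_array(int fd, const int32_t *values, uint32_t count)
{
    uint32_t be = htonl(count);

    if (write_all(fd, &be, sizeof be) != 0)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        be = htonl((uint32_t)values[i]);
        if (write_all(fd, &be, sizeof be) != 0)
            return -1;
    }
    return 0;
}
```

The receiver reads 4 bytes, applies ntohl() to get the count, then reads that many 4-byte values and converts each with ntohl(). For large arrays it would be more efficient to convert into a temporary buffer and send it with a single write_all() call.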
Yes, you can just cast a pointer to your buffer to a pointer to char, and call write() with that. Casting a pointer to a different type in C doesn't affect the contents of the memory being pointed to -- all it does is indicate the programmer's intention that the contents of memory at that address be interpreted in a different way.
Just make sure that you supply write() with the correct size in bytes of your array -- that would be the number of elements times sizeof (long) in your case.
It would be better to have serialize/de-serialize functionality in your client /server program.
Whenever you want to send data, serialize the data into a byte buffer and send it over TCP with byte count.
When receiving data, de-serialize the data from buffer to your own interpretation .
You can interpret byte buffer in any form as you like. It can contain basic data type, objects etc.
Just make sure to take care of endianness and also alignment.
Declare a character array. In each location of the array, store integer numbers, not characters.
Then you just send that.
For example:
char tcp[100];
tcp[0] = 0;
tcp[1] = 0xA;
tcp[2] = 0xB;
tcp[3] = 0xC;
.
.
// Send the character array
write(sock, tcp, sizeof(tcp));
I think what you need to come up with here is a protocol.
Suppose your integer array is:
100, 99, 98, 97
Instead of writing the ints directly to the buffer, I would "serialize" the array by turning it into a string representation. The string might be:
"100,99,98,97"
That's what would be sent over the wire. On the receiving end, you'd split the string by the commas and build the array back up.
This is more standardised, is human readable, and means people don't have to think about hi/lo byte orders and other silly things.
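A minimal sketch of that string approach, with made-up function names, might be the following. It assumes the values fit in int and that the receiver already has the complete string.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes values as "100,99,98" into out.
 * Returns the string length, or -1 if out is too small. */
static int ints_to_csv(const int *values, size_t count, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + used, cap - used, i ? ",%d" : "%d", values[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}

/* Parses such a string back into values.
 * Returns the element count, or -1 on malformed input or overflow. */
static int csv_to_ints(const char *s, int *values, size_t max_count)
{
    size_t count = 0;

    while (*s != '\0') {
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s || errno == ERANGE || v < INT_MIN || v > INT_MAX ||
            count == max_count)
            return -1;
        values[count++] = (int)v;
        if (*end == ',') {
            s = end + 1;
            if (*s == '\0')
                return -1;     /* trailing comma */
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1;         /* unexpected character */
        }
    }
    return (int)count;
}
```

On a stream such as TCP the receiver still needs to know where the string ends, for example via a terminating newline or a length prefix.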
// Sarcasm
If you were working in .NET or Java, you'd probably encode it in XML, like this:
<ArrayOfInt><Int>100</Int><Int>99</Int><Int>98</Int><Int>97</Int></ArrayOfInt>
:)
