I need help in sending a specific escape sequence, using Embedded C.
This is my very first topic at stackoverflow!
I use this function to write commands through UART:
void UART_Write(UARTChannel* channel, uint8_t* data, uint32_t length)
The inputs channel, data and length correspond to the UART channel, the command to be sent and the length of the command, respectively.
This works great in general!
However, I have some difficulties in generating the correct escape sequence in C. I need to write the following escape sequence, using the UART_Write function:
EscR0,1,2,7;
Esc being the Escape character (0x1b), R0 being the character command designator, 1,2,7 being the context specific parameters and ; being the termination sign.
How can I make the input "data" to the function "UART_Write" be equal to the escape sequence EscR0,1,2,7; in Embedded C?
I suppose it can be done in many different ways, but any suggestions will do.
Do you want to send the literal text 'Esc', or the byte 0x1b?
Just send the characters, as you would any other character string.
For instance, for the example EscR0,1,2,7;:
char buffer[20] = {'\0'};
buffer[0] = 0x1b;
strcat( buffer, "R0,1,2,7;" );
....
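A minimal sketch of the same idea without strcat, assuming the sequence is fixed at compile time. The helper name build_esc_command is hypothetical; only UART_Write comes from the question. The Esc byte is written separately so that no hex escape can accidentally absorb a following character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes Esc + "R0,1,2,7;" into out.
   Returns the number of bytes written, or 0 if out is too small.
   No terminating '\0' is written, because the UART gets an explicit length. */
size_t build_esc_command(uint8_t *out, size_t cap)
{
    static const char body[] = "R0,1,2,7;";
    size_t body_len = strlen(body);
    size_t total = 1 + body_len;

    if (cap < total) {
        return 0;
    }
    out[0] = 0x1b;                    /* the Escape character */
    memcpy(out + 1, body, body_len);  /* designator, parameters, terminator */
    return total;
}
```

The returned length could then be handed straight to the function from the question, for example UART_Write(channel, buf, (uint32_t)len).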
I have a strange problem when using string function in C.
Currently I have a function that sends string to UART port.
When I give to it a string like
char buf[32];
strcpy(buf, "AT+CPMS=\"SM");
strcat(buf, "\"");
uart0_putstr(buf);
//or
uart0_putstr("AT+CPMS=SM"); //not a valid AT command, but without quotes just for test
it works well and sends string to UART. But when I use such call:
char buf[32];
strcpy(buf, "AT+CPMS=\"SM\"");
uart0_putstr(buf);
//or
uart0_putstr("AT+CPMS=\"SM\"");
it doesn't print to UART anything.
Maybe you can explain to me what the difference is between the strings in the first and the second/third cases?
First the C language part:
String literals: All C string literals include an implicit null byte at the end; the C string literal "123" defines a 4 byte array with the values 49,50,51,0. The null byte is always there even if it is never mentioned and enables strlen, strcat etc. to find the end of the string. The suggestion strcpy(buf, "AT+CPMS=\"SM\"\0"); is nonsensical: The character array produced by "AT+CPMS=\"SM\"\0" now ends in two consecutive zero bytes; strcpy will stop at the first one already. "" is a 1 byte array whose single element has the value 0. There is no need to append another 0 byte.
strcat, strcpy: Both functions always add a null byte at the end of the string. There is no need to add a second one.
Escaping: As you know, a C string literal consists of characters book-ended by double quotes: "abc". This makes it impossible to have simple double quotes as part of the string because that would end the string. We have to "escape" them. The C language uses the backslash to give certain characters a special meaning, or, in this case, suppress the special meaning. The entire combination of backslash and subsequent source code character is transformed into a single character, or byte, in the compiled program. The combination \n is transformed into a single byte with the value 10 (usually interpreted as a newline by output devices), \r is 13 (usually carriage return), and \" is transformed into the byte 34, usually printed as the " glyph. The string Between the arrows is a double quote: ->"<- must be coded as "Between the arrows is a double quote: ->\"<-" in C. The middle double quote doesn't end the string literal because it is "escaped".
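The points about the implicit null byte and about escapes can be checked with a few small helpers. This is only an illustrative sketch: the function names are made up, and the byte values assume an ASCII-based execution character set.

```c
#include <stddef.h>

/* sizeof on a string literal counts the implicit terminating null byte. */
size_t size_of_123(void)    { return sizeof "123"; }   /* three digits plus '\0' */
size_t size_of_empty(void)  { return sizeof ""; }      /* only the '\0' */
size_t size_of_quoted(void) { return sizeof "a\"b"; }  /* a, ", b, '\0' */

/* Each escape sequence compiles to exactly one byte; return its value. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```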
Then the UART part: The internet makes me believe that the command you want to send over the UART looks like AT+CPMS="SM", followed by a carriage return. The corresponding C string literal would be "AT+CPMS=\"SM\"\r".
The page I linked also inserts a delay between sending commands. Sending too quickly may cause errors that appear only sometimes.
The things to note are :
The AT command syntax probably demands that SM be surrounded by quotes on both sides.
Additionally, the protocol probably demands that a command end in a carriage return.
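As a hedged sketch of both notes, the command could be formatted with snprintf so that the quotes and the trailing carriage return are always present. The function name format_cpms is hypothetical, and whether the modem really expects exactly this form is an assumption taken from the paragraph above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats AT+CPMS="<storage>" followed by '\r'.
   Returns the string length, or -1 if the buffer is too small. */
int format_cpms(char *out, size_t cap, const char *storage)
{
    int n = snprintf(out, cap, "AT+CPMS=\"%s\"\r", storage);

    if (n < 0 || (size_t)n >= cap) {
        return -1;  /* encoding error or truncated output */
    }
    return n;
}
```

The resulting buffer is an ordinary null-terminated string, so it can be passed to a function such as uart0_putstr from the question.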
This ...
char buf[32];
strcpy(buf, "AT+CPMS=\"SM");
strcat(buf, "\"");
... produces the same contents in buf as this ...
char buf[32];
strcpy(buf, "AT+CPMS=\"SM\"");
... does, up to and including the string terminator at index 12. I fully expect an immediately following call to ...
uart0_putstr(buf);
... to have the same effect in each case unless uart0_putstr() looks at bytes past the terminator or its behavior is sensitive to factors other than its argument.
If it does look past the terminator, however, then that might explain not only a difference between those two, but also a difference with ...
uart0_putstr("AT+CPMS=\"SM\"");
... because in this last case, looking past the string terminator would overrun the bounds of the array, producing undefined behavior.
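The claim that both ways of filling buf give identical contents can be checked directly. A small sketch (buffers_match is a made-up name) that compares the two buffers up to and including the terminator:

```c
#include <string.h>

/* Builds the string both ways and compares every byte through the '\0'.
   Returns 1 if the contents are identical, 0 otherwise. */
int buffers_match(void)
{
    char a[32];
    char b[32];

    strcpy(a, "AT+CPMS=\"SM");
    strcat(a, "\"");                 /* first variant: two steps */
    strcpy(b, "AT+CPMS=\"SM\"");     /* second variant: one literal */

    return strlen(a) == 12 && memcmp(a, b, strlen(b) + 1) == 0;
}
```

If this returns 1 on the target and the UART output still differs, the cause is more likely to be in uart0_putstr or the surrounding code than in the strings themselves.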
Thanks, all. It was finally resolved by adding a null character to the end of the string.
I have to implement a relatively simple communication protocol on top of RS-232.
It's an ASCII based text protocol with a couple of frame types.
Each frame looks something like this:
 * ______________________________________
 * |     |          |          |        |
 * | SOH |   Data   |  CRC-16  |  EOT   |
 * |_____|__________|__________|________|
 *   1B     nBytes       2B        1B
Start Of Header (1 Byte)
Data (n-Bytes)
CRC-16 (2 Bytes)
EOT (End Of Transmission)
Each data-field needs to be separated by semicolon ";":
for example, for HEADER type data (contains code,ver,time,date,src,id1,id2 values):
{code};{ver};{time};{date};{src};{id1};{id2}
My question is: what is the most elegant way of implementing this in C?
I have tried defining multiple structs for each type of frame, for example:
typedef struct {
uint8_t soh;
char code;
char ver;
Time_t time;
Date_t date;
char src; // Unsigned char
char id1[20]; // STRING_20
char id2[20]; // STRING_20
char crlf;
uint16_t crc;
uint8_t eot;
} stdHeader_t;
I have declared a global buffer:
uint8_t DATA_BUFF[BUFF_SIZE];
I then have a function sendHeader() in which I want to use the RS-232 send function to send everything byte by byte, by casting DATA_BUFF to the header struct and filling out the struct:
static enum_status sendHeader(handle_t *handle)
{
uint16_t len;
enum_RETURN_VALUE rs232_err = OK;
enum_status err = STATUS_OK;
stdHeader_t *header = (stdHeader_t *)DATA_BUFF;
memset(DATA_BUFF, 0, size);
header ->soh= SOH,
header ->code= HEADER,
header ->ver= 10, // TODO
header ->time= handle->time,
header ->date= handle->date,
header ->src= handle->config->source,
memcpy(header ->id1, handle->config->id1, strlen(handle->config->id1)); // copy the string bytes (memset would only fill with one value)
memcpy(header ->id2, handle->config->id2, strlen(handle->config->id2));
header ->crlf = '\r\n',
header ->crc = calcCRC();
header ->eot = EOT;
len = sizeof(stdHeader_t );
do
{
for (uint16_t i = 0; i < len; i++)
{
rs232_err= rs232_tx_send(DATA_BUFF[i], 1); // Send one byte
if (rs232_err!= OK)
{
err = STATUS_ERR;
break;
}
}
// Break do-while loop if there is an error
if (err == STATUS_ERR)
{
break;
}
} while (conditions);
return err;
}
My problem is that I do not know how to approach handling an ASCII text based protocol;
the above principle would work very well for byte based protocols.
Also, I do not know how to implement the semicolon ";" separation of data in the above snippet. As everything is sent byte by byte, I would need additional logic to know when to send ";", and with the current implementation that would not look very good.
For the fields id1 and id2, I am receiving string values as part of handle->config. They can be of any length, up to a maximum of 20. Because of that, with the current implementation I would be sending more than needed when the actual length is less than 20, but I cannot use pointers to char inside the struct, because in that case only the pointer value would get sent.
So to summarize, the main question is:
How to implement the above described text based protocol for rs-232 in a nice and proper way?
what is the most elegant way of implementing this (ASCII Text Based protocol) in C is my question?
Since this is ASCII, avoid endian issues of trying to map a multi-byte integer. Simply send an integer (including char) as decimal text. Likewise for floating point, use exponential notation and sufficient precision. E.g. sprintf(buf, "%.*e", DBL_DECIMAL_DIG-1, some_double);. Allow "%a" notation.
Do not use the same code for SOH and EOT. Different values reduce receiver confusion.
Send date and time using ISO 8601 as your guide. E.g. "2022-11-10", "23:38:42".
Send string with a leading/trailing ". Escape non-printable ASCII characters, and ", \, ;. Example for 10 long string 123\\;\"\xFF456 --> "123\\\;\"\xFF456".
Error check, like crazy, the received data. Reject packets of data for all sorts of reasons: field count wrong, string too long, value outside field range, bad CRC, timeout, any non-ASCII character received.
Use ASCII hex characters for CRC: 4 hex characters instead of 2 bytes.
Consider a CRC 32 or 64.
Any out-of-band input (bytes before receiving a SOH) is silently dropped. This nicely allows an optional LF after each command.
Consider the only characters between SOH/EOT should be printable ASCII: 32-126. Escape others as needed.
Since "it's an ASCII based text protocol with a couple of frame types.", I'd expect a type field.
See What type of framing to use in serial communication for more ideas.
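Several of the suggestions above (semicolon-separated text fields, the CRC sent as four hex characters, distinct SOH and EOT codes) can be combined in a small frame builder. This is only a sketch under stated assumptions: the names build_frame and crc16_ccitt are hypothetical, SOH and EOT use the ASCII control codes 0x01 and 0x04, and the CRC variant (CRC-16/CCITT-FALSE, polynomial 0x1021, initial value 0xFFFF) is a guess, since the question does not say which CRC-16 the protocol uses. Fields are assumed to be already escaped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SOH 0x01  /* assumed start-of-header code */
#define EOT 0x04  /* assumed end-of-transmission code */

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF (assumed variant). */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Builds SOH, field;field;..., 4 hex CRC characters, EOT into out.
   The CRC covers only the data between SOH and the CRC itself.
   Returns the frame length, or 0 if out is too small. */
size_t build_frame(char *out, size_t cap, const char *const *fields, size_t nfields)
{
    size_t pos = 0;

    if (cap < 7) {
        return 0;  /* SOH + 4 CRC characters + EOT + '\0' */
    }
    out[pos++] = SOH;
    for (size_t i = 0; i < nfields; i++) {
        size_t flen = strlen(fields[i]);
        size_t need = flen + (i > 0 ? 1 : 0);

        /* keep room for the CRC (4), EOT (1) and '\0' (1) */
        if (pos + need + 6 > cap) {
            return 0;
        }
        if (i > 0) {
            out[pos++] = ';';
        }
        memcpy(out + pos, fields[i], flen);
        pos += flen;
    }
    uint16_t crc = crc16_ccitt((const uint8_t *)out + 1, pos - 1);
    snprintf(out + pos, 5, "%04X", (unsigned)crc);
    pos += 4;
    out[pos++] = EOT;
    out[pos] = '\0';
    return pos;
}
```

Because the separator is inserted by the builder, the calling code only has to supply the fields as strings (numbers formatted as decimal text, dates in ISO 8601 form), and variable-length fields such as id1 and id2 take only as many bytes as they need.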
First of all, structs are really not good for representing data protocols. The struct in your example will be filled to the brim with padding bytes everywhere, so it is not a proper nor portable representation of the protocol. In particular, forget all about casting a struct to/from a raw uint8_t array - that's problematic for even more reasons: the first address alignment and pointer aliasing.
In case you insist on using a struct, you must write serialization/deserialization routines that manually copy to/from each member into the raw uint8_t buffer, which is the one that must be used for the actual transmission.
(De)serialization routines might not be such a bad idea anyway, because of another issue not addressed by your post: network endianness. RS-232 protocols are by tradition almost always Big Endian, but don't count on it - endianness must be documented explicitly.
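A minimal sketch of such routines for one 16-bit field, assuming the protocol is documented as Big Endian. The names put_u16_be and get_u16_be are hypothetical. Shifting and masking gives the same byte order on any host, and no struct is ever cast to or from the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes v into buf most significant byte first; returns the bytes written. */
size_t put_u16_be(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)(v & 0xFF);
    return 2;
}

/* Reads a Big Endian 16-bit value from buf. */
uint16_t get_u16_be(const uint8_t *buf)
{
    return (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
}
```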
My problem is that I do not know how to approach the problem of handling ascii text based protocol, the above principle would work very well for byte based protocols.
That is a minor problem compared to the above. Often it is acceptable to have a mix of raw data (essentially everything but the data payload) and ASCII text. If you want a pure ASCII protocol you could consider something like "AT commands", but they don't have much in the way of error handling. You really should have a CRC16 as well as sync bytes. Hint: preferably pick the first sync byte as something that doesn't match 7 bit ASCII, that is, something with the MSB set. 0xAA is popular.
Once you've sorted out data serialization, endianness and protocol structure, you can start to worry about details such as string handling in the payload part.
And finally, RS232 is dinosaur stuff. There are not many reasons why one shouldn't use RS422/RS485. The last argument for using RS232, "computers come with RS232 COM ports", went obsolete some 15-20 years back.
One thing your struct implementation is missing is packing. For efficiency reasons, depending on which processor your code is running on, the compiler will add padding to the structure to align members on certain byte boundaries. Normally this doesn't affect your code that much, but if you are sending this data across a serial stream where every byte matters, then you will be sending stray padding bytes across as well.
This article explains padding well, and how to pack your structures for use cases like yours
Structure Padding
I don't understand why I'm able to read() every character I type on my terminal, but if I try to assign a non-ASCII value to a C variable it doesn't work.
There are three main questions below this code ->
int main (){
int fd;
fd = open("./dog.txt",O_RDONLY);
//contents of dog.txt -> 漢è hello
ssize_t r;
char b;
while( (r = read( fd, &b, sizeof(b))) > 0 ){
write(STDOUT_FILENO,&b, sizeof(b));
}
printf("\n");
//OUTPUT : 漢è hello
}
However something like this is not accepted :
int main (){
unsigned int test = '漢';
write(STDOUT_FILENO,&test,sizeof(test));
printf("\n");
}
The C program receives a series of bytes one at a time and then sends them back one at a time to the terminal through the write system call (the buffer in the example is 1 Byte).
But how does the terminal know that it must "interpret" the Chinese character as a group of 3 Bytes when I write()? Considering that I'm writing 1 Byte at a time, it could just as well have interpreted each single Byte as one of three different 8-bit characters.
Is there some sort of cooperation between the process and the terminal to make this possible?
Could someone provide a straight to the point explanation of character encodings in both terminal and programs (in this case C)?
But how does the terminal know that it must "interpret" the chinese character as a group of 3 Bytes when I write() ? ...
The terminal sees a stream of bytes and tries to decode that stream into characters irrespectively of whether they were written with one write call or multiple calls.
The exact way it decodes the stream depends on the encoding used in your system. I assume that your system uses UTF-8, because that's an encoding where 漢 is represented with the sequence of the three bytes e6 bc a2 (here in hexadecimal). In UTF-8, the number of bytes the character takes is determined by its first byte. UTF-8 is actually ingenious for that and a few other reasons. For details you should refer to the Wikipedia page on UTF-8.
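How the first byte determines the length can be sketched as a small function (utf8_sequence_length is a made-up name, and real decoders do more checking than this). For the lead byte e6 of 漢 it returns 3, which is why the terminal keeps collecting bytes across separate write calls.

```c
/* Returns the total number of bytes in the UTF-8 sequence that starts
   with lead, or 0 if lead cannot start a sequence. */
int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) {
        return 1;   /* 0xxxxxxx: plain ASCII */
    }
    if ((lead & 0xE0) == 0xC0) {
        return 2;   /* 110xxxxx */
    }
    if ((lead & 0xF0) == 0xE0) {
        return 3;   /* 1110xxxx */
    }
    if ((lead & 0xF8) == 0xF0) {
        return 4;   /* 11110xxx */
    }
    return 0;       /* 10xxxxxx continuation byte or invalid lead */
}
```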
Is there some sort of cooperation between the process and the terminal to make this possible ?
The process and the terminal both follow the system convention about which encoding to use. On UNIX systems that's determined by the value of the LANG, LC_ALL (or some other) environment variables. This might be seen as 'cooperation', but there's definitely no two-way communication between them other than the respective byte streams.
However something like this is not accepted : ...
It actually may work on some implementations. However the exact meaning of character literals (single-quoted strings) with multi-byte characters or multiple characters is not defined per the standard.
What is going to work on most UNIX systems though, is using a string literal and saving the source file in UTF-8:
char test[] = "漢";
write(STDOUT_FILENO, test, strlen(test));
printf("\n");
I have a file encoded in UTF-8, as it is shown by the following command :
file -i D.txt
D.txt: text/plain; charset=utf-8
I just want to display each character one after one, so I have done this :
FILE * F_entree = fopen("D.txt", "r");
if (! F_entree) usage("impossible d'ouvrir le fichier d'entrée");
char ligne[TAILLE_MAX];
while (fgets(ligne, TAILLE_MAX, F_entree))
{
string mot = strtok(strdup(ligne), "\t");
while (*mot++){printf("%c \n", *mot) ;}
}
But the special characters aren't displayed correctly (a <?> is displayed instead) in the terminal (on Ubuntu 12). I think the problem is that only ASCII codes can be stored in %c, but how can I display those special characters?
And what's the good way to keep those characters in memory (in order to implement a tree index)? (I'm aware that this last question is unclear, don't hesitate to ask for clarifications.)
It does not work because your code splits up the multi-byte characters into separate ones. As your console expects a valid multi-byte code, after seeing a first one, and it does not receive the correct codes, you get your <?> -- translated freely, "whuh?". It does not receive a correct code because you are stuffing a space and newline in there.
Your console can only correctly interpret UTF8 characters if you send the right codes and in the correct sequence. The algorithm is:
Is the next character the start code for a UTF-8 sequence? If not, print it and continue.
If it is, print it and print all "next" codes for this character. See Wikipedia on UTF8 for the actual encoding; I took a shortcut in my code below.
Only then print your space (..?) and newline.
The procedure to recognize the start and length of a UTF8 multibyte character is this:
"Regular" (ASCII) characters never have their high bit (bit 7, value 0x80) set. Testing against 0x80 is enough to differentiate them from UTF8 multibyte sequences.
Each UTF8 multibyte sequence starts with one of the bit patterns 110xxxxx, 1110xxxx, or 11110xxx (the original design also defined 111110xx and 1111110x, but current UTF8 stops at four bytes). Every unique bit pattern has an associated number of extra bytes. The first one, for example, expects one additional byte. The xxx bits are combined with bits from the next byte(s) to form the Unicode code point. (After all, that is what UTF8 is all about.)
Each next byte -- no matter how many! -- has the bit pattern 10xxxxxx. Important: none of the previous patterns start with this code!
Therefore, as soon as you see any UTF8 character, you can immediately display it and all 'next' codes, as long as they start with the bit pattern 10....... This can be tested efficiently with a bit-mask: value & 0xc0, and the result should be 0x80. Any other value means it's not a 'next' byte anymore, so you're done then.
All of this only works if your source file is valid UTF8. If you get to see some strange output, it most likely is not. If you need to check the input file for validity, you do need to implement the entire table in the Wikipedia page, and check if each 110xxxxx byte is in fact followed by a single 10xxxxxx byte, and so on. The pattern 10xxxxxx appearing on its own would indicate an error.
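A rough sketch of such a structural check, assuming sequences of at most four bytes. The name utf8_structure_ok is hypothetical. It only verifies that each lead byte is followed by the right number of 10xxxxxx bytes; it does not reject overlong encodings or surrogate code points, so it is not a full validator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every lead byte in s[0..len) is followed by the number of
   continuation bytes its bit pattern announces, and no continuation byte
   appears on its own. */
bool utf8_structure_ok(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i++];
        size_t extra;

        if (c < 0x80) {
            extra = 0;                      /* ASCII */
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1;                      /* 110xxxxx */
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2;                      /* 1110xxxx */
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3;                      /* 11110xxx */
        } else {
            return false;                   /* stray 10xxxxxx or invalid lead */
        }
        for (; extra > 0; extra--) {
            if (i >= len || (s[i] & 0xC0) != 0x80) {
                return false;               /* sequence cut short */
            }
            i++;
        }
    }
    return true;
}
```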
A definitive must-read is Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). See also UTF-8 and Unicode FAQ for Unix/Linux for more background information.
My code below addresses a few other issues with yours. I've used English variable names (see Meta Stackoverflow "Foreign variable names etc. in code"). It appears to me strdup is not necessary. Also, string is a C++ type, not a C one.
My code does not "fix" or handle anything beyond the UTF-8 printing. Because of your use of strtok, the code only prints the text before the first \t Tab character on each line in your input file. I assume you know what you are doing there ;-)
Add.: Ah, forgot to address Q2, "what's the good way to keep those characters in memory". UTF8 is designed to be maximally compatible with C-type char strings. You can safely store them as such. You don't need to do anything special to print them on an UTF8-aware console -- well, except when you are doing stuff as you do here, printing them as separate characters. printf ought to work just fine for whole words.
If you need UTF8-aware equivalents of strcmp, strchr, and strlen, you can roll your own code (see the Wikipedia link above) or find yourself a good pre-made library. (I left out strcpy intentionally!)
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 1024
int main (void)
{
char line[MAX_LINE_LENGTH], *word;
FILE *entry_file = fopen("D.txt", "r");
if (!entry_file)
{
printf ("not possible to open entry_file\n");
return -1;
}
while (fgets(line, MAX_LINE_LENGTH, entry_file))
{
word = strtok(line, "\t");
while (*word)
{
/* print UTF8 encoded characters as a single entity */
if (*word & 0x80)
{
do
{
printf("%c", *word);
word++;
} while ((*word & 0xc0) == 0x80);
printf ("\n");
} else
{
/* print low ASCII characters as-is */
printf("%c \n", *word);
word++;
}
}
}
return 0;
}
I'm working with Arduino.
I want to send Ctrl+z after a string in C. I tried truncating ^Z but that didn't work. How can I do that?
Ctrl+Z = 26 = '\032' = '\x1A'. Either of the backslash escape sequences can be written in a string literal (but be careful with the hex escape as if it is followed by a digit or A-F or a-f, that will also be counted as part of the hex escape, which is not what you want).
However, if you are simulating terminal input on a Windows machine (so you want the character to be treated as an EOF indication), you need to think again. That isn't how it works.
It may or may not do what you want with Arduino, either; in part, it depends on what you think it is going to do. It also depends on whether the input string will be treated as if it came from a terminal.
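The warning about the hex escape can be illustrated with two literals. This sketch (the names MSG_HEX, MSG_OCTAL and same_bytes are made up) assumes some text starting with a hex digit has to follow the Ctrl+Z byte. Splitting the literal ends the hex escape, while an octal escape never takes more than three digits.

```c
#include <string.h>

/* "\x1ABEEF" would be read as one oversized hex escape. Adjacent literals are
   concatenated only after escapes are processed, so splitting is safe. */
static const char MSG_HEX[]   = "done\x1A" "BEEF";

/* An octal escape stops after at most three digits, so no split is needed. */
static const char MSG_OCTAL[] = "done\032BEEF";

/* Returns 1 if both spellings produce the same bytes. */
int same_bytes(void)
{
    return sizeof MSG_HEX == sizeof MSG_OCTAL
        && memcmp(MSG_HEX, MSG_OCTAL, sizeof MSG_HEX) == 0;
}
```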
I hacked this up as I needed something similar:
#include <stdio.h>
#define CTRL(x) (#x[0]-'a'+1)
int main (void)
{
printf("hello");
printf("%c", CTRL(n));
printf("%c", CTRL(z));
}
hope it helps 8)
I assume by "truncating" you actually meant appending.
In ASCII, CTRL+z is code point 26 so you can simply append that as a character, something like:
#define CTRL_Z 26
char buffer[100];
sprintf (buffer, "This is my message%c", CTRL_Z);
The sprintf method is only one of the ways of doing this but they all basically depend on you putting a single byte at the end with the value 26.
The following should work:
Whatever you are trying to write, append \032 at the end.
For example:
strcpy(InputCommand,"hi\032");
GetSerialData(InputCommand,......); //this is my own function which uses serialPuts()