Escape characters (0x1B/27) in binary packets don't get sent through Wi-Fi and corrupt the message during transmission

I am developing on an embedded system (STM32F4) and I am trying to send data to a simple Windows Forms client program on the PC side. When I used a character-based string format everything worked fine, but when I switched to a binary packet format to increase performance I ran into a problem with escape characters.
I'm using nanopb to implement Google's Protocol Buffers for the transmission, and I observed that for about 5% of the packets my client program throws exceptions telling me that the packet is corrupted.
I debugged with Wireshark and saw that these corrupted packets were 2-4 bytes smaller than the original packet size. Upon further inspection I found that the corrupted packets always contained the byte value 27, and the other packets never did. I searched for it and learned that this value is the escape character, and that it can lead to problems.
The technical documentation of the Wi-Fi module I'm using (Gainspan GSM2100) mentions that commands are preceded by an escape character, so I think I need to get rid of these values in my packets.
I couldn't find a solution to my problem, so I would appreciate it if somebody more experienced could lead me to the right approach for solving it.

How are you sending the data? Are you using a library or sending raw bytes? According to the manual, your data commands should start with an escape sequence, but also have data length specified:
// Each escape sequence starts with the ASCII character 27 (0x1B),
// the equivalent to the ESC key. The contents of < > are a byte or byte stream.
// - Cid is connection id (udp, tcp, etc)
// - Data Length is 4 ASCII char represents decimal value
// i.e. 1400 bytes would be '1' '4' '0' '0' (0x31 0x34 0x30 0x30).
// - Data size must match with specified length.
// Ignore all command or esc sequence in between data pay load.
<Esc>Z<Cid><Data Length xxxx 4 ascii char><data>
Note the remark regarding data size: "Ignore all command or esc sequence in between data pay load".
For example, this is what the GSCore::writeData function in GSCore.cpp looks like:
// Including a trailing 0 that snprintf insists to write
uint8_t header[8];
// Prepare header: <esc> Z <cid> <ascii length>
snprintf((char*)header, sizeof(header), "\x1bZ%x%04d", cid, len);
// First, write the escape sequence up to the cid. After this, the
// module responds with <ESC>O or <ESC>F.
writeRaw(header, 3);
if (!readDataResponse()) {
    if (GS_LOG_ERRORS && this->error)
        this->error->println("Sending bulk data frame failed");
    return false;
}
// Then, write the rest of the escape sequence (-1 to not write the
// trailing 0)
writeRaw(header + 3, sizeof(header) - 1 - 3);
// And write the actual data
writeRaw(buf, len);
This should most likely work. Alternatively, a dirty hack might be to "escape the escape character" before sending, i.e. replace each 0x1B (decimal 27) byte with two bytes (0x1B 0x1B) - but this is just a wild guess and I am presuming you should just check the manual.
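To make that hack concrete, here is a small sketch of the byte transformation (in Python purely for illustration - on the STM32 side this would be C, and this doubling scheme is the guess above, not something taken from the GainSpan manual):
# Illustration of "escape the escape": double every 0x1B byte on the sender
# and collapse doubled 0x1B bytes again on the receiver.
ESC = b'\x1b'
def stuff(payload):
    # sender side: double every escape byte
    return payload.replace(ESC, ESC + ESC)
def unstuff(raw):
    # receiver side: collapse doubled escape bytes
    return raw.replace(ESC + ESC, ESC)
sample = b'\x08\x1b\x02payload'
assert unstuff(stuff(sample)) == sample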

Related

Contiguous Hex file generation using GCC

I have a hex file for an STM32F427 that was built using GCC (gcc-arm-none-eabi) version 4.6 and had contiguous memory addresses. I wrote a boot loader for loading that hex file and also added a checksum capability to make sure the hex file is correct before starting the application.
Snippet of Hex file:
:1005C80018460AF02FFE07F5A64202F1D00207F5F9
:1005D8008E4303F1A803104640F6C821C2F2000179
:1005E8001A460BF053F907F5A64303F1D003184652
:1005F8000BF068F907F5A64303F1E80340F6FC1091
:10060800C2F2000019463BF087FF07F5A64303F145
:10061800E80318464FF47A710EF092FC07F5A643EA
:1006280003F1E80318460EF03DFC034607F5A64221
:1006380002F1E0021046194601F0F2FC07F56A5390
As you can see, all the addresses are sequential. Then we changed the compiler to version 4.8 and I got the same type of hex file.
But now we use compiler version 6.2 and the generated hex file is not contiguous. It looks something like this:
:10016000B9BC0C08B9BC0C08B9BC0C08B9BC0C086B
:10017000B9BC0C08B9BC0C08B9BC0C08B9BC0C085B
:08018000B9BC0C08B9BC0C0865
:1001900081F0004102E000BF83F0004330B54FEA38
:1001A00041044FEA430594EA050F08BF90EA020FA5
As you can see, the record at 0x0180 holds only 8 bytes and the next record starts at 0x0190, which means the 8 bytes in between (0x0188 to 0x018F) are left as 0xFF because they are not flashed.
The boot loader is fairly dumb: we just pass it the starting address and the number of bytes over which to calculate the checksum.
Is there a way to make the hex file contiguous, as with compilers 4.6 and 4.8? The code is the same in all three cases.
If post-processing the hex file is an option, you can consider using the IntelHex Python library. It lets you manipulate the hex file's data (ignoring the 'markup': record type, address, checksum, etc.) rather than its raw lines, and it will, for instance, create output with correct line checksums.
A fast way to get this up and running could be to use the bundled convenience scripts hex2bin.py and bin2hex.py:
python hex2bin.py --pad=FF noncontiguous.hex tmp.bin
python bin2hex.py tmp.bin contiguous.hex
The first line converts the input file noncontiguous.hex to a binary file, padding it with FF where there is no data. The second line converts the binary file back to a hex file.
The result would be
:08018000B9BC0C08B9BC0C0865
becomes
:10018000B9BC0C08B9BC0C08FFFFFFFFFFFFFFFF65
As you can see, padding bytes are added where the input doesn't have any data, equivalent to writing the input file to the device and reading it back out. Bytes that are in the input file are kept the same - and at the same address.
The checksum is also correct: the length byte grows by 0x08 and the eight pad bytes add 8 x 0xFF = 0x7F8, so the byte sum grows by 0x800, a multiple of 256, and the checksum byte stays the same. If you padded with something else, IntelHex would still output the correct checksum.
You can skip the creation of a temporary file by piping the two commands: omit tmp.bin in the first line and replace it with - in the second:
python hex2bin.py --pad=FF noncontiguous.hex | python bin2hex.py - contiguous.hex
An alternative way could be to have a base file with all FF and use the hexmerge.py convenience script to merge gcc's output onto it with --overlap=replace
The longer, more flexible way would be to implement your own tool using the IntelHex API. I've used this to good effect in situations similar to yours - tweaking hex files to satisfy tools that are costly to change but only handle hex files the way they were when the tool was written.
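For illustration, a minimal version of such a tool might look like this (a sketch assuming pip install intelhex; the file names are placeholders):
from intelhex import IntelHex
ih = IntelHex('noncontiguous.hex')          # read gcc's output
ih.padding = 0xFF                           # value used to fill the gaps
start, end = ih.minaddr(), ih.maxaddr()
data = ih.tobinarray(start=start, end=end)  # gaps come back as 0xFF
out = IntelHex()
out.frombytes(data, offset=start)           # now one contiguous block
out.write_hex_file('contiguous.hex')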
One of many possible ways:
Make your hex file with v6.2, e.g., foo.hex.
Postprocess it with this Perl oneliner:
perl -ne 's/\s+$//; if (m/^:(..)(....)00(.*)(..)$/) { my $data = $3 . ("FF" x (16 - hex($1))); my $rec = "10$2" . "00$data"; my $sum = 0; $sum += hex($_) for $rec =~ /(..)/g; printf ":%s%02X\n", $rec, (-$sum) % 256; } else { print "$_\n"; }' foo.hex > foo2.hex
Now foo2.hex will have 16-byte data records throughout.
Note: all this does is FF-pad the data records to 0x10 bytes and recompute each line's checksum. It doesn't check addresses or anything else.
Explanation
perl -ne '<some script>' <input file> runs <some script> for each line of <input file>; unlike -p it does not print automatically, so the script prints each (possibly rewritten) line itself. The script is:
s/\s+$//;                                  # strip the trailing newline (and any \r)
if (m/^:(..)(....)00(.*)(..)$/) {          # data record: length, address, data, old checksum
  my $data = $3 . ("FF" x (16 - hex($1))); # pad the data out to 16 bytes with 0xFF
  my $rec = "10$2" . "00$data";            # rebuild the record with length 0x10
  my $sum = 0;
  $sum += hex($_) for $rec =~ /(..)/g;     # sum all record bytes...
  printf ":%s%02X\n", $rec, (-$sum) % 256; # ...and append the recomputed checksum
} else {
  print "$_\n";                            # EOF and address records pass through untouched
}
Another solution is to change the linker script to ensure that the preceding .isr_vector section ends on a 16-byte boundary, as the map file reveals that the following .text section is 16-byte aligned.
This ensures there are no unprogrammed flash bytes between the two sections.
You can use bincopy to fill all empty space with 0xff.
$ pip install bincopy
$ bincopy fill foo.hex
Use the --gap-fill option of objcopy, e.g.:
arm-none-eabi-objcopy --gap-fill 0xFF -O ihex firmware.elf firmware.hex

TStringList behavior with non ANSI files

In my application, when I want to import a file, I use TStringList.
But when someone exports data from Excel, the file encoding is UCS-2 Little Endian, and TStringList can't read the data.
Is there any way to validate this situation, identify the text encoding, and send a warning to the user that the provided text is not compatible?
Just to be clear, the user will provide only plain text - letters and numbers; anything else and I must send the warning.
A Unicode file without a BOM is fine (TStringList can read it!).
An ANSI file too (TStringList can read it!).
Even Unicode with a BOM would be fine if there were a way to remove it (TStringList can read it, but the BOM bytes show up as stray characters - an "i"-like character, ">>" and a reversed "?" - at the start).
I used the following function in Delphi 6 to detect Unicode BOMs.
const
  //standard byte order marks (BOMs)
  UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
  UTF16LittleEndianBOM: array [0..1] of AnsiChar = #$FF#$FE;
  UTF16BigEndianBOM: array [0..1] of AnsiChar = #$FE#$FF;
  UTF32LittleEndianBOM: array [0..3] of AnsiChar = #$FF#$FE#$00#$00;
  UTF32BigEndianBOM: array [0..3] of AnsiChar = #$00#$00#$FE#$FF;
function FileHasUnicodeBOM(const FileName: string): Boolean;
var
  Buffer: array [0..3] of AnsiChar;
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite); // allow other programs read access at the same time
  Try
    FillChar(Buffer, SizeOf(Buffer), $AA); // fill with characters that we are not expecting, then...
    Stream.Read(Buffer, SizeOf(Buffer));   // ...read up to SizeOf(Buffer) bytes - there may not be enough
    // use Read rather than ReadBuffer so that no exception is raised if we can't fill Buffer
  Finally
    FreeAndNil(Stream);
  End;
  Result := CompareMem(@UTF8BOM, @Buffer, SizeOf(UTF8BOM)) or
    CompareMem(@UTF16LittleEndianBOM, @Buffer, SizeOf(UTF16LittleEndianBOM)) or
    CompareMem(@UTF16BigEndianBOM, @Buffer, SizeOf(UTF16BigEndianBOM)) or
    CompareMem(@UTF32LittleEndianBOM, @Buffer, SizeOf(UTF32LittleEndianBOM)) or
    CompareMem(@UTF32BigEndianBOM, @Buffer, SizeOf(UTF32BigEndianBOM));
end;
This will detect all the standard BOMs. You could use it to block such files if that's the behaviour you want.
You state that Delphi 6 TStringList can load 16 bit encoded files if they do not have a BOM. Whilst that may be the case, you will find that, for characters in the ASCII range, every other character is #0. Which I guess is not what you want.
If you want to detect that text is Unicode for files without BOMs then you could use IsTextUnicode. However, it may give false positives. This is a situation where I suspect it is better to ask for forgiveness than permission.
Now, if I were you I would not actually try to block Unicode files. I would read them. Use the TNT Unicode library. The class you want is called TWideStringList.

Python 3: reading UCS-2 (BE) file

I can't seem to be able to decode UCS-2 BE files (legacy stuff) under Python 3.3 using the built-in open() function - the stack trace shows a UnicodeDecodeError and contains my readLine() method. In fact, I wasn't able to find a flag for specifying this encoding.
Using Windows 8, terminal is set to codepage 65001, using 'Lucida Console' fonts.
Code snippet won't be of too much help, I guess:
def display_resource():
    f = open(r'D:\workspace\resources\JP.res', encoding=<??tried_several??>)
    while True:
        line = f.readline()
        if len(line) == 0:
            break
I'd appreciate any insight into this issue.
UCS-2 is UTF-16, really - at least for any codepoint that was already assigned back when it was still called UCS-2.
Open it with encoding='utf16'. If there is no BOM (the byte order mark, 2 bytes at the start; for BE that would be \xfe\xff), then use encoding='utf_16_be' to force the byte order.
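Applied to the snippet from the question, that might look like this (a sketch assuming the file really has no BOM, as described):
def display_resource():
    # 'utf_16_be' forces big-endian decoding when there is no BOM;
    # with a BOM present, plain 'utf16' picks the byte order automatically.
    with open(r'D:\workspace\resources\JP.res', encoding='utf_16_be') as f:
        for line in f:
            print(line.rstrip('\n'))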

How to get Ctrl, Shift or Alt with getch() ncurses?

I cannot get Ctrl, Shift or Alt with getch() using ncurses. Am I missing something in the man pages?
Amazing how sometimes the right answer gets demoted, and answers that "authoritatively" give up get promoted... With a bit of creativity, keyname actually holds the key to figuring this out, with one caveat - Shift/Alt/Ctrl must be pressed together with some other key:
First, for "normal" keys such as the printable ones, you can easily detect Shift because the character is uppercased.
For special keys, e.g. KEY_LEFT, you will see that the code generated when Shift is held is actually KEY_SLEFT; ditto for KEY_RIGHT. Unfortunately, no such luck for KEY_UP/KEY_DOWN, which seem unfazed by Shift. So you can distinguish by the value returned from getch() - a KEY_S... code implies Shift was pressed.
For Alt (at least whatever is not trapped by X or the Aqua window manager), keyname will convert the key to an M-something.
For Ctrl you'll get a "^" preceding the actual key name, e.g. ^R for key 18.
So you can now figure out the key codes for your switch(getch()) statements, etc., with a simple snippet:
ch = getch(); endwin(); printf("KEY NAME : %s - %d\n", keyname(ch),ch);
and that's that. Think before definitively saying "can't". Maybe there's a way that's less obvious.
At least for the control modifier there is a simple solution. Curses was derived from the vi source code, in which you find the following (see https://github.com/n-t-roff/ex-1.1/blob/master/ex.h line 77 and https://github.com/n-t-roff/ex-1.1/blob/master/ex_vops.c line 445):
#ifndef CTRL
#define CTRL(c) ((c) & 037)
#endif
switch(getch()) {
case CTRL('r'):
/* key ctrl-r (i.e. ^R) pressed */
Depending on the includes used, CTRL may or may not already be defined in your code.
(To roughly copy my answer from How to get Shift+X / Alt+X keys in Curses ?)
Long story short - you cannot. The modifier keys are just that - modifiers. They do not exist in their own right, they modify some other (printing) key that you might press.
That said, if you are feeling especially brave, you can try my libtermkey which will at least correctly parse things like Ctrl-arrow.
Finally if you're feeling even braver you can run the terminal I wrote, pangoterm, which has generic ways to encode any arbitrarily modified Unicode keys, so it can distinguish Ctrl-m from Enter, Ctrl-Shift-a from Ctrl-a, etc...
However, outside of these, the answer remains "you cannot".
Agreeing (partly) with @leonerd: ncurses will only give you those keys when they are used as modifiers to other keys (ignoring the ASCII escape character, which some people confuse with the Alt key). Some specific devices can be told to give this information (e.g., the Linux console as documented in console_ioctl(4)), but that's not a problem that ncurses will solve for you.
Refer to the ncurses FAQ How can I use shift- or control-modifiers? for a long answer.
In short: ncurses doesn't tell you whether a given modifier was used (except for special cases where there are well-known uses of shift); rather, its terminal descriptions provide the information either by
multiplying the actual function keys by combinations of shift- and control-modifiers, or by
using names based on xterm's PC-style function keys (shift is 2, alt is 3, control is 5, etc), to provide the information.
There are two approaches because the first uses an array of no more than 60 function keys (good enough for shift- and control-combinations), while the other just uses user-defined names.
All of these modified keys give multiple bytes; an application using keypad() (of course) in ncurses would get a single number. In the latter case, the keycodes are determined at runtime.
That applies mainly to the special keys (function-, editing- and cursor-keys). For regular keys, one might assume that keyname gives some special behavior, but reading the description it does not:
it reports the ASCII control characters (which you can do using the iscntrl macro), and
makes assumptions about meta (which only are useful for xterm, of the terminals you are likely to use), and
offers no help for the shift modifier.
As for terminals... all of them have the modifier information available internally, but terminals generally do not have a way to pass this information to applications. xterm can do this using the modifyOtherKeys resource,
modifyOtherKeys (class ModifyOtherKeys)
Like modifyCursorKeys, tells xterm to construct an escape sequence for other keys (such as "2") when modified by Control-, Alt- or Meta-modifiers. This feature does not apply to function keys and well-defined keys such as ESC or the control keys. The default is "0":
0 disables this feature.
1 enables this feature for keys except for those with well-known behavior, e.g., Tab, Backarrow and some special control character cases, e.g., Control-Space to make a NUL.
2 enables this feature for keys including the exceptions listed.
which corresponds to a control sequence, seen in XTerm Control Sequences:
CSI > Ps; Ps m
Set or reset resource-values used by xterm to decide whether to construct escape sequences holding information about the modifiers pressed with a given key. The first parameter identifies the resource to set/reset. The second parameter is the value to assign to the resource. If the second parameter is omitted, the resource is reset to its initial value.
Ps = 0 -> modifyKeyboard.
Ps = 1 -> modifyCursorKeys.
Ps = 2 -> modifyFunctionKeys.
Ps = 4 -> modifyOtherKeys.
but (being an xterm-specific feature), there's no reason to use it in ncurses: it would needlessly complicate getch.
I use a double getch() trick.
#define ESC   27
#define BLOCK (-1)   /* wtimeout() with -1 means "block indefinitely" */
int altpressed;
int sgetch()
{
    int t;
    t = getch();
    if (t == ESC) {       // possible alt held down
        t = bgetch(10);
        if (t == -1)      // nothing followed: plain escape key pressed
            return ESC;
        altpressed = 1;
    }
    return t;
}
Now bgetch():
// getch with block/unblock
int bgetch(int delay)
{
    int t;
    if (delay != BLOCK)
        blockunblockgetch(delay);
    t = getch();
    blockunblockgetch(BLOCK);   // restore blocking mode
    return t;
}
..and blockunblockgetch() (win1 being the window the input loop reads from):
// block, unblock getch
int blockunblockgetch(int delay)
{
    wtimeout(win1, delay);
    return (delay == -1) ? 0 : 1;
}
What you have is simply a getch() that, when ESC is detected, waits up to 10 ms for another character. If there is another character in the stream, it returns that character (as an int) with altpressed (a global variable) set to 1, so you can then handle the Alt+key combination by checking altpressed. Shift and Ctrl are easier to detect; they are recognized as single ints by ncurses.
You can call key_name( c ) to turn the value returned by getch() into something that shows you the state of the Ctrl modifier.
For example, this code shows "^R" if you press Ctrl-R:
while( true )
{
    int c = getch();
    if ( ERR == c )
        break;
    const char *name = key_name( c );
    move( 2, 2 );
    clear();
    printw( "You entered: %s ", name );
    refresh();
}
I know this is old, but for anyone still looking...
You CAN do this on Windows (at least on my version).
Simply interpret the value getch() returns as an integer and then look at it as a string: str(getchvariablehere).
For example, Ctrl+A is 1 on my keyboard.
It's a bit tedious, but you can check any combo with this method.
My wording may be off, but this method seems to work with raster fonts that can't display bytes as arbitrary Unicode.
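If this refers to Python's curses binding (the str() call above suggests so), a small probe along these lines shows the integer each key combination produces - a sketch assuming the third-party windows-curses package so that import curses works on Windows:
import curses
def probe(stdscr):
    stdscr.scrollok(True)
    stdscr.addstr("Press keys to see their codes, 'q' to quit\n")
    while True:
        ch = stdscr.getch()   # returns an int
        if ch == ord('q'):
            break
        # Ctrl+<letter> arrives as 1..26, e.g. Ctrl+A == 1 as mentioned above
        stdscr.addstr("code: %s  name: %s\n" % (str(ch), curses.keyname(ch).decode()))
curses.wrapper(probe)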

UnicodeEncodeError Google App Engine

I am getting the very familiar:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 24: ordinal not in range(128)
I have checked out multiple posts on SO and they recommend variable.encode('ascii', 'ignore'); however, this is not working. Even after this I am getting the same error ...
The stack trace:
'ascii' codec can't encode character u'\x92' in position 18: ordinal not in range(128)
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 513, in __call__
handler.post(*groups)
File "/base/data/home/apps/autominer1/1.343038273644030157/siteinfo.py", line 2160, in post
imageAltTags.append(str(image["alt"]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position 18: ordinal not in range(128)
The code responsible for this:
siteUrl = urlfetch.fetch("http://www."+domainName, headers = { 'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b5) Gecko/2008032620 Firefox/3.0b5' } )
webPage = siteUrl.content.decode('utf-8', 'replace').encode('ascii', 'replace')
htmlDom = BeautifulSoup(webPage)
imageTags = htmlDom.findAll('img', { 'alt' : True } )
for image in imageTags :
    if len(image["alt"]) > 3 :
        imageAltTags.append(str(image["alt"]))
Any help would be greatly appreciated. thanks.
There are two different things that Python treats as strings - 'raw' (byte) strings and 'unicode' strings. Only the latter actually represent text. If you have a raw string and you want to treat it as text, you first need to convert it to a unicode string. To do this, you need to know the encoding of the string - the way unicode codepoints are represented as bytes in the raw string - and call .decode(encoding) on the raw string.
When you call str() on a unicode string, the opposite transformation takes place - Python encodes the unicode string as bytes. If you don't specify a character set, it defaults to ascii, which is only capable of representing the first 128 codepoints.
Instead, you should do one of two things:
Represent 'imageAltTags' as a list of unicode strings, and thus dump the str() call - this is probably the best approach
Instead of str(x), call x.encode(encoding). The encoding to use will depend on what you're doing, but the most likely choice is utf-8 - eg, x.encode('utf-8').
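Applied to the snippet in the question, the first option is simply a matter of keeping the unicode value instead of calling str(); the commented-out line sketches the second option:
for image in imageTags:
    alt = image["alt"]   # already a unicode string under BeautifulSoup
    if len(alt) > 3:
        imageAltTags.append(alt)                      # option 1: keep it as unicode
        # imageAltTags.append(alt.encode('utf-8'))    # option 2: explicit byte encoding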
