JNI function returning illegal UTF characters on Android - c

I'm trying to return a string from JNI to Android, but it returns illegal UTF characters and fails like this:
JNI DETECTED ERROR IN APPLICATION: input is not valid Modified UTF-8:
illegal start byte 0x80
04-12 16:08:09.899 18210-18372 A/art:art/runtime/runtime.cc:427]
string: '���� ���!��"��,"���"���#���$��%���
%��`&��'��H(���)��D*���*��X+��,���,���-��4.��|.��P/��t/���/��01��x1��
2��D2���2���3���4���5��06���6��9���9��;���;��H<��=��0=���=���>��8?��
Here is the code which I am using:
JNIEXPORT jbyteArray Java_pakdata_com_qurantextc_MainActivity_get(
JNIEnv *pEnv,
jobject this,
jint pageNo, jint lang) {
char* buffer=(char*)malloc(10000); // this buffer contains the ayat
register unsigned int pageNumber = pageNo - 1;
char * header=(char*)malloc(1000);
sprintf(header,"[{\"OFFSET\":%d,\"DATA\":\"",pageNumber+1);
strcpy(buffer,header);
// to get the last ayat of the page
// this loop will fetch all ayats of the page
for (int i = start_ayat; i <= end_ayat; i++) {
sprintf(buffer+strlen(buffer),"<div class=\\\"qr0\\\" data-ayat=\\\"%d\\\" id=\"%d\\\"><span>",i+1,i+1);
get(lang, i, buffer + strlen(buffer)); // len is equal to length of buffer ( strlen() )
strcpy(buffer+strlen(buffer),"<\\/span><\\/div>");
}
// char* footer;
sprintf(buffer+strlen(buffer),"<div class=\\\"pagebreak\">%d<a id=\\\"%d\\\"next\\\"href=\\\"\\/page\\/%d\\\"></a><\\/div <\\/div>\"}]",pageNumber+1,pageNumber+1,pageNumber+1);
__android_log_print(ANDROID_LOG_DEBUG, "LOG_TAG","string: '%s'" , buffer);
int l = strlen(buffer);
char c[l];
strcpy(c,replace(buffer,"\r","<br>"));
jbyteArray ret = (*pEnv)->NewByteArray(pEnv,l);
(*pEnv)->SetByteArrayRegion (pEnv,ret, 0, l, c);
const char * errorKind = NULL;
uint8_t utf8 = checkUtfytes(c, &errorKind);
if (errorKind != NULL) {
free(buffer);
return ret;
} else {
free(buffer);
return ret;
}
I have tried using this too:
return = (*pEnv)->NewStringUTF(pEnv,buffer)
but it still contains illegal UTF characters.
Here is my Android-side code:
byte[] ss = get(a, pos);
s= new String(ss,"UTF-8");
I am still getting the illegal UTF character error.
I have tried encoding on the Java side, but that did not help either.
I am posting here because I have already tried all the other methods written here, and none of them worked.
Please help!

Maybe I am late, but your code seems to be correct. According to the JNI documentation, however, these characters are not supported (NewStringUTF expects Modified UTF-8). You have to handle it on the server side. Hope it helps.
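As a hedged illustration of what the error message is complaining about: a byte such as 0x80 is a continuation byte and cannot begin a character. The sketch below is plain C with a hypothetical helper name, looks_like_utf8, that is not part of JNI. It only checks the lead-byte/continuation-byte structure of standard UTF-8 and does not attempt to reproduce the exact Modified UTF-8 rules that the Android runtime applies. Running a check like this on the buffer before calling NewStringUTF would show whether the data is already broken on the C side.

```c
#include <stddef.h>

/* Hypothetical helper (not a JNI function): returns 1 if buf[0..len) has the
   byte structure of UTF-8, i.e. every lead byte is followed by the right
   number of 10xxxxxx continuation bytes. Overlong forms and surrogates are
   NOT checked; this is only a structural sanity test. */
int looks_like_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;                         /* continuation bytes expected */
        if (b < 0x80)                extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return 0;   /* 0x80..0xBF or 0xF8+ cannot start a sequence */
        if (extra > len - i - 1) return 0;    /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
        }
        i += extra + 1;
    }
    return 1;
}
```

If the check fails on the native side, returning the raw bytes as a jbyteArray and decoding them in Java will not help either, because the bytes themselves are not valid text.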

Related

How does mbtowc use the locale?

I have a hard time using mbtowc, which keeps returning wrong results. It also puzzles me why the function uses the locale at all: multibyte Unicode code points are locale independent. I implemented a custom conversion function that converts them correctly; see the code below.
I use GCC 4.8.1 on Windows (where sizeof(wchar_t) is 2) with the Czech locale (cs_CZ). The OEM codepage is windows-1250, and the console uses CP852 by default. These are my results so far:
#include <stdio.h>
#include <stdlib.h>
// my custom conversion function
int u8toint(const char* str) {
if(!(*str&128)) return *str;
unsigned char c = *str, bytes = 0;
while((c<<=1)&128) ++bytes;
int result = 0;
for(int i=bytes; i>0; --i) result|= (*(str+i)&127)<<(6*(bytes-i));
int mask = 1;
for(int i=bytes; i<6; ++i) mask<<= 1, mask|= 1;
result|= (*str&mask)<<(6*bytes);
return result;
}
// data inspecting type for the tests in main()
union data {
wchar_t w;
struct {
unsigned char b1, b2;
} bytes;
} a,b,c;
int main() {
// I tried setlocale here
mbtowc(NULL, 0, 0); // reset internal mb_state
mbtowc(&(a.w),"ř",6); // apply mbtowc
b.w = u8toint("ř"); // apply custom function
c.w = L'ř'; // compare to wchar
printf("\na = %hhx%hhx", a.bytes.b2, a.bytes.b1); // a = 0c5 wrong
printf("\nb = %hhx%hhx", b.bytes.b2, b.bytes.b1); // b = 159 right
printf("\nc = %hhx%hhx", c.bytes.b2, c.bytes.b1); // c = 159 right
getchar();
}
Here are setlocale settings and the results for a:
setlocale(LC_CTYPE,"Czech_Czech Republic.1250"); // a = 139 wrong
setlocale(LC_CTYPE,"Czech_Czech Republic.852"); // a = 253c wrong
Why doesn't mbtowc give 0x159, the Unicode code point of ř?

InternetReadFile's output to string gives a "ÌÌÌÌÌÌ" sequence in C

I am trying to read the text content of an HTTP POST response body using InternetReadFile. However, all it contains is a string of "Ì" (-52 when converted to int).
Could this be encoding related? Is it that what is being returned is not a string at all?
Am I missing a step required to read the output?
Please note that I know for a fact that this message body contains plain text (based on logs).
Here is the code:
Ptr = (char *)OutBuffer;
while(TRUE)
{
// read the server response
//
if(!InternetReadFile(RequestHandle,Ptr,Length ,&BytesRead))
{
Rc = GetLastError();
InternetCloseHandle(RequestHandle );
SetLastError(Rc);
return(ACE_HTTP_ERROR);
}
if(BytesRead == 0) // end of data
break;
TotalLength += BytesRead;
Ptr += BytesRead;
if(TotalLength >= *OutBufferLength)
{
InternetCloseHandle(RequestHandle );
SetLastError(ERROR_INSUFFICIENT_BUFFER);
*OutBufferLength = TotalLength;
return(ACE_HTTP_NO_ENOUGH_SPACE);
}
}
*OutBufferLength = TotalLength;
At this point, Ptr, when read as a char array, contains nothing but a sequence of 'Ì'.
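One possible explanation, offered as an assumption rather than a confirmed diagnosis: Ptr is advanced after every read, so once the loop ends it points just past the received data, at memory that was never written. In MSVC debug builds, uninitialized stack memory is filled with 0xCC, which is -52 as a signed char and displays as 'Ì'. The received text would then start at OutBuffer, not at Ptr. The sketch below shows the same pattern in portable C; fake_read is a hypothetical stand-in for InternetReadFile and read_all is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a chunked reader such as InternetReadFile:
   copies at most 4 bytes per call from src (starting at *pos) into dst. */
static size_t fake_read(const char *src, size_t *pos, char *dst, size_t want) {
    size_t left = strlen(src) - *pos;
    size_t n = left < want ? left : want;
    if (n > 4) n = 4;                 /* deliver small chunks, like a network */
    memcpy(dst, src + *pos, n);
    *pos += n;
    return n;
}

/* Reads src into out (capacity cap) and returns the number of bytes stored.
   ptr walks forward through the buffer; out keeps pointing at its start. */
size_t read_all(const char *src, char *out, size_t cap) {
    char *ptr = out;
    size_t pos = 0, total = 0, n;
    if (cap == 0) return 0;
    while (total + 1 < cap &&
           (n = fake_read(src, &pos, ptr, cap - 1 - total)) > 0) {
        total += n;
        ptr += n;      /* ptr now points just past the data read so far */
    }
    *ptr = '\0';       /* terminate at the end of the received data */
    return total;      /* the text itself starts at out, not at ptr */
}
```

After the loop, inspecting out shows the text, while inspecting ptr shows only whatever follows the data.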

The execution of the code always goes into the else statement

Some very strange things happen in my source code.
The following function works well: it prints 'y' when the password is correct and 'n' when it is incorrect. But if I add some UART1_Write and Delay_ms calls to the else branch, a bug appears: even if the password is "zxc" (correct), it ALWAYS enters the else branch.
I'm using MikroC PRO for PIC v6.0.0. The robot system is built around a PIC18F452 with an RN-42 Bluetooth module connected to it. I am testing with a Bluetooth-enabled laptop and TeraTerm.
For more info: http://instagram.com/p/pLnU9eDL8z/#
Here is the working routine:
void authenticate() {
char *input = "";
char *password = "zxc\0";
unsigned char ready = 0;
while (connected && !ready) {
if (UART1_Data_Ready()) {
UART1_Read_Text(input, "|", 17);
strcat(input, "\0");
if (strcmp(input, password) == 0) {
UART1_Write('y');
ready = 1;
} else {
UART1_Write('n');
ready = 1;
}
}
}
}
This version of the routine ALWAYS goes in the ELSE statement of the strcmp(input, password) == 0 part:
void authenticate() {
char *input = "";
char *password = "zxc\0";
unsigned char ready = 0;
while (connected && !ready) {
if (UART1_Data_Ready()) {
UART1_Read_Text(input, "|", 17);
strcat(input, "\0");
if (strcmp(input, password) == 0) {
UART1_Write('y');
ready = 1;
} else {
UART1_Write('n');
Delay_ms(100);
UART1_Write('$');
Delay_ms(100);
UART1_Write('$');
Delay_ms(100);
UART1_Write('$');
Delay_ms(100);
UART1_Write('K');
Delay_ms(100);
UART1_Write(',');
Delay_ms(100);
UART1_Write('-');
Delay_ms(100);
UART1_Write('-');
Delay_ms(100);
UART1_Write('-');
Delay_ms(100);
UART1_Write('\n');
ready = 1;
}
}
}
}
It is important to send all these additional symbols in order to put the RN-42 into command mode and disconnect the user if the password is wrong.
Please help me solve the problem. Any ideas appreciated!
As others have pointed out in the comments section, a major issue with your code is that you are trying to store the UART data in memory that does not belong to you.
When you declare char *input = "";, you haven't actually allocated any space except for a single byte that stores '\0'. Then, when you use UART1_Read_Text(), you tell that function you may have up to 17 characters that will be read before finding the delimiter - all of which should be stored at the location pointed to by input.
The description of that library function can be found here. Also, based on the library description it looks like UART1_Read_Text() already adds the null-termination to the UART data. I base this assumption off the description of UARTx_Write_Text and the example that they provide on their website. However, I would recommend that you verify that is indeed the case.
Also, your initialization of password is redundant and char *password = "zxc\0" should be changed to char *password = "zxc". When you declare a string literal using double quotation marks it is automatically null-terminated. This excerpt is from "C in a Nutshell":
A string literal consists of a sequence of characters (and/or escape sequences) enclosed in double quotation marks... A string literal is a static array of char that contains character codes followed by a string terminator, the null character \0... The empty string "" occupies exactly one byte in memory, which holds the terminating null character.
Based on the above, I would go about it a little more like this:
#define MAX_NUM_UART_RX_CHARACTERS 17
void authenticate()
{
char input[MAX_NUM_UART_RX_CHARACTERS + 1];
char *password = "zxc";
unsigned char ready = 0;
while (connected && !ready)
{
if (UART1_Data_Ready())
{
UART1_Read_Text(input, "|", MAX_NUM_UART_RX_CHARACTERS);
if (strcmp(input, password) == 0)
{
UART1_Write('y');
ready = 1;
}
else
{
UART1_Write('n');
ready = 1;
}
}
}
}
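The point about string literals can be checked with a few lines of standard C. This is a minimal sketch for a hosted compiler, not MikroC-specific code, and the function names are made up for illustration. It shows that "" occupies one byte, that "zxc" already carries its terminator, and that the explicit \0 in "zxc\0" only adds a second, redundant one.

```c
#include <stddef.h>
#include <string.h>

/* sizeof on a string literal counts the implicit terminating '\0'. */
size_t size_of_empty(void)        { return sizeof(""); }       /* 1 */
size_t size_of_plain(void)        { return sizeof("zxc"); }    /* 4 */
size_t size_of_explicit_nul(void) { return sizeof("zxc\0"); }  /* 5 */

/* strcmp stops at the first '\0', so both spellings compare equal. */
int passwords_equal(void) {
    return strcmp("zxc", "zxc\0") == 0;
}
```

Because "" provides only that single byte, writing up to 17 received characters through a pointer to it overruns the literal, which is why the corrected version uses a real array.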

Parsing code for GPS NMEA string

I am trying to parse the incoming GPGGA NMEA GPS string using an Arduino Uno and the code below.
I am using only the GPGGA NMEA string to get the values of latitude, longitude and altitude. In the code below, I put in certain checks to see whether the incoming string is GPGGA, and then store the rest of the string in an array, which can then be parsed using the strtok function so that all three GPS coordinates can easily be found.
But I am unable to figure out how to store only the GPGGA string and not the strings that follow. I am using a for loop, but it isn't working.
I am not trying to use any library. I have come across certain existing code like this.
Here is the GPGGA string information link.
I am trying to get the following functionality:
i) Check if the incoming string is GPGGA.
ii) If yes, store the following string up to EOL or up to * (which is followed by the checksum) in an array. The array length is variable (I am unable to find a solution for this).
iii) Then parse the stored array (this is done; I tried it with a different array).
#include <SoftwareSerial.h>
SoftwareSerial mySerial(10,11); // 10 RX / 11 TX
void setup()
{
Serial.begin(9600);
mySerial.begin(9600);
}
void loop()
{
uint8_t x;
char gpsdata[65];
if((mySerial.available()))
{
char c = mySerial.read();
if(c == '$')
{char c1 = mySerial.read();
if(c1 == 'G')
{char c2 = mySerial.read();
if(c2 == 'P')
{char c3 = mySerial.read();
if(c3 == 'G')
{char c4 = mySerial.read();
if(c4 == 'G')
{char c5 = mySerial.read();
if(c5 == 'A')
{for(x=0;x<65;x++)
{
gpsdata[x]=mySerial.read();
while (gpsdata[x] == '\r' || gpsdata[x] == '\n')
{
break;
}
}
}
else{
Serial.println("Not a GPGGA string");
}
}
}
}
}
}
}
Serial.println(gpsdata);
}
Edit 1:
Following Joachim Pileborg's suggestion, I am editing the for loop in the code.
I am adding a picture to show the undefined output of the code.
Input for the code:
$GPGGA,092750.000,5321.6802,N,00630.3372,W,1,8,1.03,61.7,M,55.2,M,,*76
$GPGSA,A,3,10,07,05,02,29,04,08,13,,,,,1.72,1.03,1.38*0A
$GPGSV,3,1,11,10,63,137,17,07,61,098,15,05,59,290,20,08,54,157,30*70
$GPGSV,3,2,11,02,39,223,19,13,28,070,17,26,23,252,,04,14,186,14*79
$GPGSV,3,3,11,29,09,301,24,16,09,020,,36,,,*76
$GPRMC,092750.000,A,5321.6802,N,00630.3372,W,0.02,31.66,280511,,,A*43
$GPGGA,092751.000,5321.6802,N,00630.3371,W,1,8,1.03,61.7,M,55.3,M,,*75
$GPGSA,A,3,10,07,05,02,29,04,08,13,,,,,1.72,1.03,1.38*0A
$GPGSV,3,1,11,10,63,137,17,07,61,098,15,05,59,290,20,08,54,157,30*70
$GPGSV,3,2,11,02,39,223,16,13,28,070,17,26,23,252,,04,14,186,15*77
$GPGSV,3,3,11,29,09,301,24,16,09,020,,36,,,*76
$GPRMC,092751.000,A,5321.6802,N,00630.3371,W,0.06,31.66,280511,,,A*45
After a quick check of the linked article on the NMEA 0183 protocol, this jumped out at me:
<CR><LF> ends the message.
This means that, instead of just reading indiscriminately from the serial port, you should be looking for that sequence. When you find it, you should terminate the string and break out of the loop.
Also, you might want to zero-initialize the data string to begin with, to easily see if there actually is any data in it to print (using e.g. strlen).
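The idea above can be sketched as a small line accumulator in plain C. The names line_reader, line_reader_feed and is_gpgga are hypothetical, and the 82-character limit is an assumed upper bound for one NMEA sentence. Each received character is fed in one at a time; the function reports a complete line only when the terminating line feed arrives, so a partial sentence is never printed or parsed.

```c
#include <stddef.h>
#include <string.h>

#define NMEA_MAX_LEN 82   /* assumed upper bound for one NMEA sentence */

/* Hypothetical helper type: a zero-initialized buffer plus fill level. */
typedef struct {
    char buf[NMEA_MAX_LEN + 1];
    size_t len;
} line_reader;

/* Feed one received character. Returns 1 when a complete line is available
   in r->buf (null-terminated, without CR/LF), otherwise 0. */
int line_reader_feed(line_reader *r, char c) {
    if (c == '\r') return 0;            /* ignore CR and wait for LF */
    if (c == '\n') {
        r->buf[r->len] = '\0';          /* terminate the string */
        r->len = 0;                     /* the next call starts a new line */
        return 1;
    }
    if (r->len < NMEA_MAX_LEN) {
        r->buf[r->len++] = c;           /* characters beyond the limit are dropped */
    }
    return 0;
}

/* Returns 1 if the completed line is a GPGGA sentence. */
int is_gpgga(const char *line) {
    return strncmp(line, "$GPGGA,", 7) == 0;
}
```

On the Arduino side, this would be called with the result of mySerial.read() whenever mySerial.available() is nonzero, and the buffer would be used only when the function returns 1 and is_gpgga accepts it.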
You could use some functions from the C library libnmea. There are functions to split a sentence into values by comma and then parse them.
Offering this as a suggestion in support of what you are doing...
Would it not be useful to replace all of the nested if()s in your loop with something like:
EDIT added global string to copy myString into once captured
char globalString[100]; // declare a global sufficiently large to hold your results
void loop()
{
int chars = mySerial.available();
int i;
char *myString;
if (chars>0)
{
// calloc zero-fills the buffer, so the string is always null-terminated
myString = (char *)calloc(chars+1, sizeof(char));
if (myString == NULL) return; // out of memory, try again on the next pass
for(i=0;i<chars;i++)
{
myString[i] = mySerial.read();
// test for end of line
if((myString[i] == '\n') ||(myString[i] == '\r'))
{
// pick this...
myString[i]=0; // strip the carriage return / line feed
// OR pick this instead, to keep the line ending (one or the other;
// I do not know the requirements for your string):
// if(i+1<chars) { myString[i+1] = mySerial.read(); } // get remaining '\r' or '\n'
break;
}
}
if(strstr(myString, "GPGGA") == NULL)
{
Serial.println("Not a GPGGA string");
//EDIT
strcpy(globalString, ""); // if failed, do not want globalString populated
}
else
{ //EDIT
strncpy(globalString, myString, sizeof(globalString) - 1); // bounded copy
globalString[sizeof(globalString) - 1] = 0;
}
free(myString); // release the buffer on every pass
}
}
Now the return value from mySerial.available() tells you exactly how many bytes to read, so you can read the entire buffer and test for validity all in one pass.
I have a project that will need to pull the same information out of the same sentence.
I got this out of a log file
import serial
import time
ser = serial.Serial(1)
ser.read(1)
read_val = ("nothing")
gpsfile="gpscord.dat"
l=0
megabuffer=''
def buffThis(s):
    global megabuffer
    megabuffer +=s
def buffLines():
    global megabuffer
    megalist=megabuffer.splitlines()
    megabuffer=megalist.pop()
    return megalist
def readcom():
    ser.write("ati")
    time.sleep(3)
    read_val = ser.read(size=500)
    lines=read_val.split('\n')
    for l in lines:
        if l.startswith("$GPGGA"):
            if l[:len(l)-3].endswith("*"):
                outfile=open('gps.dat','w')
                outfile.write(l.rstrip())
                outfile.close()
readcom()
while 1==1:
    readcom()
    answer=raw_input('not looping , CTRL+C to abort')
The result is this:
gps.dat
$GPGGA,225714.656,5021.0474,N,00412.4420,W,0,00,50.0,0.0,M,18.0,M,0.0,0000*5B
Using "malloc" every single time you read a string is an enormous amount of computational overhead. (And I didn't see the corresponding free() call. Without it, you never get that memory back until the program terminates or the system runs out of memory.) Just pick the size of the longest string you will ever need, add 10 to it, and declare that as your string array size. Set once and done.
There are several C functions for getting substrings out of a string; strtok() with the comma as delimiter is probably the least overhead.
You are on an embedded microcontroller. Keep it small, keep overhead down. :)
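A minimal sketch of the strtok() approach, assuming a fixed buffer holding one complete GPGGA sentence. The helper name gpgga_fields is hypothetical. One caveat that is certain: strtok() treats consecutive commas as a single delimiter, so empty fields are skipped and the token positions used here (2, 4 and 9) only hold when every field before the altitude is populated, as in the sample input with a valid fix.

```c
#include <string.h>

/* Hypothetical helper: splits a GPGGA sentence in place (the commas in
   sentence are overwritten with '\0') and stores pointers to the latitude,
   longitude and altitude tokens. Returns 1 on success, 0 otherwise. */
int gpgga_fields(char *sentence, const char **lat,
                 const char **lon, const char **alt) {
    int index = 0;
    char *tok;
    *lat = *lon = *alt = NULL;
    for (tok = strtok(sentence, ","); tok != NULL;
         tok = strtok(NULL, ","), index++) {
        if (index == 0 && strcmp(tok, "$GPGGA") != 0) return 0; /* wrong type */
        if (index == 2) *lat = tok;
        if (index == 4) *lon = tok;
        if (index == 9) { *alt = tok; return 1; }
    }
    return 0;   /* sentence ended before the altitude field */
}
```

No memory is allocated: the returned pointers refer to the caller's buffer, which keeps the overhead low on a microcontroller. If sentences without a fix must be handled, a splitter that preserves empty fields would be needed instead.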
#include <stdio.h>
#include <string.h>
#define GNSS_HEADER_LENGTH 5
#define GNSS_PACKET_START '$'
#define GNSS_TOKEN_SEPARATOR ','
#define bool int
#define FALSE 0
#define TRUE 1
//To trim a string contains \r\n
void str_trim(char *str){
while(*str){
if(*str == '\r' || *str == '\n'){
*str = '\0';
}
str++;
}
}
/**
* To parse GNSS data by header and the index separated by comma
*
* $GPGSV,1,1,03,23,39,328,30,18,39,008,27,15,33,035,33,1*5A
* $GNRMC,170412.000,V,,,,,,,240322,,,N,V*2D
* $GNGGA,170412.000,,,,,0,0,,,M,,M,,*57
*
* #data_ptr the pointer points to gps data
* #header the header for parsing GPGSV
* #repeat_index the header may repeat for many lines
* so the header index is for identifying repeated header
* #token_index is the index of the parsing data separated by ","
* the start is 1
* #result to store the result of the parser input
*
* #result bool - parsed successfully
**/
bool parse_gnss_token(char *data_ptr, char *header, int repeat_index, int token_index, char *result) {
bool gnss_parsed_result = FALSE; // To check GNSS data parsing is success
bool on_header = FALSE;
// For header
int header_repeat_counter = 0;
int header_char_index = 0; // each char in header index
// For counting comma
int counted_token_index = 0;
// To hold the result character index
bool data_found = FALSE;
char *result_start = result;
char header_found[10];
while (*data_ptr) {
// 1. Packet start
if (*data_ptr == GNSS_PACKET_START) {
on_header = TRUE;
header_char_index = 0; // to index each character in header
data_found = FALSE; // is data part found
data_ptr++;
}
// 2. For header parsing
if (on_header) {
if (*data_ptr == GNSS_TOKEN_SEPARATOR || header_char_index >= GNSS_HEADER_LENGTH) {
on_header = FALSE;
} else {
header_found[header_char_index] = *data_ptr;
if (header_char_index == GNSS_HEADER_LENGTH - 1) { // Now Header found
header_found[header_char_index + 1] = '\0';
on_header = FALSE;
if (!strcmp(header, header_found)) {
// Some headers may repeat - to identify it set the repeat index
if (header_repeat_counter == repeat_index) {
//printf("Header: %s\r\n", header_found );
data_found = TRUE;
}
header_repeat_counter++;
}
}
header_char_index++;
}
}
// 3. data found
if (data_found) {
// To get the index data separated by comma
if (counted_token_index == token_index && *data_ptr != GNSS_TOKEN_SEPARATOR) {
// the data to parse
*result++ = *data_ptr;
gnss_parsed_result = TRUE;
}
if (*data_ptr == GNSS_TOKEN_SEPARATOR) { // if ,
counted_token_index++; // The comma counter for index
}
// Break if the counted_token_index(token_counter) greater than token_index(search_token)
if (counted_token_index > token_index) {
break;
}
}
// Appending \0 to the end
*result = '\0';
// To trim the data if ends with \r or \n
str_trim(result_start);
// Input data
data_ptr++;
}
return gnss_parsed_result;
}
int main()
{
char res[100];
char *nem = "\
$GNRMC,080817.000,A,0852.089246,N,07636.289920,E,0.00,139.61,270322,,,A,V*04\r\n\
$GNGGA,080817.000,0852.089246,N,07636.289920,E,1,5,1.41,11.246,M,-93.835,M,,*5E\r\n\
$GNVTG,139.61,T,,M,0.00,N,0.00,K,A*2F\r\n\
$GNGSA,A,3,30,19,17,14,13,,,,,,,,1.72,1.41,0.98,1*0A\r\n\
$GNGSA,A,3,,,,,,,,,,,,,1.72,1.41,0.98,3*02\r\n\
$GNGSA,A,3,,,,,,,,,,,,,1.72,1.41,0.98,6*07\r\n\
$GPGSV,3,1,12,06,64,177,,30,60,138,15,19,51,322,18,17,42,356,27,1*68\r\n\
$GPGSV,3,2,12,14,36,033,17,07,34,142,17,13,32,267,17,02,21,208,,1*6C\r\n\
$GPGSV,3,3,12,15,05,286,,01,05,037,,03,03,083,,20,02,208,,1*6B\r\n\
$GAGSV,1,1,00,7*73\r\n\
$GIGSV,1,1,00,1*7D\r\n\
$GNGLL,0852.089246,N,07636.289920,E,080817.000,A,A*43\r\n\
$PQTMANTENNASTATUS,1,0,1*4F\r\n";
printf("Parsing GNRMC\r\n");
printf("===============\r\n");
for(int i=1;i<=16;i++){
parse_gnss_token(nem, "GNRMC", 0, i, res);
printf("Index: %d, Result: %s\r\n", i, res);
}
printf("Parsing GNVTG (First Parameter)\r\n");
printf("================================");
// GNVTG - Header, 0 - Repeat Index(if header is repeating), 1 - Value Index,
parse_gnss_token(nem, "GNVTG", 0, 1, res);
printf("\r\nGNVTG: %s\r\n", res);
return 0;
}

Working example of substitution using PCRS

I need to do substitution in a string in C. It was recommended in one of the answers here, How to do regex string replacements in pure C?, to use the PCRS library. I downloaded PCRS from here ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib/ but I'm confused as to how to use it. Below is my code (taken from another SE post):
const char *error;
int erroffset;
pcre *re;
int rc;
int i;
int ovector[100];
char *regex = "From:([^#]+).*";
char str[] = "From:regular.expressions#example.com\r\n";
char stringToBeSubstituted[] = "gmail.com";
re = pcre_compile (regex, /* the pattern */
PCRE_MULTILINE,
&error, /* for error message */
&erroffset, /* for error offset */
0); /* use default character tables */
if (!re)
{
printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
return -1;
}
unsigned int offset = 0;
unsigned int len = strlen(str);
while (offset < len && (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, sizeof(ovector))) >= 0)
{
for(int i = 0; i < rc; ++i)
{
printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
}
offset = ovector[1];
}
As opposed to 'pcre_compile' and 'pcre_exec' what functions do I need to use from PCRS?
Thanks.
Simply follow the instructions in the INSTALL file:
To build PCRS, you will need pcre 3.0 or later and gcc.
Installation is easy: ./configure && make && make install
Debug mode can be enabled with --enable-debug.
There is a simple demo application (pcrsed) included.
PCRS provides the following functions documented in the man page pcrs.3:
pcrs_compile
pcrs_compile_command
pcrs_execute
pcrs_execute_list
pcrs_free_job
pcrs_free_joblist
pcrs_strerror
Here's an online version of the man page. To use these functions, include the header file pcrs.h and link your program against the PCRS library using the linker flag -lpcrs.
