Parsing code for GPS NMEA string - c

I am trying to parse the incoming GPGGA NMEA GPS string using an Arduino Uno and the code below.
I am using only the GPGGA NMEA string to get the latitude, longitude and altitude. In the code below I added checks to see whether the incoming string is GPGGA, and then store the rest of the string in an array, which can be parsed further using the strtok function so that all three GPS coordinates can be extracted easily.
But I am unable to figure out how to store only the GPGGA string and not the strings that follow. I am using a for loop, but it isn't working.
I am not trying to use any library. I have come across some existing code like this.
Here is the GPGGA string information link.
I am trying to implement the following functionality:
i) Check if the incoming string is GPGGA
ii) If yes, store the following string up to EOL or up to * (followed by the checksum) in an array; the array length is variable (I am unable to find a solution for this)
iii) Then parse the stored array (this is done; I tried this with a different array)
#include <SoftwareSerial.h>
SoftwareSerial mySerial(10,11); // 10 RX / 11 TX
void setup()
{
Serial.begin(9600);
mySerial.begin(9600);
}
void loop()
{
uint8_t x;
char gpsdata[65];
if((mySerial.available()))
{
char c = mySerial.read();
if(c == '$')
{char c1 = mySerial.read();
if(c1 == 'G')
{char c2 = mySerial.read();
if(c2 == 'P')
{char c3 = mySerial.read();
if(c3 == 'G')
{char c4 = mySerial.read();
if(c4 == 'G')
{char c5 = mySerial.read();
if(c5 == 'A')
{for(x=0;x<65;x++)
{
gpsdata[x]=mySerial.read();
while (gpsdata[x] == '\r' || gpsdata[x] == '\n')
{
break;
}
}
}
else{
Serial.println("Not a GPGGA string");
}
}
}
}
}
}
}
Serial.println(gpsdata);
}
Edit 1:
Following Joachim Pileborg's suggestion, I edited the for loop in the code.
I am adding a pic to show the undefined output of the code.
Input for the code:
$GPGGA,092750.000,5321.6802,N,00630.3372,W,1,8,1.03,61.7,M,55.2,M,,*76
$GPGSA,A,3,10,07,05,02,29,04,08,13,,,,,1.72,1.03,1.38*0A
$GPGSV,3,1,11,10,63,137,17,07,61,098,15,05,59,290,20,08,54,157,30*70
$GPGSV,3,2,11,02,39,223,19,13,28,070,17,26,23,252,,04,14,186,14*79
$GPGSV,3,3,11,29,09,301,24,16,09,020,,36,,,*76
$GPRMC,092750.000,A,5321.6802,N,00630.3372,W,0.02,31.66,280511,,,A*43
$GPGGA,092751.000,5321.6802,N,00630.3371,W,1,8,1.03,61.7,M,55.3,M,,*75
$GPGSA,A,3,10,07,05,02,29,04,08,13,,,,,1.72,1.03,1.38*0A
$GPGSV,3,1,11,10,63,137,17,07,61,098,15,05,59,290,20,08,54,157,30*70
$GPGSV,3,2,11,02,39,223,16,13,28,070,17,26,23,252,,04,14,186,15*77
$GPGSV,3,3,11,29,09,301,24,16,09,020,,36,,,*76
$GPRMC,092751.000,A,5321.6802,N,00630.3371,W,0.06,31.66,280511,,,A*45

After a quick check of the linked article on the NMEA 0183 protocol, this jumped out at me:
<CR><LF> ends the message.
This means that instead of just reading indiscriminately from the serial port, you should be looking for that sequence. When you find it, terminate the string and break out of the loop.
Also, you might want to zero-initialize the data string to begin with, so you can easily see whether there is actually any data in it to print (using e.g. strlen).
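A minimal sketch of that idea in plain C, assuming the bytes arrive one at a time: collect characters into a fixed buffer, restart on '$', and treat the first '\r' or '\n' as the end of the sentence. The names nmea_line_t, nmea_feed and nmea_is_gpgga are made up for this illustration, and the 82-character limit is the commonly cited NMEA 0183 maximum sentence length, so treat it as an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Assumed maximum sentence length ('$' through CR LF). */
#define NMEA_MAX_LEN 82

/* Hypothetical line accumulator: one sentence at a time. */
typedef struct {
    char buf[NMEA_MAX_LEN + 1];
    size_t len;
} nmea_line_t;

/* Feed one received byte. Returns 1 when buf holds a complete,
 * null-terminated sentence (without CR/LF), otherwise 0. */
int nmea_feed(nmea_line_t *ln, char c)
{
    if (c == '$') {
        ln->len = 0;              /* a new sentence starts: drop any partial data */
    }
    if (c == '\r' || c == '\n') {
        if (ln->len == 0) {
            return 0;             /* the LF after a CR, or an empty line */
        }
        ln->buf[ln->len] = '\0';  /* terminate the string */
        ln->len = 0;              /* next byte starts a fresh line */
        return 1;
    }
    if (ln->len < NMEA_MAX_LEN) {
        ln->buf[ln->len++] = c;   /* extra bytes of an overlong line are dropped */
    }
    return 0;
}

/* True if the completed sentence is a GPGGA sentence. */
int nmea_is_gpgga(const char *sentence)
{
    return strncmp(sentence, "$GPGGA,", 7) == 0;
}
```

On the Arduino, loop() would call nmea_feed() with each byte returned by mySerial.read() while mySerial.available() is non-zero, and print or parse the buffer only when it returns 1 and nmea_is_gpgga() is true.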

You could use some functions from the C library libnmea. There are functions to split a sentence into values by comma and then parse them.

Offering this as a suggestion in support of what you are doing...
Would it not be useful to replace all of the nested if()s in your loop with something like:
EDIT added global string to copy myString into once captured
char globalString[100];//declare a global sufficiently large to hold your results
void loop()
{
int chars = mySerial.available();
int i;
char *myString;
if (chars>0)
{
myString = calloc(chars+1, sizeof(char));
for(i=0;i<chars;i++)
{
myString[i] = mySerial.read();
//test for end of line
if((myString[i] == '\n') ||(myString[i] == '\r'))
{
//pick this...
myString[i]=0;//strip carriage - return line feed(or skip)
//OR pick this... (one or the other. i.e.,I do not know the requirements for your string)
if(i+1<chars) //only if the buffer has room for one more char plus the terminator
{
myString[i+1] = mySerial.read(); //get remaining '\r' or '\n'
myString[i+2]=0;//add null term if necessary
}
break;
}
}
if(strstr(myString, "GPGGA") == NULL)
{
Serial.println("Not a GPGGA string");
//EDIT
strcpy(globalString, "");//if failed, do not want globalString populated
}
else
{ //EDIT
strcpy(globalString, myString);
}
}
//free(myString) //somewhere when you are done with it
}
Now the return value from mySerial.available() tells you exactly how many bytes to read, so you can read the entire buffer and test it for validity in one go.

I have a project that will need to pull the same information out of the same sentence.
I got this out of a log file
import serial
import time
ser = serial.Serial(1)
ser.read(1)
read_val = ("nothing")
gpsfile="gpscord.dat"
l=0
megabuffer=''
def buffThis(s):
    global megabuffer
    megabuffer +=s
def buffLines():
    global megabuffer
    megalist=megabuffer.splitlines()
    megabuffer=megalist.pop()
    return megalist
def readcom():
    ser.write("ati")
    time.sleep(3)
    read_val = ser.read(size=500)
    lines=read_val.split('\n')
    for l in lines:
        if l.startswith("$GPGGA"):
            if l[:len(l)-3].endswith("*"):
                outfile=open('gps.dat','w')
                outfile.write(l.rstrip())
                outfile.close()
readcom()
while 1==1:
    readcom()
answer=raw_input('not looping , CTRL+C to abort')
The result is this:
gps.dat
$GPGGA,225714.656,5021.0474,N,00412.4420,W,0,00,50.0,0.0,M,18.0,M,0.0,0000*5B

Using "malloc" every single time you read a string adds a lot of computational overhead. (And I didn't see the corresponding free() call. Without it, you never get that memory back until the program terminates or the system runs out of memory.) Just pick the size of the longest string you will ever need, add 10 to it, and declare that as your string array size. Set once and done.
There are several C functions for getting substrings out of a string; strtok() using the comma as the delimiter is probably the one with the least overhead.
You are on an embedded microcontroller. Keep it small, keep overhead down. :)
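As a rough sketch of that advice, the function below tokenizes a GPGGA sentence in place with strtok(), with no dynamic allocation. The names gga_fix_t and gga_parse are invented for this example, and the field positions assume the usual GGA layout (time, latitude, N/S, longitude, E/W, fix quality, satellites, HDOP, altitude). One caveat worth knowing: strtok() treats consecutive commas as a single delimiter, so a sentence with empty fields before the altitude (for example when there is no fix) would shift the indices. Latitude and longitude are stored as the raw ddmm.mmmm values from the sentence.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three values the question asks about. */
typedef struct {
    double lat;    /* raw ddmm.mmmm value */
    char   ns;     /* 'N' or 'S' */
    double lon;    /* raw dddmm.mmmm value */
    char   ew;     /* 'E' or 'W' */
    double alt_m;  /* altitude in metres */
} gga_fix_t;

/* Parses the sentence in place (strtok modifies the buffer).
 * Returns 1 if the altitude field was reached, 0 otherwise. */
int gga_parse(char *sentence, gga_fix_t *fix)
{
    int field = 0;
    char *tok;

    if (strncmp(sentence, "$GPGGA,", 7) != 0) {
        return 0;                      /* not a GPGGA sentence */
    }
    for (tok = strtok(sentence, ","); tok != NULL;
         tok = strtok(NULL, ","), field++) {
        switch (field) {
        case 2: fix->lat = atof(tok); break;
        case 3: fix->ns  = tok[0];    break;
        case 4: fix->lon = atof(tok); break;
        case 5: fix->ew  = tok[0];    break;
        case 9: fix->alt_m = atof(tok); return 1;
        default: break;                /* fields we do not need */
        }
    }
    return 0;                          /* sentence ended before the altitude */
}
```

The sentence buffer itself can be a single fixed-size char array declared once, as suggested above, and reused for every sentence.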

#include <stdio.h>
#include <string.h>
#define GNSS_HEADER_LENGTH 5
#define GNSS_PACKET_START '$'
#define GNSS_TOKEN_SEPARATOR ','
#define bool int
#define FALSE 0
#define TRUE 1
//To trim a string contains \r\n
void str_trim(char *str){
while(*str){
if(*str == '\r' || *str == '\n'){
*str = '\0';
}
str++;
}
}
/**
* To parse GNSS data by header and the index separated by comma
*
* $GPGSV,1,1,03,23,39,328,30,18,39,008,27,15,33,035,33,1*5A
* $GNRMC,170412.000,V,,,,,,,240322,,,N,V*2D
* $GNGGA,170412.000,,,,,0,0,,,M,,M,,*57
*
* #data_ptr the pointer points to gps data
* #header the header for parsing GPGSV
* #repeat_index the header may repeat for many lines
* so the header index is for identifying repeated header
* #token_index is the index of the parsing data separated by ","
* the start is 1
* #result to store the result of the parser input
*
* #result bool - parsed successfully
**/
bool parse_gnss_token(char *data_ptr, char *header, int repeat_index, int token_index, char *result) {
bool gnss_parsed_result = FALSE; // To check GNSS data parsing is success
bool on_header = FALSE;
// For header
int header_repeat_counter = 0;
int header_char_index = 0; // each char in header index
// For counting comma
int counted_token_index = 0;
// To hold the result character index
bool data_found = FALSE;
char *result_start = result;
char header_found[10];
while (*data_ptr) {
// 1. Packet start
if (*data_ptr == GNSS_PACKET_START) {
on_header = TRUE;
header_char_index = 0; // to index each character in header
data_found = FALSE; // is data part found
data_ptr++;
}
// 2. For header parsing
if (on_header) {
if (*data_ptr == GNSS_TOKEN_SEPARATOR || header_char_index >= GNSS_HEADER_LENGTH) {
on_header = FALSE;
} else {
header_found[header_char_index] = *data_ptr;
if (header_char_index == GNSS_HEADER_LENGTH - 1) { // Now Header found
header_found[header_char_index + 1] = '\0';
on_header = FALSE;
if (!strcmp(header, header_found)) {
// Some headers may repeat - to identify it set the repeat index
if (header_repeat_counter == repeat_index) {
//printf("Header: %s\r\n", header_found );
data_found = TRUE;
}
header_repeat_counter++;
}
}
header_char_index++;
}
}
// 3. data found
if (data_found) {
// To get the index data separated by comma
if (counted_token_index == token_index && *data_ptr != GNSS_TOKEN_SEPARATOR) {
// the data to parse
*result++ = *data_ptr;
gnss_parsed_result = TRUE;
}
if (*data_ptr == GNSS_TOKEN_SEPARATOR) { // if ,
counted_token_index++; // The comma counter for index
}
// Break if the counted_token_index(token_counter) greater than token_index(search_token)
if (counted_token_index > token_index) {
break;
}
}
// Appending \0 to the end
*result = '\0';
// To trim the data if ends with \r or \n
str_trim(result_start);
// Input data
data_ptr++;
}
return gnss_parsed_result;
}
int main()
{
char res[100];
char *nem = "\
$GNRMC,080817.000,A,0852.089246,N,07636.289920,E,0.00,139.61,270322,,,A,V*04\r\n\
$GNGGA,080817.000,0852.089246,N,07636.289920,E,1,5,1.41,11.246,M,-93.835,M,,*5E\r\n\
$GNVTG,139.61,T,,M,0.00,N,0.00,K,A*2F\r\n\
$GNGSA,A,3,30,19,17,14,13,,,,,,,,1.72,1.41,0.98,1*0A\r\n\
$GNGSA,A,3,,,,,,,,,,,,,1.72,1.41,0.98,3*02\r\n\
$GNGSA,A,3,,,,,,,,,,,,,1.72,1.41,0.98,6*07\r\n\
$GPGSV,3,1,12,06,64,177,,30,60,138,15,19,51,322,18,17,42,356,27,1*68\r\n\
$GPGSV,3,2,12,14,36,033,17,07,34,142,17,13,32,267,17,02,21,208,,1*6C\r\n\
$GPGSV,3,3,12,15,05,286,,01,05,037,,03,03,083,,20,02,208,,1*6B\r\n\
$GAGSV,1,1,00,7*73\r\n\
$GIGSV,1,1,00,1*7D\r\n\
$GNGLL,0852.089246,N,07636.289920,E,080817.000,A,A*43\r\n\
$PQTMANTENNASTATUS,1,0,1*4F\r\n";
printf("Parsing GNRMC\r\n");
printf("===============\r\n");
for(int i=1;i<=16;i++){
parse_gnss_token(nem, "GNRMC", 0, i, res);
printf("Index: %d, Result: %s\r\n", i, res);
}
printf("Parsing GNVTG (First Parameter)\r\n");
printf("================================");
// GNVTG - Header, 0 - Repeat Index(if header is repeating), 1 - Value Index,
parse_gnss_token(nem, "GNVTG", 0, 1, res);
printf("\r\nGNVTG: %s\r\n", res);
return 0;
}

Related

Waiting for character in string

I am currently working on a project that will be used to test whether an instrument is within tolerance or not. My test equipment will put the DUT (Device Under Test) into a "Test Mode" where it will repeatedly send a string of data every 200ms. I want to receive that data, check that it is within tolerance and give it a pass or fail.
My code so far (I've edited a few things out like .h files and some work related bits!):
void GetData();
void CheckData();
char Data[100];
int deviceId;
float a;
float b;
float c;
void ParseString(const char* stringValue)
{
char* token = NULL;
int tokenPlace = 0;
token = strtok((char *) stringValue, ",");
while (token != NULL) {
switch (tokenPlace) {
case 0:
deviceId = atoi(token);
break;
case 1:
a= ((float)atoi(token)) / 10.0f;
break;
case 2:
b= ((float)atoi(token)) / 100.0f;
break;
case 3:
c= ((float)atoi(token)) / 10.0f;
break;
}
tokenPlace++;
token = strtok(NULL, ",");
}
}
void GetData()
{
int x = UART.scanf("%s,",Data);
ParseString(Data);
if (x !=0) {
UART.printf("Device ID = %i\n\r", deviceId);
UART.printf("a= %.1f\n\r", a);
UART.printf("s= %.2f\n\r", b);
UART.printf("c= %.1f\n\n\r", c);
}
if (deviceId <= 2) {
CheckData();
} else {
pc.printf("Device ID not recognised\n\n\r");
}
}
void CheckData()
{
if (a >= 49.9f && a <= 50.1f) {
pc.printf("a Pass\n\r");
} else {
pc.printf("a Fail\n\r");
}
if (b >= 2.08f && b <= 2.12f) {
pc.printf("b Pass\n\r");
} else {
pc.printf("b Fail\n\r");
}
if (c >= 20.0f && c <= 25.0f) {
pc.printf("c Pass\n\n\r");
} else {
pc.printf("c Fail\n\n\r");
}
if (deviceId == 0) {
(routine1);
} else if (deviceId == 1) {
(routine2);
} else if (deviceId == 2) {
(Routine3);
}
}
int main()
{
while(1) {
if(START == 0) {
wait(0.1);
GetData();
}
}
}
And this works absolutely fine. I am only printing the results to a serial terminal so I can check the data is correct to make sure it is passing and failing correctly.
My issue is that every now and then the START button happens to be pressed while the string is being sent, so the data can be corrupt: the deviceId check fails and it reports "not recognised". This means I then have to press the start button again and have another go. At the moment it's a rare occurrence, but I'd like to get rid of it if possible. I have tried adding a special character at the beginning of the string, but this also gets missed sometimes.
Ideally, when the start button is pressed, I would like it to wait for this special character so it knows it is at the beginning of the string, then the data would be read correctly, but I am unsure how to go about it.
I have been unsuccessful in my attempts so far but I have a feeling I am overthinking it and there is a nice easy way to do it. Probably been staring at it too long now!
My microcontroller is STM32F103RB and I am using the STM Nucleo with the mBed IDE as it's easy and convenient to test the code while I work on it.
You can use ParseString to return a status indicating whether a complete string is read or not.
int ParseString(const char* stringValue)
{
/* ... your original code ... */
/* String is complete if 4 tokens are read */
return (tokenPlace == 4);
}
Then in GetData use the ParseString return value to determine whether to skip the string or not.
void GetData()
{
int x = UART.scanf("%s,",Data);
int result = ParseString(Data);
if (!result) {
/* Did not get complete string - just skip processing */
return;
}
/* ... the rest of your original code ... */
}
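The return-status idea can be sketched in a self-contained form as follows. The names reading_t and parse_reading are hypothetical; the scaling factors are copied from the question's ParseString. The function fills a temporary record and only copies it to the caller when exactly four tokens were found, so a truncated string never overwrites the last good values. Note that this only catches missing fields: a string cut in the middle of the first number still has four tokens and would pass.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one line sent by the device under test. */
typedef struct {
    int   device_id;
    float a, b, c;
} reading_t;

/* Tokenizes text in place (strtok modifies it).
 * Returns 1 and fills *out only if exactly four tokens were read. */
int parse_reading(char *text, reading_t *out)
{
    reading_t tmp = {0, 0.0f, 0.0f, 0.0f};
    int count = 0;
    char *tok;

    for (tok = strtok(text, ","); tok != NULL; tok = strtok(NULL, ",")) {
        switch (count) {
        case 0: tmp.device_id = atoi(tok);             break;
        case 1: tmp.a = (float)atoi(tok) / 10.0f;      break;
        case 2: tmp.b = (float)atoi(tok) / 100.0f;     break;
        case 3: tmp.c = (float)atoi(tok) / 10.0f;      break;
        default: break;   /* extra tokens only affect the count */
        }
        count++;
    }
    if (count != 4) {
        return 0;         /* incomplete or malformed string: keep *out untouched */
    }
    *out = tmp;
    return 1;
}
```

GetData would then call CheckData only when parse_reading returns 1, and simply wait for the next 200 ms transmission otherwise.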

Printing an array of structs in C

I'm trying to print an array of structs that contain two strings. However my print function does not print more than two indices of the array. I am not sure why because it seems to me that the logic is correct.
This is the main function
const int MAX_LENGTH = 1024;
typedef struct song
{
char songName[MAX_LENGTH];
char artist[MAX_LENGTH];
} Song;
void getStringFromUserInput(char s[], int maxStrLength);
void printMusicLibrary(Song library[], int librarySize);
void printMusicLibraryTitle(void);
void printMusicLibrary (Song library[], int librarySize);
void printMusicLibraryEmpty(void);
int main(void) {
// Announce the start of the program
printf("%s", "Personal Music Library.\n\n");
printf("%s", "Commands are I (insert), S (sort by artist),\n"
"P (print), Q (quit).\n");
char response;
char input[MAX_LENGTH + 1];
int index = 0;
do {
printf("\nCommand?: ");
getStringFromUserInput(input, MAX_LENGTH);
// Response is the first character entered by user.
// Convert to uppercase to simplify later comparisons.
response = toupper(input[0]);
const int MAX_LIBRARY_SIZE = 100;
Song Library[MAX_LIBRARY_SIZE];
if (response == 'I') {
printf("Song name: ");
getStringFromUserInput(Library[index].songName, MAX_LENGTH);
printf("Artist: ");
getStringFromUserInput(Library[index].artist, MAX_LENGTH);
index++;
}
else if (response == 'P') {
// Print the music library.
int firstIndex = 0;
if (Library[firstIndex].songName[firstIndex] == '\0') {
printMusicLibraryEmpty();
} else {
printMusicLibraryTitle();
printMusicLibrary(Library, MAX_LIBRARY_SIZE);
}
This is my printing the library function
// This function will print the music library
void printMusicLibrary (Song library[], int librarySize) {
printf("\n");
bool empty = true;
for (int i = 0; (i < librarySize) && (!empty); i ++) {
empty = false;
if (library[i].songName[i] != '\0') {
printf("%s\n", library[i].songName);
printf("%s\n", library[i].artist);
printf("\n");
} else {
empty = true;
}
}
}
I think the problem is caused by setting empty = true before the for loop and then checking (!empty) in the loop condition, which evaluates to false on the very first test, so the loop body should never run. What surprises me is that it prints even two entries. You should initialize empty = false, since you already check the first index before the function call.
The logic has two ways to terminate the listing: 1) if the number of entries is reached, or 2) if any entry is empty.
I expect the second condition is stopping the listing before you expect. Probably the array wasn't built as expected (I didn't look at that part), or something is overwriting an early or middle entry.
you gave the definition as:
typedef struct song
{
char songName[MAX_LENGTH];
char artist[MAX_LENGTH];
}Song;
Later, you write if (library[i].songName[i] != '\0'), which seems strange: why would you index the song name string with the same index as the library array?
So I would naturally expect your print function to be:
// This function will print the music library
void printMusicLibrary (Song library[], int librarySize) {
for (int i = 0; i < librarySize; i ++) {
printf("%s\n%s\n\n", library[i].songName,
library[i].artist);
}
}
Note that you can skip empty song names by testing library[i].songName[0] != '\0' (pay attention to the 0), but I think it would be better not to add them to the list in the first place (does an empty song name make sense?).
(If you decide to fix that, note that there is another suspicious place with the same pattern: if (Library[firstIndex].songName[firstIndex] == '\0').)
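Putting those two remarks together, one possible shape for the print function is sketched below. It assumes the caller passes the number of songs actually inserted (the index counter in the question's main) rather than MAX_LIBRARY_SIZE, and it tests songName[0] to skip empty names. Returning the number of printed entries is an addition for this sketch, not something the original function did, and MAX_LENGTH is written as a macro here so the struct compiles as plain C.

```c
#include <stdio.h>

#define MAX_LENGTH 1024

typedef struct song
{
    char songName[MAX_LENGTH];
    char artist[MAX_LENGTH];
} Song;

/* Prints the first `count` entries of the library, skipping entries
 * whose song name is empty. Returns how many entries were printed. */
int printMusicLibrary(const Song library[], int count)
{
    int printed = 0;
    for (int i = 0; i < count; i++) {
        /* index 0 of the name, not index i */
        if (library[i].songName[0] != '\0') {
            printf("%s\n%s\n\n", library[i].songName, library[i].artist);
            printed++;
        }
    }
    return printed;
}
```

With this shape the "library is empty" case is simply a return value of 0, so the separate first-entry check in main is no longer needed.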

How to store chars read from serial port into buffer for processing?

Right now I have some code that is printing out data read from a serial port with putch(out). But I need to store it into an array and process it to get floating point values out.
Here is my code just for talking to the serial port, with the calculations omitted:
#include <bios.h>
#include <conio.h>
#define COM1 0
#define DATA_READY 0x100
// set the baud rate, parity bit, data width...
#define SETTINGS ( 0xE0 | 0x03 | 0x00 | 0x00)
int main(void)
{
int in, out, status,i;
char dataRead[21];
float roll, pitch;
bioscom(0, SETTINGS, COM1); /*initialize the port*/
clrscr();
cprintf("Data sent to you: ");
while (1)
{
status = bioscom(3, 0, COM1); // reading the data here
if (status & DATA_READY)
if ((out = bioscom(2, 0, COM1) & 0x7F) != 0)
putch(out); // printing read value
if (kbhit( )) // If Esc is hit. it breaks and exit.
{
if ((in = getch( )) == '\x1B') // 1B = Esc
DONE = TRUE;
bioscom(1, in, COM1); // data write. I am not making use of it.
}
}
return 0;
}
The incoming data encodes the roll and pitch, as something like "R:+XXX.XX P:-YYY.YY\r\n".
Instead of just printing this data out, I want to store it in the dataRead[] array and interpret it into float values. For instance, dataRead[2] to dataRead[8] encodes a float value for the "roll" as characters +XXX.XX.
How do I store these characters in the array and get the floating point number from it?
If you could write down some code for serial port which does exactly what i want then it would be really helpful. Please make sure it is written in 'C'.
I am not familiar with Dev C++, but in both C and C++ you have the strtod function, which converts a string to a double. You need to:
collect all characters from the serial port into one text line
parse the line
The first part is easy: just keep collecting until you reach "\n". Something like:
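char line[MAX_LINE];
if (out == '\n') {
line[n_readed] = '\0'; // terminate the collected line
parse_line(line, ...);
n_readed = 0; // start collecting the next line
} else
line[n_readed++] = out;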
The second part is more tricky. You can use a text processing library for parsing, or write your own parse function using strtod. Since your case is simple, I would rather do the second. Here is an example of a parse function which reads your text lines:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
const char* line = "R:+123.56P:-767.77\r\n";
// return true on success
bool parse_line(const char* line, double* R, double* P)
{
const char* p = line;
if(*p++ != 'R')
return false;
if(*p++ != ':')
return false;
if(*p == '+') // + sign is optional
p++ ;
char* pEnd;
*R = strtod (p, &pEnd);
if(pEnd == p)
return false;
p = pEnd;
if(*p++ != 'P')
return false;
if(*p++ != ':')
return false;
*P = strtod (p, &pEnd);
if(pEnd == p)
return false;
return true;
}
int main()
{
double R,P;
if(!parse_line(line, &R, &P))
printf("error\n");
else
printf("successfully read: %f\t%f\n", R,P);
return 0;
}
You need to declare two float arrays and fill them with the parsed values whenever the parsing function returns true. Otherwise there has been an error, probably because of damaged data. You can also replace all the doubles with floats.
I hope this helps.
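For the "two float arrays" step, here is a short sketch that uses sscanf instead of the strtod-based parser above (a deliberately swapped-in, more compact technique, not the answer's own code). The names store_sample, rolls, pitches and MAX_SAMPLES are made up for the example. The space in the format string matches any amount of whitespace, including none, so it accepts both "R:+123.56 P:-767.77" as in the question and the form without a space.

```c
#include <stdio.h>

#define MAX_SAMPLES 64   /* arbitrary capacity chosen for this sketch */

static float rolls[MAX_SAMPLES];
static float pitches[MAX_SAMPLES];
static int   n_samples = 0;

/* Parses one received line and appends the values to the arrays.
 * Returns 1 on success, 0 if the line is damaged or the arrays are full. */
int store_sample(const char *line)
{
    float r, p;

    if (n_samples >= MAX_SAMPLES) {
        return 0;
    }
    /* sscanf returns the number of converted fields; both must be present */
    if (sscanf(line, "R:%f P:%f", &r, &p) != 2) {
        return 0;
    }
    rolls[n_samples]   = r;
    pitches[n_samples] = p;
    n_samples++;
    return 1;
}
```

A damaged line leaves the arrays and the sample count untouched, so the caller can just drop it and wait for the next one.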

The execution of the code always goes into the else statement

Some very strange things happen in my source code.
The following function works well: it prints 'y' when the password is correct and 'n' when it is incorrect. But if I add some UART1_Write and Delay calls to the else statement, the bug appears, and even if the password is "zxc" (correct) it ALWAYS enters the else statement.
I'm using MikroC PRO for PIC v6.0.0, the robot system is made of PIC18F452 and RN-42 bluetooth module connected to it. I am testing with a laptop with a bluetooth and TeraTerm.
For more info: http://instagram.com/p/pLnU9eDL8z/#
Here it is the well working routine:
void authenticate() {
char *input = "";
char *password = "zxc\0";
unsigned char ready = 0;
while (connected && !ready) {
if (UART1_Data_Ready()) {
UART1_Read_Text(input, "|", 17);
strcat(input, "\0");
if (strcmp(input, password) == 0) {
UART1_Write('y');
ready = 1;
} else {
UART1_Write('n');
ready = 1;
}
}
}
}
This version of the routine ALWAYS goes in the ELSE statement of the strcmp(input, password) == 0 part:
void authenticate() {
char *input = "";
char *password = "zxc\0";
unsigned char ready = 0;
while (connected && !ready) {
if (UART1_Data_Ready()) {
UART1_Read_Text(input, "|", 17);
strcat(input, "\0");
if (strcmp(input, password) == 0) {
UART1_Write('y');
ready = 1;
} else {
UART1_Write('n');
Delay_ms(100);
UART1_Write('$');
Delay_ms(100);
UART1_Write('$');
Delay_ms(100);
UART1_Write('$');
Delay_ms(100);
UART1_Write('K');
Delay_ms(100);
UART1_Write(',');
Delay_ms(100);
UART1_Write('-');
Delay_ms(100);
UART1_Write('-');
Delay_ms(100);
UART1_Write('-');
Delay_ms(100);
UART1_Write('\n');
ready = 1;
}
}
}
}
It is important to send all these addition symbols in order to get RN-42 into command mode and disconnect the user if the password is wrong.
Please help me solve the problem. Any ideas appreciated!
As others have pointed out in the comments section a major issue with your code is that you are trying to store the UART data to memory that does not belong to you.
When you declare char *input = "";, you haven't actually allocated any space except for a single byte that stores '\0'. Then, when you use UART1_Read_Text(), you tell that function you may have up to 17 characters that will be read before finding the delimiter - all of which should be stored at the location pointed to by input.
The description of that library function can be found here. Also, based on the library description it looks like UART1_Read_Text() already adds the null-termination to the UART data. I base this assumption off the description of UARTx_Write_Text and the example that they provide on their website. However, I would recommend that you verify that is indeed the case.
Also, your initialization of password is redundant and char *password = "zxc\0" should be changed to char *password = "zxc". When you declare a string literal using double quotation marks it is automatically null-terminated. This excerpt is from "C in a Nutshell":
A string literal consists of a sequence of characters (and/or escape sequences) enclosed in double quotation marks... A string literal is a static array of char that contains character codes followed by a string terminator, the null character \0... The empty string "" occupies exactly one byte in memory, which holds the terminating null character.
Based on the above, I would go about it a little more like this:
#define MAX_NUM_UART_RX_CHARACTERS 17
void authenticate()
{
char input[MAX_NUM_UART_RX_CHARACTERS + 1];
char *password = "zxc";
unsigned char ready = 0;
while (connected && !ready)
{
if (UART1_Data_Ready())
{
UART1_Read_Text(input, "|", MAX_NUM_UART_RX_CHARACTERS);
if (strcmp(input, password) == 0)
{
UART1_Write('y');
ready = 1;
}
else
{
UART1_Write('n');
ready = 1;
}
}
}
}
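To see the buffer-size point in isolation, the sketch below can be compiled on a PC. read_until is a hypothetical stand-in for the UART read (it copies from a string instead of a serial port and takes a single delimiter character), so it is not the MikroC API. The point it illustrates is that the destination is a real array with room for 17 characters plus the terminator, and that the copy never writes past it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NUM_UART_RX_CHARACTERS 17

/* Hypothetical stand-in for a delimited UART read: copies characters
 * from src into dest until delim, end of input, or cap - 1 characters.
 * Always null-terminates dest. Returns the number of characters copied. */
size_t read_until(const char *src, char delim, char *dest, size_t cap)
{
    size_t n = 0;

    if (cap == 0) {
        return 0;
    }
    while (src[n] != '\0' && src[n] != delim && n + 1 < cap) {
        dest[n] = src[n];
        n++;
    }
    dest[n] = '\0';
    return n;
}

/* Returns 1 if the text received before the '|' delimiter equals password. */
int password_ok(const char *received, const char *password)
{
    char input[MAX_NUM_UART_RX_CHARACTERS + 1];   /* storage we actually own */

    read_until(received, '|', input, sizeof input);
    return strcmp(input, password) == 0;
}
```

With char *input = "" there is no such storage, so whatever the read writes lands on memory used by something else, which would explain why unrelated changes to the else branch alter the behaviour.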

LZW Decompression in C

I have an LZW compressor/decompressor written in C.
The initial table consists of ASCII characters and then each new string to be saved into the table consists of a prefix and a character both saved in a list as int.
My compression works but my decompression leaves some characters out.
The input:
<title>Agile</title><body><h1>Agile</h1></body></html>
The output I get (notice the missing 'e' and '<'):
<title>Agile</title><body><h1>Agil</h1></body>/html>
This is the code I use (the relevant part):
void expand(int * input, int inputSize) {
// int prevcode, currcode
int previousCode; int currentCode;
int nextCode = 256; // start with the same dictionary of 255 characters
dictionaryInit();
// prevcode = read in a code
previousCode = input[0];
int pointer = 1;
// while (there is still data to read)
while (pointer < inputSize) {
// currcode = read in a code
currentCode = input[pointer++];
if (currentCode >= nextCode) printf("!"); // XXX not yet implemented!
currentCode = decode(currentCode);
// add a new code to the string table
dictionaryAdd(previousCode, currentCode, nextCode++);
// prevcode = currcode
previousCode = currentCode;
}
}
int decode(int code) {
int character; int temp;
if (code > 255) { // decode
character = dictionaryCharacter(code);
temp = decode(dictionaryPrefix(code)); // recursion
} else {
character = code; // ASCII
temp = code;
}
appendCharacter(character); // save to output
return temp;
}
Can you spot it? I'd be grateful.
Your decode function returns the first character in the string. You need this character in order to add it to the dictionary, but you should not set previousCode to it. So your code should look like:
...
firstChar = decode(currentCode);
dictionaryAdd(previousCode, firstChar, nextCode++);
previousCode = currentCode;
...
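For reference, a compact self-contained decoder along these lines might look like the sketch below. It is not the asker's code: lzw_expand, emit and the two table arrays are names chosen for this example and stand in for dictionaryAdd, dictionaryPrefix and dictionaryCharacter. It keeps the first character and the previous code in separate variables, as described above, and also handles the case the question marks as not yet implemented, where the incoming code equals the next free code (the entry is then the previous string plus its own first character). The table size of 4096 is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define LZW_FIRST_CODE 256
#define LZW_MAX_CODES  4096   /* assumed table size for this sketch */

static int           prefix_of[LZW_MAX_CODES];
static unsigned char last_char_of[LZW_MAX_CODES];

/* Appends the string for `code` to out and stores its first character
 * in *first. Returns the new output length. */
static size_t emit(int code, char *out, size_t len, size_t cap, int *first)
{
    unsigned char stack[LZW_MAX_CODES];
    size_t n = 0;

    while (code >= LZW_FIRST_CODE) {       /* walk the prefix chain backwards */
        stack[n++] = last_char_of[code];
        code = prefix_of[code];
    }
    stack[n++] = (unsigned char)code;      /* chain ends in a single byte */
    *first = code;
    while (n > 0 && len + 1 < cap) {       /* unstack in forward order */
        out[len++] = (char)stack[--n];
    }
    return len;
}

/* Decodes `count` codes into out (null-terminated).
 * Returns the output length, or -1 on invalid input. */
int lzw_expand(const int *input, size_t count, char *out, size_t cap)
{
    int next_code = LZW_FIRST_CODE;
    int prev, cur, first;
    size_t len = 0, i;

    if (count == 0 || cap == 0) return -1;
    prev = input[0];
    if (prev < 0 || prev >= LZW_FIRST_CODE) return -1;
    len = emit(prev, out, len, cap, &first);

    for (i = 1; i < count; i++) {
        cur = input[i];
        if (cur < 0 || cur > next_code || next_code >= LZW_MAX_CODES) return -1;
        if (cur == next_code) {
            /* code not in the table yet: previous string + its first char */
            prefix_of[next_code] = prev;
            last_char_of[next_code] = (unsigned char)first;
            next_code++;
            len = emit(cur, out, len, cap, &first);
        } else {
            len = emit(cur, out, len, cap, &first);
            /* new entry: previous string + first char of current string */
            prefix_of[next_code] = prev;
            last_char_of[next_code] = (unsigned char)first;
            next_code++;
        }
        prev = cur;   /* the code itself, not its first character */
    }
    out[len] = '\0';
    return (int)len;
}
```

The last assignment in the loop is the line that corresponds to the fix: previousCode must stay a code, while the first character is only used when adding the new table entry.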

Resources