I have to parse a game file that has this format:
ItemID = 3288 # This is a comment and begins with the '#' character.
Name = "a magic sword"
Description = "It has some ancient runic inscriptions."
Flags = {MultiUse,Take,Weapon}
Attributes = {Weight=4200,WeaponType=1,WeaponAttackValue=48,WeaponDefendValue=35}
# A line can also begin with this character and it should be ignored.
and I have to parse its data and store it in variables. I have tried many things, and I've been told that I have to read the file line by line, then read each line character by character (so I can read up to the '#' character), and then read the result word by word following the pattern. I have done this:
void ParseScriptFile(FILE* File) {
char Line[1024];
while (fgets(Line, sizeof(Line), File)) {
}
fclose(File);
}
I think I should read the lines inside the while loop, but I don't know how to read until the # character is reached and, if it does not exist, just continue looping line by line. Is there an easy way to do this?
Use sscanf. I have done two of the fields for you:
void ParseScriptFile(FILE* File) {
char Line[1024];
int ItemID; // variable to store ItemId
char name[40]; // string to store Name
while (fgets(Line, sizeof(Line), File)) {
sscanf(Line, "ItemID = %d", &ItemID);
sscanf(Line, "Name = %39[^\n]", name); // [^\n]: read up to the newline, at most 39 chars
}
printf("ItemId= %d\n", ItemID);
printf("Name= %s", name);
fclose(File);
}
Here's more or less workable code. Reading lines with fgets() is correct. You can then eliminate empty lines and comment lines trivially. If the line ends with a comment, you can overwrite the # with a null byte to ignore the comment. Then you need to scan for the entry's name field (assume there are no spaces in the name part, to the left of the equals sign), the =, and the value on the right.
#include <stdio.h>
#include <string.h>
static
void ParseScriptFile(FILE *File)
{
char Line[1024];
while (fgets(Line, sizeof(Line), File))
{
if (Line[0] == '#' || Line[0] == '\n')
continue;
char *comment_start = strchr(Line, '#');
if (comment_start != NULL)
*comment_start = '\0';
char name[64];
char value[1024];
if (sscanf(Line, " %63s = %1023[^\n]", name, value) == 2)
printf("Name = [%s] Value = [%s]\n", name, value);
else
printf("Mal-formed line: [%s]\n", Line);
}
fclose(File);
}
int main(void)
{
ParseScriptFile(stdin);
return 0;
}
The program reads from standard input. An example run from your data file yielded:
Name = [ItemID] Value = [3288 ]
Name = [Name] Value = ["a magic sword"]
Name = [Description] Value = ["It has some ancient runic inscriptions."]
Name = [Flags] Value = [{MultiUse,Take,Weapon}]
Name = [Attributes] Value = [{Weight=4200,WeaponType=1,WeaponAttackValue=48,WeaponDefendValue=35}]
Note the space at the end of the ItemID value; there was a space before the # symbol.
If you need to handle strings that could themselves contain # symbols, you have to work harder (Curse = "You ###$%!" # Language, please!). Parsing an entry such as the value for Attributes is a separate task for a separate function (callable from this one). Indeed, you should be calling one or more functions to process each name/value pair. You probably also need some context passed to the ParseScriptFile() function so that the data can be saved appropriately. You wouldn't want to contaminate clean code with unnecessary global variables, would you?
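As a rough illustration of the "work harder" case above, here is a minimal sketch of a comment stripper that ignores # characters inside double-quoted strings. The helper name strip_comment is hypothetical, and the sketch assumes quotes are never escaped inside a value; it also trims the trailing whitespace noted earlier.

```c
#include <string.h>

/* Truncate `line` at the first '#' that is not inside a double-quoted
 * string, then trim trailing whitespace.
 * Assumption: quotes are never escaped (no \" inside a value). */
static void strip_comment(char *line)
{
    int in_quotes = 0;
    char *p;

    for (p = line; *p != '\0'; p++) {
        if (*p == '"')
            in_quotes = !in_quotes;      /* entering or leaving a string */
        else if (*p == '#' && !in_quotes)
            break;                       /* start of a real comment */
    }
    *p = '\0';                           /* no-op if no comment was found */

    /* Trim trailing spaces, tabs, and line endings. */
    while (p > line && strchr(" \t\r\n", p[-1]) != NULL)
        *--p = '\0';
}
```

Calling strip_comment(Line) in place of the strchr() step would leave the rest of the loop unchanged; a line that becomes empty can then be skipped.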
I have a dynamically updated text file with names of people. I want to parse the file to extract "Caleb" and the string that follows his name. However, his name may not always be in the list and I want to account for that.
I could do it in Java, but I'm not even sure what to do in C. I could start by reading in the text file line by line, but then how would I check if "Caleb" is a substring of the string I just read in and handle the case when he isn't? I want to do this without using external libraries. What would be the best method?
Barnabas: Followed by a string
Bart: Followed by a string
Becky: Followed by a string
Bellatrix: Followed by a string
Belle: Followed by a string
Caleb: I want this string
Benjamin: Followed by a string
Beowul: Followed by a string
Brady: Followed by a string
Brick: Followed by a string
returns: "Caleb: I want this string" or "Name not found"
but then how would I check if "Caleb" is a substring of the string
The heart of the question as I read it. strstr does the job.
char *matchloc;
if ((matchloc = strstr(line, "Caleb:"))) {
// You have a match. Code here.
}
However, in this particular case you really want lines that start with Caleb, so strncmp does better:
if (!strncmp(line, "Caleb:", 6)) {
// You have a match. Code here.
}
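To make the prefix test concrete, here is a small sketch of a helper that returns the text following the name, or NULL when the line belongs to someone else. The name value_after_name is hypothetical, and the sketch assumes each line has the form "Name: text".

```c
#include <string.h>

/* Return a pointer to the text after "name:" in `line` (leading blanks
 * skipped), or NULL if the line does not start with exactly that name.
 * Hypothetical helper; assumes lines look like "Name: text". */
static const char *value_after_name(const char *line, const char *name)
{
    size_t n = strlen(name);

    /* The name must match in full and be followed directly by ':'. */
    if (strncmp(line, name, n) != 0 || line[n] != ':')
        return NULL;

    line += n + 1;                        /* step past "name:" */
    while (*line == ' ' || *line == '\t')
        line++;                           /* skip blanks after the colon */
    return line;
}
```

Checking line[n] separately means a line such as "Calebs: ..." is not mistaken for "Caleb". The caller prints "Name not found" if no line in the file yields a non-NULL result.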
So if you want to check whether the user Caleb exists, you can simply call strstr on each line, and if it matches, call strtok to get only the string.
I don't know how you are opening the file, but you can use getline to read it line by line.
You can do something like this:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
FILE *file;
char *fich="FILE.TXT";
char *line = NULL;
char *StringFile[100];
size_t len = 0;
ssize_t stringLength;
const char s[2] = ":"; //Divide string for this
char *token;
int check =0;
char *matchloc;
file=fopen(fich, "r");
if(file==NULL){
fprintf(stderr, "[ERROR]: cannot open file <%s> ", fich);
perror("");
exit(1);
}
while((stringLength = getline(&line, &len, file)) != -1){
if(line[strlen(line)-1] == '\n'){
line[strlen(line)-1] = '\0'; //Removing \n if exists
}
if((matchloc = strstr(line, "Caleb:"))){
check = 1;
token = strtok(matchloc, s); // first token: the name before ':'
token = strtok(NULL, ""); // rest of the line; NULL if nothing follows
if(token != NULL){
printf("%s\n", token);
}
break;
}
}
if(check==0){
printf("Name not found\n");
}
free(line); // getline allocated this buffer
fclose(file);
return 0;
}
The code may contain some errors, but that is the idea: when it finds the name, it splits the line at the colon.
I am trying to read each line of a file and store binary values into appropriate variables.
I can see that there are many many other examples of people doing similar things and I have spent two days testing out different approaches that I found but still having difficulties getting my version to work as needed.
I have a txt file with the following format:
in = 00000000000, out = 0000000000000000
in = 00000000001, out = 0000000000001111
in = 00000000010, out = 0000000000110011
......
I'm attempting to use fscanf to consume the unwanted characters "in = ", "," and "out = "
and keep only the characters that represent binary values.
My goal is to store the first column of binary values, the "in" values into one variable
and the second column of binary values, the "out" value into another buffer variable.
I have managed to get fscanf to consume the "in" and "out" characters but I have not been
able to figure out how to get it to consume the "," "=" characters. Additionally, I thought that fscanf should consume the white space but it doesn't appear to be doing that either.
I can't seem to find any comprehensive list of available directives for scanners, other than the generic "%d, %s, %c....." and it seems that I need a more complex combination of directives to filter out the characters that I'm trying to ignore than I know how to format.
I could use some help with figuring this out. I would appreciate any guidance you could
provide to help me understand how to properly filter out "in = " and ", out = " and how to store
the two columns of binary characters into two separate variables.
Here is the code I am working with at the moment. I have tried other iterations of this code using fgetc() in combination with fscanf() without success.
int main()
{
FILE * f = fopen("hamming_demo.txt","r");
char buffer[100];
rewind(f);
while((fscanf(f, "%s", buffer)) != EOF) {
fscanf(f,"%[^a-z]""[^,]", buffer);
printf("%s\n", buffer);
}
printf("\n");
return 0;
}
The outputs from my code appear as follows:
= 00000000000,
= 0000000000000000
= 00000000001,
= 0000000000001111
= 00000000010,
= 0000000000110011
Thank you for your time.
The scanf family of functions is said to be a poor man's parser because it is not very tolerant of input errors. But if you are sure of the format of the input data, it allows for simple code. The only magic here is that a space in the format string matches any run of whitespace characters, including newlines, or none at all. Your code could become:
int main()
{
FILE * f = fopen("hamming_demo.txt", "r");
if (NULL == f) { // always test open
perror("Unable to open input file");
return 1;
}
char in[50], out[50]; // directly get in and out
// BEWARE: fscanf returns the number of converted items (or EOF), so compare with 2
while (fscanf(f, " in = %49[01], out = %49[01]", in, out) == 2) {
printf("%s - %s\n", in, out);
}
printf("\n");
return 0;
}
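If the two columns should end up as numbers and not strings, the same format can be combined with strtoul in base 2. The following is only a sketch: parse_pair is a hypothetical helper, and it assumes each column is at most 32 bits wide.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse one "in = <bits>, out = <bits>" line into two integers.
 * Returns 1 on success, 0 if the line does not match the format.
 * Assumption: each column has at most 32 binary digits. */
static int parse_pair(const char *line, unsigned long *in, unsigned long *out)
{
    char in_bits[33], out_bits[33];

    /* The width 32 leaves room for the terminating null byte. */
    if (sscanf(line, " in = %32[01], out = %32[01]", in_bits, out_bits) != 2)
        return 0;

    *in = strtoul(in_bits, NULL, 2);    /* base 2 conversion */
    *out = strtoul(out_bits, NULL, 2);
    return 1;
}
```

Feeding it lines read with fgets() keeps one bad line from derailing the rest of the file, which is harder to guarantee when fscanf() reads the stream directly.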
So basically you want to filter '0' and '1'? In this case fgets and a simple loop will be enough: just count the number of 0's and 1's and null-terminate the string at the end:
#include <stdio.h>
int main(void)
{
char str[50];
char *ptr;
// Replace stdin with your file
while ((ptr = fgets(str, sizeof str, stdin)))
{
int count = 0;
while (*ptr != '\0')
{
if ((*ptr >= '0') && (*ptr <= '1'))
{
str[count++] = *ptr;
}
ptr++;
}
str[count] = '\0';
puts(str);
}
}
I am trying to read a textfile like this
1234567890 1234
9876543210 22
into a List struct in my program. I read in the file via fgets() and then use strtok to separate the numbers, put them into variables, and then finally into the List. However, I find that in doing this and printing the resulting strings, strtok always takes the final string in the final line to be NULL, thus resulting in a segmentation fault.
fgets(fileOutput,400,filePointer); //Read in a line from the file
inputPlate = strtok(fileOutput," "); // Take the first token, store into inputPlate
while(fileOutput != NULL)
{
string = strtok(NULL," ");
mileage = atoi(string); //Convert from string to integer and store into mileage
car = initializeCar(mileage,dateNULL,inputPlate);
avail->next = addList(avail->next,car,0);
fgets(fileOutput,400,filePointer);
inputPlate = strtok(fileOutput," ");
}
How do I resolve this?
Reading a text file line by line with fgets() is good.
Not checking the return value of fgets() is weak. This caused OP's code to process beyond the last line.
// Weak code
// fgets(fileOutput,400,filePointer); //Read in a line from the file
// ...
// while(fileOutput != NULL)
// {
Better to check the result of fgets() to determine when input is complete:
#define LINE_SIZE 400
...
while (fgets(fileOutput, LINE_SIZE, filePointer) != NULL)
{
Then process the string. A simple way to assess parsing success is to append " %n" to a sscanf() format to record the offset of the scan.
char inputPlate[LINE_SIZE];
int mileage;
int n = -1;
sscanf(fileOutput, "%s%d %n", inputPlate, &mileage, &n);
// Was `n` not changed? Did scanning stop before the string end?
if (n < 0 || fileOutput[n] != '\0') {
Handle_Bad_input();
break;
} else {
car = initializeCar(mileage, dateNULL, inputPlate);
avail->next = addList(avail->next,car,0);
}
}
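The " %n" check can also be packaged as a stand-alone function. This is a sketch under the assumption that a plate never exceeds 31 characters; parse_car_line is a hypothetical name.

```c
#include <stdio.h>

/* Parse "<plate> <mileage>" and require that nothing else follows.
 * `plate` must have room for at least 32 bytes (assumed limit).
 * Returns 1 on success, 0 on malformed input. */
static int parse_car_line(const char *line, char *plate, int *mileage)
{
    int n = -1;

    /* " %n" skips trailing whitespace, then stores the offset reached.
     * %n is only performed if everything before it matched, so n stays
     * -1 when the plate or the mileage is missing. */
    sscanf(line, "%31s%d %n", plate, mileage, &n);

    /* Success only if the scan got all the way to the end of the line. */
    return n >= 0 && line[n] == '\0';
}
```

A line such as "9876543210 22 extra" is rejected because the scan stops at "extra" and line[n] is not the terminating null byte.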
You could write a simpler parser with fscanf():
FILE *filePointer;
... // code not shown for opening the file, initializing the list...
char inputPlate[32];
int mileage;
while (fscanf(filePointer, "%31s%d", inputPlate, &mileage) == 2) {
car = initializeCar(mileage, dateNULL, inputPlate);
avail->next = addList(avail->next, car, 0);
}
I have some C-code that reads in a text file line by line, hashes the strings in each line, and keeps a running count of the string with the biggest hash values.
It seems to be doing the right thing but when I issue the print statement:
printf("Found Bigger Hash:%s\tSize:%d\n", textFile.biggestHash, textFile.maxASCIIHash);
my print returns this in the output:
Preprocessing: dict1
Found BiSize:110h:a
Found BiSize:857h:aardvark
Found BiSize:861h:aardwolf
Found BiSize:937h:abandoned
Found BiSize:951h:abandoner
Found BiSize:1172:abandonment
Found BiSize:1283:abbreviation
Found BiSize:1364:abiogenetical
Found BiSize:1593:abiogenetically
Found BiSize:1716:absentmindedness
Found BiSize:1726:acanthopterygian
Found BiSize:1826:accommodativeness
Found BiSize:1932:adenocarcinomatous
Found BiSize:2162:adrenocorticotrophic
Found BiSize:2173:chemoautotrophically
Found BiSize:2224:counterrevolutionary
Found BiSize:2228:counterrevolutionist
Found BiSize:2258:dendrochronologically
Found BiSize:2440:electroencephalographic
Found BiSize:4893:pneumonoultramicroscopicsilicovolcanoconiosis
Biggest Size:46umonoultTotal Words:71885covolcanoconiosis
So it seems I'm misusing printf(). Below is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define WORD_LENGTH 100 // Max number of characters per word
// data1 struct carries information about the dictionary file; preprocess() initializes it
struct data1
{
int numRows;
int maxWordSize;
char* biggestWord;
int maxASCIIHash;
char* biggestHash;
};
int asciiHash(char* wordToHash);
struct data1 preprocess(char* fileName);
int main(int argc, char* argv[]){
//Diagnostics Purposes; Not used for algorithm
printf("Preprocessing: %s\n",argv[1]);
struct data1 file = preprocess(argv[1]);
printf("Biggest Word:%s\t Size:%d\tTotal Words:%d\n", file.biggestWord, file.maxWordSize, file.numRows);
//printf("Biggest hashed word (by ASCII sum):%s\tSize: %d\n", file.biggestHash, file.maxASCIIHash);
//printf("**%s**", file.biggestHash);
return 0;
}
int asciiHash(char* word)
{
int runningSum = 0;
int i;
for(i=0; i<strlen(word); i++)
{
runningSum += *(word+i);
}
return runningSum;
}
struct data1 preprocess(char* fName)
{
static struct data1 textFile = {.numRows = 0, .maxWordSize = 0, .maxASCIIHash = 0};
textFile.biggestWord = (char*) malloc(WORD_LENGTH*sizeof(char));
textFile.biggestHash = (char*) malloc(WORD_LENGTH*sizeof(char));
char* str = (char*) malloc(WORD_LENGTH*sizeof(char));
FILE* fp = fopen(fName, "r");
while( strtok(fgets(str, WORD_LENGTH, fp), "\n") != NULL)
{
// If found a larger hash
int hashed = asciiHash(str);
if(hashed > textFile.maxASCIIHash)
{
textFile.maxASCIIHash = hashed; // Update max hash size found
strcpy(textFile.biggestHash, str); // Update biggest hash string
printf("Found Bigger Hash:%s\tSize:%d\n", textFile.biggestHash, textFile.maxASCIIHash);
}
// If found a larger word
if( strlen(str) > textFile.maxWordSize)
{
textFile.maxWordSize = strlen(str); // Update biggest word size
strcpy(textFile.biggestWord, str); // Update biggest word
}
textFile.numRows++;
}
fclose(fp);
free(str);
return textFile;
}
You forgot to remove the \r after reading. It is in your input because (1) your source file comes from a Windows machine (or at least one which uses \r\n line endings), and (2) you open it with fopen mode "r" on a system that does not translate line endings (Unix-like systems leave \r\n untouched, whereas Windows text mode would translate it).
This results in the weird output as follows:
Found Bigger Hash:text\r\tSize:123
See the position of the \r? When this string is printed, you first get
Found Bigger Hash:text
and then the \r moves the cursor back to the start of the line. Next, a tab is output, not by printing spaces but merely by moving the cursor to the 8th position:
1234567↓
Found Bigger Hash:text
and the rest of the string is printed over the one already shown:
Found BiSize:123h:text
Possible solutions:
Open your file in "rt" ("text") mode, if your C library supports that non-standard flag, and/or
Check for, and remove, the \r code as well as \n.
I'd go for both. strchr is pretty cheap and will make your code a bit more foolproof.
(Also, please simplify your fgets line by splitting it up into several distinct operations.)
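One way to remove both line-ending characters in a single step is strcspn(), which returns the length of the initial part of the string that contains neither. A minimal sketch follows; chomp is a hypothetical helper name.

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', so that "word\r\n",
 * "word\n", and "word" all become "word". */
static void chomp(char *s)
{
    /* strcspn() returns the index of the first '\r' or '\n', or the
     * string length if neither occurs, so the write is always in bounds. */
    s[strcspn(s, "\r\n")] = '\0';
}
```

In the posted loop this would be called on str right after a successful fgets(), before hashing, which also removes the need for strtok() in the loop condition.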
Your statement
while( strtok(fgets(str, WORD_LENGTH, fp), "\n") != NULL)
takes no account of the return value from fgets() or the way strtok() works.
The way to do this is something like
char *fptr, *sptr;
while ((fptr = fgets(str, WORD_LENGTH, fp)) != NULL) {
sptr = strtok(fptr, "\n");
while (sptr != NULL) {
printf ("%s,", sptr);
sptr = strtok (NULL, "\n");
}
printf("\n");
}
Note that after the first call to strtok(), subsequent calls on the same sequence must pass NULL as the first argument.
I have an input file I need to extract words from. The words can only contain letters and numbers so anything else will be treated as a delimiter. I tried fscanf,fgets+sscanf and strtok but nothing seems to work.
while(!feof(file))
{
fscanf(file,"%s",string);
printf("%s\n",string);
}
Above one clearly doesn't work because it doesn't use any delimiters so I replaced the line with this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
So I used fgets to read the first line and use sscanf:
sscanf(line,"%[A-z]%n",word,len);
line+=len;
This one doesn't work either, because whatever I try I can't move the pointer to the right place. I tried strtok, but I can't find how to set the delimiters:
while(p != NULL) {
printf("%s\n", p);
p = strtok(NULL, " ");
This one obviously takes the blank character as a delimiter, but I have literally hundreds of delimiters.
Am I missing something here? Extracting words from a file seemed a simple concept at first, but nothing I try really works.
Consider building a minimal lexer. While in the word state, it stays there as long as it sees letters and numbers, and switches to the delimiter state when it encounters anything else. In the delimiter state it does the exact opposite.
Here's an example of a simple state machine which might be helpful. For the sake of brevity it works only with digits. echo "2341,452(42 555" | ./main will print each number in a separate line. It's not a lexer but the idea of switching between states is quite similar.
#include <stdio.h>
#include <string.h>
int main() {
static const int WORD = 1, DELIM = 2, BUFLEN = 1024;
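int state = DELIM, ptr = 0, c; // start in DELIM so leading delimiters print nothing; c is int to hold EOF
char buffer[BUFLEN], *digits = "1234567890";
while ((c = getchar()) != EOF) {
if (strchr(digits, c)) {
if (WORD == state) {
if (ptr < BUFLEN - 1) buffer[ptr++] = c; // leave room for '\0'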
} else {
buffer[0] = c;
ptr = 1;
}
state = WORD;
} else {
if (WORD == state) {
buffer[ptr] = '\0';
printf("%s\n", buffer);
}
state = DELIM;
}
}
return 0;
}
If the number of states increases, consider replacing the if statements that check the current state with switch blocks. Performance can be improved by replacing getchar with reading a whole block of input into a temporary buffer and iterating through it.
If you have to deal with a more complex input file format, you can use lexical analyser generators such as flex. They can take care of defining state transitions and other parts of lexer generation for you.
Several points:
First of all, do not use feof(file) as your loop condition; feof won't return true until after you attempt to read past the end of the file, so your loop will execute once too often.
Second, you mentioned this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
That's not quite what's happening; if the next character in the stream doesn't match the format specifier, scanf returns without having read anything, and string is unmodified.
Here's a simple, if inelegant, method: it reads one character at a time from the input file, checks to see if it's either an alpha or a digit, and if it is, adds it to a string.
#include <stdio.h>
#include <ctype.h>
int get_next_word(FILE *file, char *word, size_t wordSize)
{
size_t i = 0;
int c;
/**
* Skip over any non-alphanumeric characters
*/
while ((c = fgetc(file)) != EOF && !isalnum(c))
; // empty loop
if (c != EOF)
word[i++] = c;
/**
* Read up to the next non-alphanumeric character and
* store it to word
*/
while (i < (wordSize - 1) && (c = fgetc(file)) != EOF && isalnum(c)) // check space first so no character is read and dropped
{
word[i++] = c;
}
word[i] = 0;
return i > 0; // true if a word was read, even one that ends at EOF
}
int main(void)
{
char word[SIZE]; // where SIZE is large enough to handle expected inputs
FILE *file;
...
while (get_next_word(file, word, sizeof word))
// do something with word
...
}
I would use:
FILE *file;
char string[200];
while(fscanf(file, "%*[^A-Za-z]"), fscanf(file, "%199[a-zA-Z]", string) > 0) {
/* do something with string... */
}
This skips over non-letters and then reads a string of up to 199 letters. The only oddness is that if you have any 'words' that are longer than 199 letters they'll be split up into multiple words, but you need the limit to avoid a buffer overflow...
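The same skip-then-read idea also works on a line already read with fgets(), which is what the sscanf() plus %n attempt in the question was aiming for. The sketch below accepts digits as well as letters; next_word is a hypothetical helper, and it assumes the C library supports ranges such as A-Z in scansets (common, but implementation-defined).

```c
#include <stdio.h>

/* Copy the next alphanumeric word from *cursor into `word` (at least
 * 64 bytes) and advance *cursor past it. Returns 1 if a word was found.
 * Assumption: scanset ranges like A-Z are supported by the C library. */
static int next_word(const char **cursor, char *word)
{
    int len = 0;

    /* Skip delimiters: "%*[...]" discards what it matches, and %n
     * records how far the scan got. If there is nothing to skip, the
     * directive fails and len stays 0. */
    sscanf(*cursor, "%*[^A-Za-z0-9]%n", &len);
    *cursor += len;

    len = 0;
    if (sscanf(*cursor, "%63[A-Za-z0-9]%n", word, &len) != 1)
        return 0;                 /* end of string: no more words */
    *cursor += len;               /* step past the word just copied */
    return 1;
}
```

Usage would be a loop such as `const char *p = line; while (next_word(&p, word)) ...`, with word declared as char word[64]. As with the fscanf() version, longer words are split into pieces.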
What are your delimiters? The second argument to strtok should be a string containing your delimiters, and the first should be a pointer to your string the first time round then NULL afterwards:
char * p = strtok(line, ","); // assuming a , delimiter
while(p)
{
printf("%s\n", p); // print only while p is non-NULL
p = strtok(NULL, ",");
}