Parsing a file while reading in C

I am trying to read each line of a file and store binary values into appropriate variables.
I can see that there are many other examples of people doing similar things, and I have spent two days testing different approaches I found, but I am still having difficulty getting my version to work as needed.
I have a txt file with the following format:
in = 00000000000, out = 0000000000000000
in = 00000000001, out = 0000000000001111
in = 00000000010, out = 0000000000110011
......
I'm attempting to use fscanf to consume the unwanted characters "in = ", "," and "out = "
and keep only the characters that represent binary values.
My goal is to store the first column of binary values, the "in" values into one variable
and the second column of binary values, the "out" value into another buffer variable.
I have managed to get fscanf to consume the "in" and "out" characters but I have not been
able to figure out how to get it to consume the "," "=" characters. Additionally, I thought that fscanf should consume the white space but it doesn't appear to be doing that either.
I can't seem to find any comprehensive list of the directives available for scanf other than the generic "%d, %s, %c.....", and it seems that filtering out the characters I'm trying to ignore needs a more complex combination of directives than I know how to write.
I could use some help with figuring this out. I would appreciate any guidance you could
provide to help me understand how to properly filter out "in = " and ", out = " and how to store
the two columns of binary characters into two separate variables.
Here is the code I am working with at the moment. I have tried other iterations of this code using fgetc() in combination with fscanf() without success.
int main()
{
FILE * f = fopen("hamming_demo.txt","r");
char buffer[100];
rewind(f);
while((fscanf(f, "%s", buffer)) != EOF) {
fscanf(f,"%[^a-z]""[^,]", buffer);
printf("%s\n", buffer);
}
printf("\n");
return 0;
}
The outputs from my code appear as follows:
= 00000000000,
= 0000000000000000
= 00000000001,
= 0000000000001111
= 00000000010,
= 0000000000110011
Thank you for your time.

The scanf family of functions is often called a poor man's parser because it is not very tolerant of input errors. But if you are sure of the format of the input data, it allows for simple code. The only magic here is that a space in the format string matches any amount of whitespace, including newlines, or none at all. Your code could become:
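#include <stdio.h>

int main(void)
{
    FILE * f = fopen("hamming_demo.txt", "r");
    if (NULL == f) { // always test open
        perror("Unable to open input file");
        return 1;
    }
    char in[50], out[50]; // directly get in and out
    // BEWARE: the scanf family returns the number of successful conversions, which can be
    // smaller than expected; it returns EOF only if input ends or fails before the first
    // conversion. So compare with the expected count (2), not with EOF.
    // The field widths (49) keep each conversion inside its 50-byte buffer.
    while (fscanf(f, " in = %49[01], out = %49[01]", in, out) == 2) {
        printf("%s - %s\n", in, out);
    }
    printf("\n");
    fclose(f);
    return 0;
}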

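So basically you want to filter '0' and '1'? In this case fgets and a simple loop will be enough: just copy the '0' and '1' characters to the front of the buffer and null-terminate the string at the end. Note that this joins the in and out digits of a line into a single string: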
#include <stdio.h>
int main(void)
{
char str[50];
char *ptr;
// Replace stdin with your file
while ((ptr = fgets(str, sizeof str, stdin)))
{
int count = 0;
while (*ptr != '\0')
{
if ((*ptr >= '0') && (*ptr <= '1'))
{
str[count++] = *ptr;
}
ptr++;
}
str[count] = '\0';
puts(str);
}
}

Related

File data split by ";" and then by "," in C

Basically I need to read the input from a file; the data in the file looks something like this:
A422TCRE234VD3KJ349D;2000,Gasoleo;Vermelho,;17,200,Platinum;17,200,45;
3KJ349DA422TCRE234VD;,diesel;Azul,Minivan;17,200,45;10,20,30;
DA422TC3KJ349RE234VD;,;Vermelho,;,,;,,;
Now what I need to do next is read this data and separate it by ";" which I'm doing using sscanf, like this:
sscanf(line," %[^;];%[^;];%[^;];%[^;];%[^;]", campos[0],campos[1], campos[2] ,campos[3],campos[4]);
"line" variable holds an entire line read from the file. All fine up until this point. Now I need to split each one of those arrays (campos[x]) by "," because I need to save each data on different structs I created. I was trying to achieve this by using an auxiliary array "output" where I combine all of the previous fields, like so:
strcpy(output,campos[1]);
strcat(output,",");
strcat(output,campos[2]);
strcat(output,",");
strcat(output,campos[3]);
strcat(output,",");
strcat(output,campos[4]);
Then I use sscanf again to try to split them:
sscanf(output, " %[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,]",helper[0],helper[1],helper[2],helper[3],helper[4],helper[5],helper[6],helper[7],helper[8], helper[9]);
But no luck, it's not working, probably because, as you can see in the input file, some of the attributes are empty. The thing is, I need them to stay empty, because afterwards I need to show the data to the user in the same format as the input file.
I've tried using strtok(), and many variations of the code I have here, nothing seems to work.
Just so you have a better idea I'll write here the output I'm currently having:
puts(output); -> 2000,Gasoleo,Vermelho,,17,200,Platinum,17,200,45
,diesel,Azul,Minivan,17,200,45,10,20,30
,,Vermelho,,,,,,,
printf("%s,%s\n", helper[0],helper[1]); -> 2000,Gasole
,
,
Also so I can give a better perspective, this file represents work orders for a car factory, so each line is as follows:
Work Order Id;Horsepower,Fuel Type;Colour,Model;Diameter,Width,Colour;Diameter,Width,Height;
Example: Horsepower and Fuel Type are engine attributes, each ";" separates between car parts.
What can I try to fix this?
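The empty fields are what make sscanf, or strtok for that matter, awkward here: a %[^,] scanset must match at least one character, so sscanf stops converting at the first empty field, and strtok treats a run of consecutive delimiters as a single delimiter, so empty fields silently disappear.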
Maybe a custom algorithm is the best way to go:
#include <stdio.h>

int main(void) {
    FILE *f = fopen("file.txt", "r");
    if (f == NULL) {
        perror("File");
        return 1;
    }
    char line[100];
    char helper[12][100];
    const size_t max_fields = sizeof helper / sizeof helper[0];
    while (fgets(line, sizeof(line), f)) {
        size_t i = 0, j = 0, it = 0;
        int c;
        while ((c = line[it]) != '\n' && c != '\0') { // read and assign cycle
            if (c != ';' && c != ',') {        // not a delimiter: append to the current field
                if (j < sizeof helper[i] - 1)  // leave room for the terminator
                    helper[i][j++] = c;
            }
            else {
                helper[i][j] = '\0';           // delimiter: terminate the current field
                j = 0;                         // reset column iterator
                if (++i == max_fields)         // go to next helper line; stop if none are left
                    break;
            }
            it++;
        }
        for (size_t a = 1; a < i; a++) {      // test print, skipping the id in helper[0]
            printf(",%s", helper[a]);
        }
        printf("\n");
    }
    fclose(f);
    return 0;
}
Output:
,2000,Gasoleo,Vermelho,,17,200,Platinum,17,200,45
,,diesel,Azul,Minivan,17,200,45,10,20,30
,,,Vermelho,,,,,,,
As requested.

C How to properly fscanf with numbers and strings?

I'm writing a game in C with SDL2 for a school project. I have a config file that lists key-value pairs such as:
groundTiles: images/Overworld/groundTiles.png
and
cellHeight: 32
How should I go about parsing this data? My attempts result in the integers being read correctly, but the strings are either missing characters or completely corrupt. I'm somewhat of a beginner in C, at least when it comes to file I/O.
I need another set of eyes on this code because I've spent too many hours on this already.
Could it have something to do with this struct in my header and how I'm using it to store temporary data?
typedef struct TileMapData_S
{
Uint32 col, row, cellWidth, cellHeight, numCells;
char *mapName;
char *emptyTileName;
Bool flag;
SDL_Color *colors;
Tile* tileTypes;
char *colorMap;
}TileMapData;
I've tried making it an unnamed struct in the function, then the source. No luck. I tried just not using a struct and fscanf'ing each piece of data into a separate variable. Same thing, no luck. If I did fscanf(file, "%s %s", buf, temp) with temp being the value of the key I'm parsing, then I get the first encounter of the string I'm looking for, then it copies itself to the other two char* that are holding the names of my sprites/files.
EDIT: This is my attempt based on the comments. It does not work; any insight would be appreciated.
while (!data->flag)
{
while (tempString != EOF)
{
tempString = strtok(buf, " \n");
if (strcmp(tempString, "width:") == 0)
{
tempString = strtok(buf, "\n\0 ");
map->numColumns = atoi(tempString);
continue;
}
.
.
.
if (strcmp(tempString, "groundTiles:") == 0)
{
data->mapName = strtok(buf, "\n\0 ");
data->mapName = tempString;
if (data->mapName != NULL)
{
data->flag = true;
}
else
{
data->flag = false;
}
continue;
}
.
.
.
tempString = fgets(buf, sizeof(buf), file);
slog(buf);
}
rewind(file);
}
I was expecting to get the string I wanted, without the whitespace/null-terminating char, but ended up with an infinite loop
END EDIT
I expect that when I parse groundTiles: images/Overworld/groundTiles.png
using fscanf(file, "%s", buf), doing strcmp on that and a known string (groundTiles:), then a second fscanf should provide the string images/Overworld/groundTiles.png

C - How to read a list of space-separated text file of numbers into a List

I am trying to read a text file like this
1234567890 1234
9876543210 22
into a List struct in my program. I read in the file via fgets() and then use strtok to separate the numbers, put them into variables and then finally into the List. However, I find that in doing this and printing the resulting strings, strtok always takes the final string in the final line to be NULL, resulting in a segmentation fault.
fgets(fileOutput,400,filePointer); //Read in a line from the file
inputPlate = strtok(fileOutput," "); // Take the first token, store into inputPlate
while(fileOutput != NULL)
{
string = strtok(NULL," ");
mileage = atoi(string); //Convert from string to integer and store into mileage
car = initializeCar(mileage,dateNULL,inputPlate);
avail->next = addList(avail->next,car,0);
fgets(fileOutput,400,filePointer);
inputPlate = strtok(fileOutput," ");
}
How do I resolve this?
Reading a text file line by line with fgets() is good.
Not checking the return value of fgets() is weak. This caused OP's code to process beyond the last line.
// Weak code
// fgets(fileOutput,400,filePointer); //Read in a line from the file
// ...
// while(fileOutput != NULL)
// {
Better to check the result of fgets() to determine when input is complete:
#define LINE_SIZE 400
...
while (fgets(fileOutput, LINE_SIZE, filePointer) != NULL)
{
Then process the string. A simple way to assess parsing success is to append " %n" to a sscanf() format to record the offset where the scan stopped.
char inputPlate[LINE_SIZE];
int mileage;
int n = -1;
sscanf(fileOutput, "%s%d %n", inputPlate, &mileage, &n);
// Was `n` not changed? Did scanning stop before the string end?
if (n < 0 || fileOutput[n] != '\0') {
Handle_Bad_input();
break;
} else {
car = initializeCar(mileage, dateNULL, inputPlate);
avail->next = addList(avail->next,car,0);
}
}
You could write a simpler parser with fscanf():
FILE *filePointer;
... // code not shown for opening the file, initializing the list...
char inputPlate[32];
int mileage;
while (fscanf(filePointer, "%31s%d", inputPlate, &mileage) == 2) {
car = initializeCar(mileage, dateNULL, inputPlate);
avail->next = addList(avail->next, car, 0);
}

While loop through text file stops unexpectedly

I am trying to loop through a text file that contains random content. Its current contents are:
"13 -35 57 - 23723724
12taste-34the+56rain-bow845"
My program should only get the numbers from the file (-35 as a negative number, but not - 23723724 due to the space in between) and no letters or other characters unrelated to the integer.
Currently my code has a while loop that runs through the file and fetches all the decimal values. For some unknown reason however, it stops after 57 (total result is: "13-3557" and then it stops).
I have attempted to iterate over every character separately, but that brought along its own set of problems, and this method at least returns whole numbers.
Here is my code:
int *getIntegers(char *filename, int *pn) {
// Create a dynamic array
int len = 100;
int *numbers = malloc(sizeof(int) * len);
// Source file
FILE *file;
file = fopen(filename, "r");
int i = 0, number = 0;
while(fscanf(file, "%d", &number) > 0) {
numbers[i++] = number;
printf("%d", number);
}
return numbers;
}
EDIT:
I have changed my code and it now retrieves all the numbers, but no spaces.
// Create a dynamic array
int len = 100;
int *numbers = malloc(sizeof(int) * len);
// Source file
FILE *file;
file = fopen(filename, "r");
int i = 0, number = 0;
while(!feof(file)) {
if(fscanf(file, "%d ", &number) > 0) {
numbers[i++] = number;
} else {
clearerr(file);
fgetc(file);
}
}
fclose(file);
return numbers;
When fscanf encounters the lone - where it expects an integer, the %d conversion fails: nothing is stored and fscanf returns 0. It stops there.
If you want to continue reading the rest of the numbers, you'll need some code that reads the next character, discards it, and continues on.
int rc;
while ((rc = fscanf(file, "%d", &number)) != EOF && i < len)
{
    if (rc == 1) {
        numbers[i++] = number;
        printf("%d", number);
    } else {
        // rc == 0: the next character cannot start an integer.
        fgetc(file); // Read the next character and discard it.
    }
}
Update
To add a space between the numbers in the output, use:
printf("%d ", number);
fscanf doesn't keep looking through its input until it finds something matching its pattern. In this case, it encounters the lone -, is unable to parse it as an integer, and returns zero. That ends your loop; you will need to use EOF as the loop-ending condition instead.
It's because fscanf sees the lone '-', and since that's not a valid number it can't parse it and returns 0, which causes your loop to end.
I suggest you use fgets to read the whole line, and then use strtok to separate on space, and strtol to convert the tokenized strings to numbers.

Reading a file in C

I have an input file I need to extract words from. The words can only contain letters and numbers, so anything else will be treated as a delimiter. I tried fscanf, fgets + sscanf and strtok, but nothing seems to work.
while(!feof(file))
{
fscanf(file,"%s",string);
printf("%s\n",string);
}
Above one clearly doesn't work because it doesn't use any delimiters so I replaced the line with this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
So I used fgets to read the first line and use sscanf:
sscanf(line,"%[A-z]%n,word,len);
line+=len;
This one doesn't work either, because whatever I try I can't move the pointer to the right place. I tried strtok, but I can't figure out how to set the delimiters:
while(p != NULL) {
printf("%s\n", p);
p = strtok(NULL, " ");
This one obviously takes a blank character as the delimiter, but I have literally hundreds of delimiters.
Am I missing something here? Extracting words from a file seemed a simple concept at first, but nothing I try really works.
Consider building a minimal lexer. When in state word it would remain in it as long as it sees letters and numbers. It would switch to state delimiter when encountering something else. Then it could do an exact opposite in the state delimiter.
Here's an example of a simple state machine which might be helpful. For the sake of brevity it works only with digits. echo "2341,452(42 555" | ./main will print each number in a separate line. It's not a lexer but the idea of switching between states is quite similar.
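#include <ctype.h>
#include <stdio.h>

int main(void) {
    enum { WORD, DELIM, BUFLEN = 1024 };
    int state = DELIM, ptr = 0, c;     // start in DELIM so leading delimiters print nothing
    char buffer[BUFLEN];
    while ((c = getchar()) != EOF) {
        if (isdigit(c)) {
            if (WORD == state) {
                if (ptr < BUFLEN - 1)  // drop digits that would overflow the buffer
                    buffer[ptr++] = c;
            } else {
                buffer[0] = c;         // first digit of a new number
                ptr = 1;
            }
            state = WORD;
        } else {
            if (WORD == state) {       // a number just ended: print it
                buffer[ptr] = '\0';
                printf("%s\n", buffer);
            }
            state = DELIM;
        }
    }
    if (WORD == state) {               // flush a number that runs up to EOF
        buffer[ptr] = '\0';
        printf("%s\n", buffer);
    }
    return 0;
}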
If the number of states increases you can consider replacing if statements checking the current state with switch blocks. The performance can be increased by replacing getchar with reading a whole block of the input to a temporary buffer and iterating through it.
For a more complex input file format you can use lexical analyser generators such as flex. They can handle the state transitions and the other parts of building a lexer for you.
Several points:
First of all, do not use feof(file) as your loop condition; feof won't return true until after you attempt to read past the end of the file, so your loop will execute once too often.
Second, you mentioned this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
That's not quite what's happening; if the next character in the stream doesn't match the format specifier, scanf returns without having read anything, and string is unmodified.
Here's a simple, if inelegant, method: it reads one character at a time from the input file, checks to see if it's either an alpha or a digit, and if it is, adds it to a string.
#include <stdio.h>
#include <ctype.h>

#define SIZE 64 // large enough to handle expected inputs (assumed value)

int get_next_word(FILE *file, char *word, size_t wordSize)
{
    size_t i = 0;
    int c;
    /**
     * Skip over any non-alphanumeric characters
     */
    while ((c = fgetc(file)) != EOF && !isalnum(c))
        ; // empty loop
    /**
     * Store alphanumeric characters until a delimiter, EOF, or a full buffer
     */
    while (c != EOF && isalnum(c) && i < wordSize - 1)
    {
        word[i++] = c;
        c = fgetc(file);
    }
    if (c != EOF && isalnum(c))
        ungetc(c, file); // word was truncated: keep the extra character for the next call
    word[i] = 0;
    return i > 0; // 0 only when no word was found before EOF
}

int main(void)
{
    char word[SIZE];
    FILE *file = stdin; // or a FILE * obtained from fopen()
    while (get_next_word(file, word, sizeof word))
        printf("%s\n", word); // do something with word
    return 0;
}
I would use:
FILE *file;
char string[200];
while (fscanf(file, "%*[^A-Za-z0-9]"), fscanf(file, "%199[A-Za-z0-9]", string) > 0) {
    /* do something with string... */
}
This skips over any characters that are not letters or digits and then reads a run of up to 199 letters and digits. The only oddness is that if you have any 'words' that are longer than 199 characters they'll be split up into multiple words, but you need the limit to avoid a buffer overflow...
What are your delimiters? The second argument to strtok should be a string containing your delimiters, and the first should be a pointer to your string the first time round, then NULL afterwards:
char * p = strtok(line, ","); // assuming a , delimiter
while (p != NULL)
{
    printf("%s\n", p);
    p = strtok(NULL, ",");
}

Resources