how to parse plaintext from file into a 2d matrix/array? - c

the following code that I wrote is supposed to transform a line taken from a file like the following:
(3670, 1882) (1574, 7255) (4814, 8566) (1609, 3153) (9725, 13468) (8297, 3006) (9091, 6989) (8521, 10432) (14669, 12201) (4203, 9729) (469, 2444) (10107, 8318) (1848, 13650) (5423, 847) (11755, 8827) (4451, 4495) (11645, 1670) (10937, 5692) (14533, 13696) (7291, 12158) (1891, 2405) (1776, 4971) (2486, 2499) (13389, 236) (8533, 7531) (10618, 10288) (9119, 11226) (9429, 6622) (12380, 9516) (1698, 5828) (8369, 5101) (11341, 13530) (11955, 2335) (6249, 14435) (9373, 6921) (2977, 2294) (57, 14558) (280, 12847) (13846, 11748) (428, 9004)
into a valid 2d matrix.
{{3670, 1882},{1574, 7255}...}
I'm a good "pythoner" and I'd be able to do that in one line. I wanted to try to solve the same problem in c (note that I started to mess around with c today); my attempt is the following (and the result is quite random/wrong):
FILE *fp;
fp=fopen(argv[1], "rt");
if ( fp != NULL )
{
char line [1000]; //this is ugly, isn't this?
while ( fgets ( line, sizeof line, fp ) != NULL ) // read a line
{
line[(strlen(line)-1)] = '\0';
//line[(strlen(line)-2)] = '\0';
char* p;
p = strtok(line, ",)( ");
int elements[100][2]; //even uglier than before?
int binpos=0;
int pos=0;
while (p != NULL)
{
if (p!=NULL){
if (binpos==0){
elements[pos][binpos]=atoi(p);
binpos=1;
}else{
p[(strlen(p)-1)] = '\0'; //remove the comma
elements[pos][binpos]=atoi(p);
pos++;
binpos=0;
}
}
p = strtok(NULL, ",)( ");
}
int it;
for (it=0; it<pos; it++){
printf("(%d, %d)\n",elements[it][0],elements[it][1]);
}
return 0;
}
}
Can someone please tell me how to correct my mess? :)

You should only use fgets to read lines if lines are relevant to what you are parsing. In your case line endings look irrelevant, so you're better off just using fscanf. Note the leading space in the format string: it skips the white space between one pair and the next, which the literal ( would otherwise fail to match.
printf("{");
const char *sep = "";
int a, b;
while (fscanf(fp, " (%d,%d)", &a, &b) == 2) {
    printf("%s{%d, %d}", sep, a, b);
    sep = ", ";
}
printf("}");
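If the pairs need to end up in a 2D array rather than on stdout, the same fscanf loop can fill one. The following is a minimal sketch, not the answerer's code: the helper name read_pairs and its parameters are made up for illustration, and the extra blanks in the format only make it tolerant of spaces before the comma and the closing parenthesis.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max "(x, y)" pairs from fp into pairs[][2]
   and returns how many pairs were stored. */
size_t read_pairs(FILE *fp, int pairs[][2], size_t max)
{
    size_t n = 0;
    int a, b;

    /* The leading blank skips white space (including newlines) before each '('.
       %d skips leading white space by itself, so "(3670, 1882)" matches. */
    while (n < max && fscanf(fp, " (%d ,%d )", &a, &b) == 2) {
        pairs[n][0] = a;
        pairs[n][1] = b;
        n++;
    }
    return n;
}
```

With the array from the question this would be called as read_pairs(fp, elements, 100); the bound check keeps the loop from writing past the end of the array when the file holds more pairs than expected.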

Related

C - Program not detecting blank spaces

I want my program to read a file containing words separated by blank spaces and then print the words one by one. This is what I did:
char *phrase = (char *)malloc(LONGMAX * sizeof(char));
char *mot = (char *)malloc(TAILLE * sizeof(char));
FILE *fp = NULL;
fp = fopen("mots.txt", "r");
if (fp == NULL) {
printf("err ");
} else {
fgets(phrase, LONGMAX, fp);
while (phrase[i] != '\0') {
if (phrase[i] != " ") {
mot[m] = phrase[i];
i++;
m++;
} else {
printf("%s\n", phrase[i]);
mot = "";
}
}
}
but it isn't printing anything! Am I doing something wrong? Thanks!
The i in the following:
while (phrase[i]!='\0'){
Should be initialized to 0 before being used, then incremented as you iterate through the string.
You have not shown where/how it is created.
Also in this line,
if(phrase[i]!=" "){
the code is comparing a char: (phrase[i]) with a string: ( " " )
// char string
if(phrase[i] != " " ){
change it to:
// char char
if(phrase[i] != ' '){
//or better yet, treat all whitespace as a separator (isspace() is declared in <ctype.h>):
if(!isspace((unsigned char)phrase[i])){
There is no error checking in the following, but it is basically your code with modifications. Read comments for explanation on edits to fgets() usage, casting return of malloc(), how and when to terminate the output buffer mot, etc.:
This performs the following: read a file containing words separated by blank spaces and then prints words one by one.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/* LONGMAX and TAILLE are the buffer sizes defined in your code */
int main(void)
{
    int i = 0;
    int m = 0;
    char *phrase = malloc(LONGMAX); //sizeof(char) always == 1
    if (phrase) //test to make sure memory created
    {
        char *mot = malloc(TAILLE); //no need to cast the return of malloc in C
        if (mot) //test to make sure memory created
        {
            FILE *fp = fopen("_in.txt", "r");
            if (fp) //test to make sure fopen worked
            { //shortcut of what you had :) (left off the print err)
                while (fgets(phrase, LONGMAX, fp)) //fgets returns NULL when no more to read.
                {
                    i = 0; //start again at the beginning of each new line
                    while (phrase[i] != '\0') //test for end of last line read
                    {
                        if (isspace((unsigned char)phrase[i])) //see ANY white space, terminate and write to stdout
                        {
                            mot[m] = 0; //null terminate
                            if (m > 0) printf("%s\n", mot);
                            m = 0; //reset to capture next word
                        }
                        else if (m < TAILLE - 1) //leave room for the terminator
                        {
                            mot[m] = phrase[i]; //copy next char into mot
                            m++;
                        }
                        i++; //move to next char in phrase
                    }
                }
                //per comment about last word: print it here if the file did not end with white space.
                mot[m] = 0;
                if (m > 0) printf("%s\n", mot);
                fclose(fp);
            }
            free(mot);
        }
        free(phrase);
    }
    return 0;
}
phrase[i]!=" "
You are comparing a character (phrase[i]) with a string (" "). To compare phrase[i] with the space character, use ' ' instead.
To compare strings, use strcmp.
printf("%s\n",phrase[i]);
Here you use %s, which prints a string, but phrase[i] is a character; use %c instead.
Do not use mot=""; to empty a string in C: that only makes the pointer refer to a string literal and loses the malloc'ed buffer. Use strcpy instead:
strcpy(mot, "");
To print one line word by word, you can use strtok to split the string at space characters.
fgets(phrase,LONGMAX,fp);
char * token = strtok(phrase, " ");
while(token != NULL) {
printf("%s \n", token);
token = strtok(NULL, " ");
}
As an aside, your program reads only one line of the file because it calls fgets only once. If the file contains several lines, call fgets in a loop:
while(fgets(phrase,LONGMAX,fp)) {
// do something with the phrase string.
// strtok for example.
char * token = strtok(phrase, " ");
while(token != NULL) {
printf("%s \n", token);
token = strtok(NULL, " ");
}
}
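One detail worth noting about the strtok calls above: with " " as the only delimiter, the newline that fgets keeps stays attached to the last word of each line, and tabs count as part of a word. A possible variation, sketched here with a made-up helper name (split_words) and an arbitrary delimiter set, lists the other white space characters as delimiters too.

```c
#include <string.h>

/* Hypothetical helper: splits line in place on white space and stores pointers
   to at most max words in words[]. Returns the number of words found. */
size_t split_words(char *line, char *words[], size_t max)
{
    size_t n = 0;
    /* Listing '\n' as a delimiter drops the newline that fgets() keeps. */
    char *token = strtok(line, " \t\r\n");

    while (token != NULL && n < max) {
        words[n++] = token;
        token = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

The pointers in words[] refer into line itself, so they are only valid until the buffer is overwritten by the next fgets call.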
Your program has multiple problems:
the test for end of file is incorrect: you should just compare the return value of fgets() with NULL.
the test for spaces is incorrect: phrase[i] != " " is a type mismatch as you are comparing a character with a pointer. You should use isspace() from <ctype.h>
Here is a much simpler alternative that reads one byte at a time, without a line buffer nor a word buffer:
#include <ctype.h>
#include <stdio.h>
int main() {
int inword = 0;
int c;
while ((c = getchar()) != EOF) {
if (isspace(c)) {
if (inword) {
putchar('\n');
inword = 0;
}
} else {
putchar(c);
inword = 1;
}
}
if (inword) {
putchar('\n');
}
return 0;
}

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
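As a rough illustration of the variant with a blank before the comma, the loop could be wrapped in a function like the one below. The name read_ints and its parameters are assumptions made for this sketch; only the format string idea comes from the answer.

```c
#include <stdio.h>

/* Hypothetical helper: reads comma- or newline-separated integers from fp
   into list[] (capacity max) and returns how many were stored. */
size_t read_ints(FILE *fp, int *list, size_t max)
{
    size_t n = 0;
    int num;

    /* The bound is checked first so that no value is read and then discarded.
       The blank before the comma tolerates input such as "3 , 4". */
    while (n < max && fscanf(fp, "%d ,", &num) == 1)
        list[n++] = num;
    return n;
}
```

For the array in the question the call would be read_ints(fp1, list, 300). Newlines need no special handling because %d skips leading white space before each number.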
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be useful if you have several different types of fields that require different actions, as in a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 guards against writing past the end of the list array. If you really want to check that the file contains nothing other than commas and newlines (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
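A related option, swapped in here for the fseek call and not part of the answer above, is ungetc: it pushes the byte just read back onto the stream, and its return value is easy to check. The sketch below uses a made-up function name (read_numbers) and returns -1 on failure as an arbitrary convention.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: stores every run of digits found in f as an int in
   list[] (capacity max); any other byte acts as a separator. Returns the
   count, or -1 if a push-back or a conversion fails. */
int read_numbers(FILE *f, int *list, int max)
{
    int n = 0;
    int c;

    while (n < max && (c = fgetc(f)) != EOF) {
        if (isdigit(c)) {
            /* Push the digit back so fscanf() sees the whole number. */
            if (ungetc(c, f) == EOF)
                return -1;
            if (fscanf(f, "%d", &list[n]) != 1)
                return -1;
            n++;
        }
    }
    return n;
}
```

As in the fseek version, a minus sign is treated as a separator, so this only fits data without negative numbers.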
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "r"); // plain "r" is portable; the "e" flag is a non-standard extension
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
while (tok != NULL) {
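// Convert the token to a long if possible
char *end;
errno = 0; // strtol only sets errno on failure, so clear it first (needs <errno.h>)
long num = strtol(tok, &end, 10);
if (errno != 0 || end == tok) {
// Handle out-of-range value or no conversion
// ...
// ...
}
if (numc < 300) // do not write past the end of list
list[numc++] = (int) num;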
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
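Since strtol reports problems in more than one way, the checks are often gathered in a small wrapper. The following is one possible sketch; the name parse_int, the 1/0 return convention, and the decision to accept a trailing newline are all assumptions made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the decimal text in s to an int.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;                    /* strtol() reports range errors through errno */
    value = strtol(s, &end, 10);

    if (end == s)                 /* no digits were consumed */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                 /* the value does not fit in an int */
    if (*end != '\0' && *end != '\n' && *end != '\r')
        return 0;                 /* trailing junk such as "12abc" */

    *out = (int)value;
    return 1;
}
```

In the tokenizing loop above, each token could then be passed through such a helper, storing the value only when it returns 1.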

Reading data separated by ';' from file

I am trying to read in data from a file that is formatted with ;.
The data will always look like this:
char[];int;int%;int
The char[] can have any number of spaces and the % should be disregarded when reading the data.
I am using fscanf() (I am allowed to use only that) for reading the data from the file.
Right now my code for that part of it is:
fscanf(file, "%[^;]%d%d%d", f_name, &f_id, &f_score, &f_section_num) != EOF)
Is there a regex for what I need? Or, how do I correct my fscanf?
You can read the file using fscanf with this format string:
" %63[^;];%d;%d%%;%d"
(leading space): skip leading white space, including the newline left over from the previous record
%63[^;]: read up to the first ; (at most 63 characters, so a 64-byte buffer cannot overflow)
;: ignore the ;
%d: read the first integer
;: ignore the ;
%d: read the second integer
%%: ignore the %
;: ignore the ;
%d: read the third integer
Do not forget to check the number of successful conversions made by fscanf by testing fscanf(...) == 4.
So the code will look like:
FILE *f = fopen(...);
char name[64];
int i, integers[3];
while (fscanf(f, " %63[^;];%d;%d%%;%d", name, &integers[0], &integers[1], &integers[2]) == 4)
{
    printf("name is %s\n", name);
    for (i = 0; i < 3; ++i)
    {
        printf("i[%d] = %d\n", i, integers[i]);
    }
}
fclose(f);
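A format string like this is easy to try out on a single line with sscanf before pointing it at a file (the question only allows fscanf for the file itself, so this is meant purely as a way to experiment). In the sketch below the struct, the function name parse_record, and the field names are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical record type for one "name;id;score%;section" line. */
struct record {
    char name[64];
    int id, score, section;
};

/* Parses one line; returns 1 if all four fields were converted, else 0. */
int parse_record(const char *line, struct record *r)
{
    /* %63[^;] limits the name to 63 characters plus the terminator;
       %% matches the literal percent sign after the score. */
    return sscanf(line, " %63[^;];%d;%d%%;%d",
                  r->name, &r->id, &r->score, &r->section) == 4;
}
```

Spaces inside the name are kept because %[^;] stops only at a semicolon; the leading blank in the format drops white space in front of the name.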
You could, alternatively, use strtok(). If, for example, you use a struct for each entry as follows,
typedef struct {
char name[64];
int id, score, section_num;
} entry_t;
each line of the file could be read as follows.
char line[128] = {'\0'};
char *field = NULL;
entry_t entry;
while (fgets(line, sizeof(line), fp)) {
field = strtok(line, ";");
if (!field || strlen(field) >= sizeof(entry.name)) continue; // >= leaves room for the terminator
strcpy(entry.name, field);
field = strtok(NULL, ";");
if (!field) continue;
entry.id = atoi(field);
field = strtok(NULL, ";%");
if (!field) continue;
entry.score = atoi(field);
field = strtok(NULL, ";");
if (!field) continue;
entry.section_num = atoi(field);
// Do whatever you need with the entry - e.g. print its contents
}
I have removed some necessary boilerplate code for brevity. See http://codepad.org/lg6BJ0hk for a full example.
You can use strtol() instead of atoi() if you need to check the results of the integer conversions.
The following code will allow you to read data separated by ; from your file:
char msg[100];
int a;
char b[100];
int c;
fscanf(fptr, "%99[^;];%d;%99[^;];%d", msg, &a, b, &c); // widths keep msg and b within their 100-byte buffers
printf("%s\n %d\n %d\n %d\n", msg, a, atoi(b), c);

how to read a text file in c and then split each line into tokens?

The input text file has some numbers per line, numbers are split by space. The first two lines only got one number, and the following lines got three. What I want to do is read each line of the input and store these numbers.
This is what I've got so far:
int
main(int argc, char *argv[]) {
int n = 0;
char buff[MAX_STRING_LEN]; //MAX_STRING_LEN is defined as 64
while (fgets(buff,MAX_STRING_LEN, stdin) != NULL) {
char temp;
if (n == 0) {
sscanf(buff, "%s", &temp);
int h_num = (int)temp;
} else if (n == 1) {
sscanf(buff, "%s", &temp);
int s_num = (int)temp;
} else {
sscanf(buff, "%s", &temp);
char *token;
token = strtok(&temp, " ");
int i = 0;
int a,b,c;
while (token != NULL) {
if (i == 0) {
a = (int)token;
token = strtok(NULL, " ");
} else if (i == 1) {
b = (int)token;
token = strtok(NULL, " ");
} else {
c = (int)token;
token = strtok(NULL, " ");
}
i++;
}
}
n++;
}
return 0;
}
The print statement I used to test my code is like:
printf("%d\n",h_num);
printf("%d\n%d\n%d\n",a,b,c);
I created a text file like this:
23
34
4 76 91
but the output is not what I expected, it's the address of the pointer I think. (I'm stuck with pointer again =( )
Could someone help me to point out what the problem is? Appreciate it.
In your code, I can see,
int h_num = (int)temp;
and
int s_num = (int)temp;
No, that is not how you convert an alphanumeric string to an int.
You need to use strtol() for this purpose.
Then,
sscanf(buff, "%s", &temp);
is wrong. temp is a single char, so it cannot hold a string: %s writes the characters of the token plus a terminating null byte, which overflows it. You need a char array for %s (or %c to read just one character).
My suggestion for a better approach:
Read a complete line from the file using fgets().
Tokenize the input using strtok() with a space (" ") as the delimiter, then convert each token (if not NULL) to an int using strtol().
Continue until the returned token is NULL.
This makes your code much more generic, as you don't need to handle the number of ints present on each line separately.
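The steps above could be sketched roughly as follows. The function name parse_numbers, its parameters, and the choice to skip tokens that are not whole numbers are assumptions made for this example, not part of the answer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits line in place on white space and converts each
   token with strtol(). Stores at most max values in out[] and returns the
   number stored; tokens that are not whole numbers are skipped. */
size_t parse_numbers(char *line, long *out, size_t max)
{
    size_t n = 0;
    char *token = strtok(line, " \t\r\n");

    while (token != NULL && n < max) {
        char *end;
        long value = strtol(token, &end, 10);
        if (end != token && *end == '\0')   /* the whole token was a number */
            out[n++] = value;
        token = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Inside the fgets loop from the question, each buff would be passed to this function together with a small array such as long values[3]; the return value then tells you whether the line held one number or three.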

Loop crashing in C

I'm very new to C and I'm still learning the basics. I'm creating an application that reads in a text file and breaks down the words individually. My intention will be to count the amount of times each word occurs.
Anyway, the last do-while loop in the code below executes fine, and then crashes. This loop prints memory address to this word (pointer) and then prints the word. It accomplishes this fine, and then crashes on the last iteration. My intention is to push this memory address into a singly linked list, albeit once it's stopped crashing.
Also, just a quick mention regarding the array sizes below: I haven't yet figured out how to set the correct size needed to hold the word character array etc., because you must define the size before the array is filled, and I don't know how to do this. That's why I've set them to 1024.
#include<stdio.h>
#include<string.h>
int main (int argc, char **argv) {
FILE * pFile;
int c;
int n = 0;
char *wp;
char wordArray[1024];
char delims[] = " "; // delims spaces in the word array.
char *result = NULL;
result = strtok(wordArray, delims);
char holder[1024];
pFile=fopen (argv[1],"r");
if (pFile == NULL) perror ("Error opening file");
else {
do {
c = fgetc (pFile);
wordArray[n] = c;
n++;
} while (c != EOF);
n = 0;
fclose (pFile);
do {
result = strtok(NULL, delims);
holder[n] = *result; // holder stores the value of 'result', which should be a word.
wp = &holder[n]; // wp points to the address of 'holder' which holds the 'result'.
n++;
printf("Pointer value = %d\n", wp); // Prints the address of holder.
printf("Result is \"%s\"\n", result); // Prints the 'result' which is a word from the array.
//sl_push_front(&wp); // Push address onto stack.
} while (result != NULL);
}
return 0;
}
Please ignore the bad program structure, as I mentioned, I'm new to this!
Thanks
As others have pointed out, your second loop attempts to dereference result before you check for it being NULL. Restructure your code as follows:
result = strtok( wordArray, delims ); // do this *after* you have read data into
// wordArray
while( result != NULL )
{
holder[n] = *result;
...
result = strtok( NULL, delims );
}
Although...
You're attempting to read the entire contents of the file into memory before breaking it up into words; that's not going to work for files bigger than the size of your buffer (currently 1K). If I may make a suggestion, change your code such that you're reading individual words as you go. Here's an example that breaks the input stream up into words delimited by whitespace (blanks, newlines, tabs, etc.) and punctuation (period, comma, etc.):
#include <stdio.h>
#include <ctype.h>
int main(int argc, char **argv)
{
char buffer[1024];
int c;
size_t n = 0;
FILE *input = stdin;
if( argc > 1 )
{
input = fopen( argv[1], "r");
if (!input)
input = stdin;
}
while(( c = fgetc(input)) != EOF )
{
if (isspace(c) || ispunct(c))
{
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
n = 0;
}
}
else if (n < sizeof buffer - 1) // leave room for the terminator
{
buffer[n++] = c;
}
}
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
}
fclose(input);
return 0;
}
No warranties express or implied (having pounded this out before 7:00 a.m.). But it should give you a flavor of how to parse a file as you go. If nothing else, it avoids using strtok, which is not the greatest of tools for parsing input. You should be able to adapt this general structure to your code. For best results, you should abstract that out into its own function:
/* Reads the next word from stream into buf (bufsize must be at least 1).
   Returns 1 if a word was read, 0 at end of input. */
int getNextWord(FILE *stream, char *buf, size_t bufsize)
{
    int c;
    size_t n = 0;
    while ((c = fgetc(stream)) != EOF)
    {
        if (isspace(c) || ispunct(c))
        {
            if (n > 0)
                break; // a delimiter after at least one character ends the word
        }
        else if (n < bufsize - 1) // leave room for the terminator; drop excess characters
        {
            buf[n++] = (char)c;
        }
    }
    buf[n] = 0;
    return n > 0;
}
and you would call it like
void foo(void)
{
char word[SOME_SIZE];
...
while (getNextWord(inFile, word, sizeof word))
{
do_something_with(word);
}
...
}
In your do...while code you expect that result can be NULL (that is the condition that ends the loop), so how do you expect this line:
holder[n] = *result;
to work when that happens? It dereferences a NULL pointer, which seems to me to be the reason your program crashes.
Change do while loop to while
use
while (condition)
{
}
instead of
do {
} while (condition);
It is crashing because you are trying to dereference a NULL pointer (result) in the do-while loop.
I work mostly with Objective-C and was just looking at your question for fun, but I may have a solution.
Before setting n=0; after your first do-while loop, create another variable called totalWords and set it equal to n. totalWords can be declared anywhere within the file (except within one of the do-while loops), but can be defined at the top of the else block since its lifetime is short:
totalWords = n;
then you can set n back to zero:
n = 0;
Your conditional for the final do-while loop should then say:
...
} while (n <= ++totalWords);
The logic behind the application will thus say, count the words in the file (there are n words, which is the totalWords in the file). When program prints the results to the console, it will run the second do-while loop, which will run until n is one result past the value of totalWords (this ensures that you print the final word).
Alternately, it is better practice and clearer for other programmers to use a loop and a half:
do {
result = strtok(NULL, delims);
holder[n] = *result;
wp = &holder[n];
printf("Pointer value = %d\n", wp);
printf("Result is \"%s\"\n", result);
//sl_push_front(&wp); // Push address onto stack.
if (n == totalWords) break; // This forces the program to exit the do-while after we have printed the last word
n++; // We only need to increment if we have not reached the last word
// if our logic is bad, we will enter an infinite loop, which will tell us while testing that our logic is bad.
} while (true);
