Reading line by line using fscanf - c

I want to read a file with 3 lines:
The first one with strings, second with a number and the third with strings again.
Example:
Line 1: bird toy book computer water
Line 2: 2
Line 3: toy water
I have this code, that reads a file, word by word storing them in the word array, and then putting the word into the words 2d array.
char words [5][50];
char word [50];
int i,j;
j = 0;
while( (fscanf(file, "%s", word))!=EOF ){
for(i = 0; i<50; i++){
if(word[i] != NULL){
words[j][i] = word[i];
} else{
break;
}
}
j++;
}
It works, but it reads every line. I want a way to run this process for the first line only, then store the second line in an int variable and the third line in another 2D array.

Read more about fscanf. It is not suitable for reading line by line, because its %s directive treats a newline like any other whitespace.
Consider instead reading each line with fgets or, even better on POSIX, with getline, and then parsing each line, perhaps with sscanf. Its return value (the count of items successfully scanned) is worth testing, and you might also want %n in the format string. As Jonathan Leffler commented, read also about the %ms assignment-allocation modifier, available at least on POSIX systems; see Linux sscanf(3).
BTW, hard-coding limits like 50 for your word length is bad taste (and not robust). Consider using C dynamic memory allocation (malloc, free and friends) and pointers more systematically, perhaps with flexible array members in some of your structs.
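As a minimal sketch of the fgets-then-sscanf approach for the three-line layout in the question: the names split_words and read_three_lines, and the limits MAX_WORDS and MAX_LEN, are assumptions made up for this illustration. Fixed limits are kept only to stay short; the width 63 in the format must stay one less than MAX_LEN.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 16   /* assumed limits for this sketch */
#define MAX_LEN   64

/* Split one line into whitespace-separated words.
   Returns the number of words stored (at most MAX_WORDS). */
int split_words(const char *line, char words[][MAX_LEN])
{
    int count = 0, used = 0;

    /* %63s bounds each word; %n reports how many bytes were consumed */
    while (count < MAX_WORDS &&
           sscanf(line, "%63s%n", words[count], &used) == 1) {
        line += used;
        count++;
    }
    return count;
}

/* Read the layout from the question: words, a number, words.
   Returns 0 on success, -1 if a line is missing or malformed. */
int read_three_lines(FILE *fp,
                     char first[][MAX_LEN], int *nfirst,
                     int *number,
                     char third[][MAX_LEN], int *nthird)
{
    char buf[1024];

    if (fgets(buf, sizeof buf, fp) == NULL) return -1;
    *nfirst = split_words(buf, first);

    if (fgets(buf, sizeof buf, fp) == NULL) return -1;
    if (sscanf(buf, "%d", number) != 1) return -1;

    if (fgets(buf, sizeof buf, fp) == NULL) return -1;
    *nthird = split_words(buf, third);
    return 0;
}
```

Because each line is fetched separately, a word can never spill from one line into the next, which is exactly what the fscanf loop in the question could not guarantee.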

Related

How to find out how many words are in each line?

Say you have a text file filled with sentences. For example:
hey how are you
you good?
nice to meet you jeff
I'm writing a program to print things out depending on how many indexes are on each line, but I can't wrap my head around how to find how many words are on each line. How could I go about counting the words on each line?
for (int i = 0; i < wordle->leng; i++) {
    printf("%s ", wordle->allwords[i]);
}
This is my print function for the program. leng is how many lines so it knows how many times to repeat.
Some of the lines have 5 words, some 3, and it isn't printing in the correct format. Also not all lines will end with punctuation.
The POSIX getline() function is very useful for that; it reads a line from a stream up to the end of line. You can read line by line with it, then loop over the characters and add 1 to int word_count = 0; every time you see a non-whitespace character whose previous character was whitespace (you need additional logic for the first word on the line).
You can use fgets() if you don't have getline() available, but it doesn't expand the buffer to deal with extra long lines, unlike getline().
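The counting rule described above could be sketched like this. The function name count_words is an assumption for illustration; it works on a line already read by getline() or fgets(), and the in_word flag takes care of the "initial word" case.

```c
#include <ctype.h>
#include <stddef.h>

/* Count words in one line: a word starts wherever a non-whitespace
   character follows whitespace (or the start of the line). */
size_t count_words(const char *line)
{
    size_t count = 0;
    int in_word = 0;                 /* are we currently inside a word? */

    for (; *line != '\0'; line++) {
        if (isspace((unsigned char)*line)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;             /* whitespace -> word transition */
            count++;
        }
    }
    return count;
}
```

Calling it once per line gives a per-line count that the print loop can use to decide where each line ends. Punctuation needs no special handling because it is simply part of a word.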

Stack Smashing and using malloc

I'm making a program that counts the number of words contained within a file. My code works for certain test cases with files that have less than a certain amount of words/characters...But when I test it with, let's say, a word like:
"loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong", (this is not random--this is an actual test case I'm required to check), it gives me this error:
*** stack smashing detected ***: ./wcount.out terminated
Abort (core dumped)
I know what the error means and that I have to implement some sort of malloc line of code to be able to allocate the right amount of memory, but I can't figure out where in my function to put it or how to do it:
int NumberOfWords(char* argv[1]) {
FILE* inFile = NULL;
char temp_word[20]; <----------------------I think this is the problem
int num_words_in_file;
int words_read = 0;
inFile = fopen(argv[1], "r");
while (!feof(inFile)) {
fscanf(inFile, "%s", temp_word);
words_read++;
}
num_words_in_file = words_read;
printf("There are %d word(s).\n", num_words_in_file - 1);
fclose(inFile);
return num_words_in_file;
}
As you've correctly identified by rendering your source code invalid (future tip: /* put your arrows in comments */), the problem is that temp_word only has enough room for 20 characters (one of which must be a terminal null character).
In addition, you should check the return value of fopen. I'll leave that as an exercise for you. I've answered this question in other questions (such as this one), but I don't think just shoving code into your face will help you.
In this case, I think it may pay to analyse your problem more closely, to see whether you actually need to store words in order to count them. If we define a word (the kind read by scanf("%s", ...)) as a sequence of non-whitespace characters followed by a sequence of (zero or more) whitespace characters, we can see that a counting program such as yours needs to follow this procedure:
Read as much whitespace as possible
Read as much non-whitespace as possible
Increment the "word" counter if all was successful
You don't need to store the non-whitespace any more than you do the whitespace, because once you've read it you'll never revisit it. Thus you could write this as two loops embedded into one: one loop which reads as much whitespace as possible, another which reads non-whitespace, followed by your incrementation and then the outer loop repeats the whole lot... until EOF is reached...
This will be best achieved using the %*s directive, which tells scanf-related functions not to try to store the word. For example:
size_t word_count = 0;
/* With every assignment suppressed, fscanf returns 0 for each word
   it skips and EOF once no word is left. */
while (fscanf(inFile, "%*s") == 0) {
    ++word_count;
}
You are limited by the size of your array. A simple solution would be to increase the size of your array. But you are always susceptible to stack smashing if someone enters a long word.
A word is delimited by spaces.
You can simply keep a counter initialized to zero and a variable that holds the character you are currently looking at. Every time a character read with temp = fgetc(inFile) is a space, you increment the counter (a run of consecutive spaces should count only once, and the last word may not be followed by a space).
In your current code you simply want to count the words. Therefore you are not interested in the words themselves. You can suppress the assignment with the optional * character:
fscanf(inFile, "%*s");
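For comparison, the character-by-character idea mentioned above could look like the following sketch. The name count_words_in_stream is an assumption; nothing is stored, so no word length can overflow a buffer.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-delimited words in a stream without storing them. */
size_t count_words_in_stream(FILE *fp)
{
    size_t count = 0;
    int in_word = 0;                /* currently inside a word? */
    int c;                          /* int, so EOF stays distinguishable */

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

Counting word starts rather than spaces means repeated spaces and a missing trailing newline do not change the result.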

C : Best way to go to a known line of a file

I have a file in which I'd like to iterate without processing in any sort the current line. What I am looking for is the best way to go to a determined line of a text file. For example, storing the current line into a variable seems useless until I get to the pre-determined line.
Example :
file.txt
foo
fooo
fo
here
Normally, in order to get here, I would have done something like :
FILE* file = fopen("file.txt", "r");
if (file == NULL)
perror("Error when opening file ");
char currentLine[100];
while(fgets(currentLine, 100, file))
{
if(strstr(currentLine, "here") != NULL)
return currentLine;
}
But fgets will have to read three full lines uselessly, and currentLine will have to store foo, fooo and fo.
Is there a better way to do this, knowing that here is line 4? Something like a goto, but for files?
Since you do not know the length of every line, no, you will have to go through the previous lines.
If you knew the length of every line, you could probably play with how many bytes to move the file pointer. You could do that with fseek().
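A rough sketch of that fixed-length case, assuming every line occupies exactly line_len bytes including its newline. The function name seek_fixed_line is made up for this example. Strictly speaking, the C standard only guarantees computed offsets like this for binary streams; on POSIX systems text and binary streams behave the same.

```c
#include <stdio.h>

/* Jump to the start of line `lineno` (1-based), assuming every line
   occupies exactly `line_len` bytes including its newline.
   Returns 0 on success, nonzero on failure (as fseek does). */
int seek_fixed_line(FILE *fp, long lineno, long line_len)
{
    if (lineno < 1 || line_len < 1)
        return -1;
    return fseek(fp, (lineno - 1) * line_len, SEEK_SET);
}
```

After a successful call, the next fgets reads the requested line directly, without touching the ones before it.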
You cannot directly access a given line of a text file (unless all lines have the same size in bytes; with UTF-8 everywhere, a Unicode character can take a variable number of bytes, 1 to 4, and in most cases line lengths differ from one line to the next). So you cannot use fseek, because you don't know the file offset in advance.
However (at least on Linux systems), lines end with \n (the newline character). So you could read byte by byte and count them:
int c= EOF;
int linecount=1;
while ((c=fgetc(file)) != EOF) {
if (c=='\n')
linecount++;
}
You then don't need to store the entire line.
So you could reach line 45 this way (using while ((c = fgetc(file)) != EOF && linecount < 45) ...) and only then read entire lines with fgets or, better yet, getline(3) on POSIX systems. Notice that the implementation of fgets or of getline is likely to be built above fgetc, or at least to share some code with it. Remember that <stdio.h> is buffered I/O; see setvbuf(3) and related functions.
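Wrapped into a helper, the skipping loop might look like this sketch. The name skip_to_line is an assumption; no line contents are stored while skipping.

```c
#include <stdio.h>

/* Consume characters until the stream is positioned at the start of
   line `target` (1-based). Returns 1 on success, 0 if the file has
   fewer newlines than needed. A following fgets can still return NULL
   if the target line is empty because the file ended right there. */
int skip_to_line(FILE *fp, long target)
{
    long line = 1;
    int c;

    while (line < target && (c = fgetc(fp)) != EOF) {
        if (c == '\n')
            line++;
    }
    return line == target;
}
```

For the file in the question, skip_to_line(file, 4) followed by a single fgets would read only the "here" line into the buffer.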
Another way would be to read the file in two passes. A first pass stores the offset (using ftell(3)...) of every line start in some efficient data structure (a vector, a hash table, a tree...). A second pass uses that data structure to retrieve the offset of the line start, then calls fseek(3) with that offset.
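Here is one possible sketch of the two-pass idea, using a growable array as the "vector". The names index_lines and goto_line are assumptions for illustration. ftell is called only right after a newline, so each stored offset is a value ftell itself returned.

```c
#include <stdio.h>
#include <stdlib.h>

/* First pass: record the offset of every line start.
   Returns a malloc'd array (caller frees) and stores its length in
   *count; returns NULL on allocation failure or ftell error. */
long *index_lines(FILE *fp, size_t *count)
{
    size_t cap = 16, n = 0;
    long *offsets = malloc(cap * sizeof *offsets);
    long start;
    int c, pending = 1;             /* `start` not yet stored */

    if (offsets == NULL)
        return NULL;
    rewind(fp);
    start = ftell(fp);
    while (start >= 0 && (c = fgetc(fp)) != EOF) {
        if (pending) {              /* a character exists, so the line does */
            if (n == cap) {
                long *tmp = realloc(offsets, 2 * cap * sizeof *tmp);
                if (tmp == NULL) { free(offsets); return NULL; }
                offsets = tmp;
                cap *= 2;
            }
            offsets[n++] = start;
            pending = 0;
        }
        if (c == '\n') {            /* next line begins here */
            start = ftell(fp);
            pending = 1;
        }
    }
    if (start < 0) { free(offsets); return NULL; }
    *count = n;
    return offsets;
}

/* Second pass: position the stream at line `lineno` (1-based).
   Returns 0 on success, nonzero otherwise. */
int goto_line(FILE *fp, const long *offsets, size_t count, size_t lineno)
{
    if (lineno < 1 || lineno > count)
        return -1;
    return fseek(fp, offsets[lineno - 1], SEEK_SET);
}
```

Building the index costs one full read, so this only pays off when many lines are looked up in the same file.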
A third way, POSIX specific, would be to memory-map the file using mmap(2) into your virtual address space (this works well for not too huge files, e.g. of less than a few gigabytes). With care (you might need to mmap an extra ending page, to ensure the data is zero-byte terminated) you would then be able to use strchr(3) with '\n'.
In some cases, you might consider parsing your text file line by line (using fgets, or getline on Linux, or generating your parser with flex and bison) and storing each line in a relational database (such as PostgreSQL or SQLite).
PS. BTW, the notion of lines (and the end-of-line mark) varies from one OS to the next. On Linux the end-of-line is a \n character. On Windows lines are rumored to end with \r\n, etc...
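If the same code has to cope with both conventions, one common approach is to strip the line ending after reading. A small sketch, with chomp as a made-up helper name:

```c
#include <string.h>

/* Remove a trailing "\n", "\r\n", or lone "\r" in place.
   Returns the new length of the string. */
size_t chomp(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return len;
}
```

Applied to each buffer filled by fgets or getline, this leaves the same text whether the file came from Linux or Windows.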
A FILE * in C is a stream of chars. In a seekable file, you can address these chars using the file pointer with fseek(). But apart from that, there are no "special characters" in files, a newline is just another normal character.
So in short, no, you can't jump directly to a line of a text file, as long as you don't know the lengths of the lines in advance.
This model in C corresponds to the files provided by typical operating systems. If you think about it, to know the starting points of individual lines, your file system would have to store this information somewhere. This would mean treating text files specially.
What you can do however is just count the lines instead of pattern matching, something like this:
#include <stdio.h>
int main(void)
{
    char linebuf[1024];
    FILE *input = fopen("seekline.c", "r");
    int lineno = 0;
    char *line;
    if (input == NULL)
    {
        perror("seekline.c");
        return 1;
    }
    /* count lines as they are read; stop at the one we want */
    while ((line = fgets(linebuf, sizeof linebuf, input)) != NULL)
    {
        ++lineno;
        if (lineno == 4)
        {
            fputs("4: ", stdout);
            fputs(line, stdout);
            break;
        }
    }
    fclose(input);
    return 0;
}
If you don't know the length of each line, you have to go through all of them. But if you know the number of the line where you want to stop, you can do this:
while (!found && fgets(line, sizeof line, file) != NULL) /* read a line */
{
if (count == lineNumber)
{
//you arrived at the line
//in case of a return first close the file with "fclose(file);"
found = true;
}
else
{
count++;
}
}
At least this avoids all those calls to strstr.

What happens with extra memory using fscanf?

I'm new to C and I have a couple questions about fscanf. I wrote a simple program that reads the contents of a file and spits it back out on the command line:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char* argv[1])
{
if (argc != 2)
{
printf("Usage: fscanf txt\n");
return 1;
}
char* txt = argv[1];
FILE* fp = fopen(txt, "r");
if (fp == NULL)
{
printf("Could not open %s.\n", txt);
return 2;
}
char s[50];
while (fscanf(fp, "%49s", s) == 1)
printf("%s\n", s);
return 0;
}
Let's say the contents of my text file is just "C is cool.", which will output:
C
is
cool.
So I have two questions here:
1) Does fscanf assume that the placeholder "%s" will be a single word (an array of chars only)? According to this program's output, spaces and line breaks seem to prompt the function to return. But what if I wanted to read a whole paragraph? Would I use fread() instead?
2) More importantly I'm wondering what happens with all of the unused space in the array. On the first iteration, I think s[0] = "C" and s[1] = "\0", so are s[2] - s[49] just wasted?
EDIT: while (fscanf(fp, "%49s", s) == 1) - thanks to @M Oehm for pointing this out. This enforces a hard limit to prevent dangerous buffer overflows.
1) Does fscanf assume that the placeholder "%s" will be a single word
(an array of chars only)? According to this program's output, spaces
and line breaks seem to prompt the function to return. But what if I
wanted to read a whole paragraph? Would I use fread() instead?
The %s specifier reads single words that are delimited by white space. The scanf family of functions is very crude; it does not normally distinguish between line breaks and spaces, for example.
A line is anything up to the next newline. There is no concept of paragraph, but you might consider anything between blank lines a paragraph. The function to read lines of text is fgets, so you could read lines until you find an empty one. (fgets retains the newline at the end, mind.)
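Treating "anything between blank lines" as a paragraph could be sketched as follows. The function name read_paragraph is an assumption. Only a bare "\n" counts as a blank line here, and a paragraph that does not fit in the buffer is cut off, with the remainder returned by the next call.

```c
#include <stdio.h>
#include <string.h>

/* Read lines into `out` until a blank line or EOF, keeping newlines.
   Returns the number of characters stored; 0 means no paragraph left. */
size_t read_paragraph(FILE *fp, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return 0;
    out[0] = '\0';
    while (used + 1 < size && fgets(out + used, (int)(size - used), fp)) {
        if (out[used] == '\n') {        /* blank line */
            out[used] = '\0';
            if (used > 0)
                break;                  /* it ends the current paragraph */
            continue;                   /* skip leading blank lines */
        }
        used += strlen(out + used);
    }
    return used;
}
```

Calling it in a loop until it returns 0 walks through a file one paragraph at a time.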
fread is a function for reading binary data. It is not useful for reading structured texts. (But it can be used to read the contents of a whole text file at once.)
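Reading a whole text file at once with fread might look like this sketch. The name slurp and the starting capacity are assumptions; the buffer doubles whenever it fills up, and one byte is always kept free for the terminator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the rest of a stream into a malloc'd, NUL-terminated buffer.
   Stores the byte count in *len; returns NULL on failure. Caller frees. */
char *slurp(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    /* request at most cap - used - 1 bytes so the '\0' always fits */
    while ((got = fread(buf + used, 1, cap - used - 1, fp)) > 0) {
        used += got;
        if (used + 1 == cap) {          /* full: double the buffer */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

The result is one long string, so splitting it into words or lines is still up to the caller.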
2) More importantly I'm wondering what happens with all of the unused
space in the array. On the first iteration, I think c[0] = 'C' and
c[1] = '\0', so are c[2] - c[49] just wasted?
You are right, the data after the null terminator isn't used. "Wasted" is too negative: with user input you don't know whether you will eventually encounter a longer word. Because dynamic allocation requires some care in C, allocating "enough for most cases" is good practice. You should enforce the hard limit when reading, though, to prevent buffer overruns:
fscanf(fp, "%49s", s)
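One way to keep the width in the format and the array size from drifting apart is to build the format string from a single macro. This is only a sketch; WORD_MAX, STR and read_word are names invented for this example.

```c
#include <stdio.h>

#define WORD_MAX 49                 /* longest word stored, excluding '\0' */
#define STR_(x) #x
#define STR(x) STR_(x)              /* expands WORD_MAX before stringifying */

/* Read one whitespace-delimited word of at most WORD_MAX characters.
   Returns 1 on success, 0 at end of input or on error. */
int read_word(FILE *fp, char s[WORD_MAX + 1])
{
    /* the format below is assembled at compile time as "%49s" */
    return fscanf(fp, "%" STR(WORD_MAX) "s", s) == 1;
}
```

A word longer than the limit is not lost: it is simply delivered in several pieces on successive calls, and the buffer is never overrun.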
The issue of "wasted" memory becomes more serious if you have an array of arrays of 50 chars. Most of the words will be much shorter than 50 chars. Here, the extra memory might eventually hurt you. 48 extra characters for reading a line are okay, though.
(A strategy to save "compact" arrays of chars is to have a running array of chars that is a concatenation of all strings, including their terminators. The word array is then an array of pointers into that master string.)
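That compact layout could be sketched like this, with fixed capacities chosen arbitrarily for the example. The struct word_pool, pool_add, POOL_SIZE and POOL_WORDS are all hypothetical names.

```c
#include <string.h>

#define POOL_SIZE  4096   /* assumed capacities for this sketch */
#define POOL_WORDS 512

struct word_pool {
    char   text[POOL_SIZE];    /* all words back to back, each NUL-terminated */
    char  *word[POOL_WORDS];   /* word[i] points into text */
    size_t used;               /* bytes of text in use */
    size_t count;              /* number of words stored */
};

/* Append one word; returns 0 on success, -1 if the pool is full.
   The pool must start out zero-initialized. */
int pool_add(struct word_pool *p, const char *w)
{
    size_t len = strlen(w) + 1;             /* include the terminator */

    if (p->count == POOL_WORDS || len > POOL_SIZE - p->used)
        return -1;
    memcpy(p->text + p->used, w, len);
    p->word[p->count++] = p->text + p->used;
    p->used += len;
    return 0;
}
```

Storing "C", "is" and "cool." this way takes 11 bytes of text plus three pointers, instead of three 50-byte rows.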
You use the %s specifier, which reads and stores data in the array s until it encounters a space or newline. As soon as it encounters whitespace, fscanf returns.
I think c[0] = "C" and c[1] = "\0", so are c[2] - c[49] just wasted?
Yes, s[0]='C' and s[1]='\0', and you probably can't do anything about the array being much larger than needed.
If you want the complete string "C is cool." stored in the array, use fgets.
#define len 1000
char s[len];
while(fgets(s,len,fp)!=NULL) {
//your code
}

Parsing words in C; Translating program

I'm developing a program that will translate a string from the user (English) into Spanish.
For the assignment I'm given a file that contains a list of 100 words and their Spanish equivalents. I've successfully opened that file and read it into a two-dimensional array of strings.
What I'm having difficulty with is parsing the words so that I can find the equivalent version of the given words; any words that aren't in the list are supposed to be replaced with asterisks (*). Any ideas on how I can parse the words from the string the user inputs?
Below are snippets of the source code to save some time.
--Thanks
char readFile[100][25];
fp = fopen("words.dat", "r");
if (fp == NULL){
printf ("File failed to load\n");
}
//This is how I stored the file into the two dimensional string.
while (fgets(readFile, 100, fp)){
x++;
}
printf ("User please input string\n");
gets (input);
That's as far as I've gotten. I commented out the for-loop that outputs the words so I can see the words (for the sake of curiosity) and it was successful. The format of each line in the file is
(english word), (spanish word).
First off, the array you declare is 100 arrays of 25-character arrays. If we talk about "lines", it means you have 100 lines where each line can hold 24 characters (remember we need one extra for the terminating '\0' character). If you want 25 lines of 99 characters each, swap the two sizes.
Secondly, you overwrite the same bytes of the array over and over again. And since each sub-array is actually only 25 characters, you can overwrite up to four of those arrays with that fgets call.
I suggest something like this instead:
size_t count = 0;
for (int i = 0; i < sizeof(readFile) / sizeof(readFile[0]) &&
fgets(readFile[i], sizeof(readFile[i]), fp); i++, count++)
{
}
This will make sure you don't read more than you can store, and automatically reads into the correct "line" in the array. After the loop, count will contain the number of lines you read.
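For the parsing part of the question, one possible sketch follows. It assumes each dictionary line looks like "english, spanish" with single-word entries, that matching is exact and case-sensitive, and that an unknown word becomes a single "*". The names struct entry, parse_entry and translate are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

struct entry { char en[25]; char es[25]; };

/* Parse one dictionary line of the assumed form "english, spanish".
   Returns 1 on success, 0 if the line does not match. */
int parse_entry(const char *line, struct entry *e)
{
    return sscanf(line, " %24[^,], %24s", e->en, e->es) == 2;
}

/* Translate `input` word by word into `out` (`size` bytes).
   Unknown words become "*". `input` is modified by strtok. */
void translate(char *input, const struct entry *dict, size_t n,
               char *out, size_t size)
{
    const char *delim = " \t\r\n";
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';
    for (char *w = strtok(input, delim); w != NULL; w = strtok(NULL, delim)) {
        const char *t = "*";                /* default for unknown words */
        for (size_t i = 0; i < n; i++) {
            if (strcmp(w, dict[i].en) == 0) {
                t = dict[i].es;
                break;
            }
        }
        int len = snprintf(out + used, size - used, "%s%s",
                           used ? " " : "", t);
        if (len < 0 || (size_t)len >= size - used)
            break;                          /* out is full */
        used += (size_t)len;
    }
}
```

With a dictionary holding "hello, hola" and "world, mundo", the input "hello big world" would come out as "hola * mundo". Reading the user's line with fgets rather than gets keeps the input bounded as well.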

Resources